ibmdb / python-ibmdb


execute_many call throws exception without any message text #956

Open leighawilliamson opened 1 month ago

leighawilliamson commented 1 month ago

This is Python client code running in a Jupyter notebook connected to a Db2 on Cloud server in the customer's IBM Cloud account. The code pulls log data from the Watson Assistant service and stores it in a Db2 table, using an execute_many command to insert the data. It worked fine until July 1, 2024; since then, the execute_many call has been throwing an error that has no message text, so I have no way to know what's wrong!

If the exception thrown by the client library included any message text at all, it would be something I could troubleshoot. But throwing an exception with zero message text is really useless. Here is the code from that section of the program and the output from the execute_many command:

tuple_of_tuples = tuple([tuple(x) for x in insertDF.values])

columns = 'CONVERSATION_ID,DIALOG_TURN_NUMBER,CONVERSATION_LENGTH,ENTITIES,INPUT_TEXT,INTENT,INTENT_CONFIDENCE,DIALOG_TITLE,OUTPUT_TEXT,REQUEST_TIMESTAMP,RESPONSE_TIMESTAMP,LANGUAGE_SELECTION,MONTH,DAY,YEAR,USER_ID,INPUT_BIGRAMS,IS_FINAL_TURN,CALL_TRANSFER_COMPLETED,ESCALATION_REASON,ESCALATION_DETAIL,CALLER_INTENTS_COUNT,SMS_MESSAGES_SENT_COUNT,CALLER_INTENTS_LIST,SMS_TOPIC_LIST,USER_PHONE_NUMBER'

insertSQL = 'Insert into WATSON_PHONE_DIALOG.WA_FULL_LOGS(' + columns + ') values(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)'
insert_stmt = ibm_db.prepare(conn, insertSQL)

%%time
ibm_db.execute_many(insert_stmt, tuple_of_tuples)

Output:

Exception                                 Traceback (most recent call last)
Cell In[48], line 2
      1 #%%time
----> 2 ibm_db.execute_many(insert_stmt, tuple_of_tuples)

Exception:

As you can see, there is no message with the Exception! How does anyone debug this?

Earammak commented 1 month ago

@leighawilliamson, I have tried this in a Jupyter notebook with a small sample program containing a few columns, and it ran successfully. Can you please share a sample program here that reproduces the problem?
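
For reference, the kind of small sample I tried looks roughly like this (a sketch; the connection string, table, and data are placeholders):

import ibm_db

# Placeholder credentials -- substitute your own Db2 on Cloud details.
conn = ibm_db.connect(
    "DATABASE=BLUDB;HOSTNAME=example.com;PORT=50000;PROTOCOL=TCPIP;"
    "UID=user;PWD=password;SECURITY=SSL",
    "", "")

ibm_db.exec_immediate(conn, "CREATE TABLE TEST_MANY (ID INTEGER, NAME VARCHAR(32))")

stmt = ibm_db.prepare(conn, "INSERT INTO TEST_MANY (ID, NAME) VALUES (?, ?)")
rows = ((1, 'a'), (2, 'b'), (3, 'c'))

# execute_many returns the number of rows inserted on success.
print(ibm_db.execute_many(stmt, rows))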

Thanks

imavo commented 1 month ago

@leighawilliamson To answer your question, you can use db2trc to debug it. You would need to start the trace in a shell session, start jupyter-notebook in that same session, recreate the problem, stop the trace, dump the trace to a binary file, and then convert the binary trace to text form for study or attachment here. But if that is too difficult, you could also share some further problem-determination details here: (1) what changed around the start of July? (2) what is the output of import ibm_db followed by print(ibm_db.__version__)? (3) does the exception happen 100% of the time for every dataframe, or does it depend on the number of rows in the dataframe; is there some pattern?
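
As a lighter-weight first step than db2trc, you could also ask the driver directly for its last error state. A sketch, reusing insert_stmt and tuple_of_tuples from your post; ibm_db.stmt_error() returns the SQLSTATE and ibm_db.stmt_errormsg() the message for the last failed statement operation:

import ibm_db

print(ibm_db.__version__)  # the driver version requested above

try:
    ibm_db.execute_many(insert_stmt, tuple_of_tuples)
except Exception as exc:
    # Even when the Exception text is empty, the driver may still have
    # recorded an SQLSTATE and message for the failing operation.
    print("exception:", repr(exc))
    print("stmt_error:", ibm_db.stmt_error())
    print("stmt_errormsg:", ibm_db.stmt_errormsg())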

Earammak commented 1 week ago

@leighawilliamson Can you please share an update on this issue? If you are still facing it, please share the above-mentioned info.

leighawilliamson commented 1 week ago

Thanks to all who attempted to help me with this problem. I am sure that there is still code in the python client library that, in some path of the execute_many API, throws an Exception that has no message text. However, the steps to troubleshoot the situation with db2trc, described above, are not practical in the customer's Watson Studio project, so I rewrote the code to avoid the execute_many API and instead use the much less efficient (but functional) execute API for each individual row in the DataFrame (see the sketch below). That has worked consistently ever since I made the change.

To answer the additional questions above: (1) nothing changed in the python code whatsoever; it just started throwing this exception with no message text around the beginning of July. I would pose the question "what changed in the db2 python library around July first?" (2) the output from print(ibm_db.__version__) executed after the import command is "3.1.4". (3) the exception with no message text was consistent for all runs of the notebook, regardless of the content of the DataFrame, once it started happening. I should note that, because of the volume of daily traffic being collected, all of the dataframes attempted were fairly large-ish (tens of thousands of rows); I did not try to recreate with only ten or twenty rows.

I would recommend that somebody inspect the execute_many code to see if there is a path that results in an exception being thrown without its message field populated. I'm not going to put the execute_many code back into the customer's notebook, so you can close this issue if it isn't deemed important enough to pursue.
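
For reference, the per-row replacement amounts to something like this (a sketch, reusing insert_stmt and insertDF from my original post):

# One round trip to Db2 per row -- much slower than execute_many,
# but it has worked consistently since the change.
for row in insertDF.values:
    ibm_db.execute(insert_stmt, tuple(row))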

imavo commented 1 week ago

@leighawilliamson Would like to help you with this, so that execute_many can be used as intended, since it is important for performance. While code review does suggest some potential suspects, recreating the symptom will be conclusive and is necessary for assessing the impact of any code change. To help with recreation, please tell us (1) the full output of pip list when the symptom happens, and (2) the structure of the dataframe (the python datatype and length of each column), and, most importantly, the Db2 datatype and length of the matching columns in the target Db2 table (i.e. the DDL for WATSON_PHONE_DIALOG.WA_FULL_LOGS).

imavo commented 1 week ago

@leighawilliamson In addition, another suggestion for you. If you have a dataframe that triggers the no-text Exception, why not save the dataframe to a delimited file and attempt to recreate in your own environment (i.e. not in the Watson Studio Jupyter notebook), by initializing a dataframe from that delimited file and trying execute_many() there to see whether the symptom recreates; a sketch follows. (All this assumes the data is not privileged/protected, etc.) There may also be additional diagnostic information available in the Watson Studio environment; for example, there might be limits on memory allocation per process that, if breached, leave a diagnostic message in some log file there. But that is only useful if you have people who can find and examine such diagnostics for you. Anyway, just a thought. The best way forward, in my opinion, is to recreate.
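
A sketch of that round trip (the file name is arbitrary, and insert_stmt is the same prepared statement as in the original post):

import ibm_db
import pandas as pd

# In Watson Studio: persist the DataFrame that triggers the exception.
insertDF.to_csv("wa_full_logs_repro.csv", index=False)

# In your own environment: reload it and retry the bulk insert.
repro_df = pd.read_csv("wa_full_logs_repro.csv")
tuple_of_tuples = tuple(tuple(x) for x in repro_df.values)
ibm_db.execute_many(insert_stmt, tuple_of_tuples)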

imavo commented 4 days ago

@leighawilliamson Suggested alternative workaround, pending recreation elsewhere.

A key factor at play here may be the amount of RAM allocated for your Watson Studio Jupyter notebook session, and any soft/hard limits that environment imposes, whether documented or not.

When you use ibm_db.execute_many(), all of the data supplied to it is queued at the client side (i.e. in your Watson Studio environment) until all tuples in the tuple-of-tuples are consumed. It is clidriver that performs this queueing, at the request of ibm_db.execute_many(). It is possible that when the client rqrioblk (default 32 KB in size) is filled, clidriver will send data to the Db2 server anyway and restart the chain. I don't know whether clidriver lets you increase rqrioblk to its maximum of 64 KB, however.

This means that if you have tens of thousands of tuples and each tuple is many bytes, then the memory allocation can be quite high (relative to available resources/limits).

Do you know how much RAM is available in your environment? Do you know what hard/soft limits are in play as regards memory allocation per process?

An alternative workaround may be to disable autocommit (if possible) and, instead of running ibm_db.execute_many() for the entire dataframe contents in a single go, run ibm_db.execute_many() in batches of, for example, 10,000 rows at a time; see the sketch below. So if you had 40,000 rows in total, that would involve 4 trips to the Db2 server. The assumption is that the memory used by queueing is released after each call to ibm_db.execute_many() completes; other batch sizes might be necessary to fit within limits. It might be worthwhile to test that approach if performance is your big concern, to validate the assumption and test the hypothesis.
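
A sketch of that batching approach (the batch size is illustrative; conn, insert_stmt, and tuple_of_tuples are the names from earlier in this issue):

import ibm_db

BATCH_SIZE = 10_000

# One unit of work per batch instead of autocommitting each insert.
ibm_db.autocommit(conn, ibm_db.SQL_AUTOCOMMIT_OFF)

for start in range(0, len(tuple_of_tuples), BATCH_SIZE):
    batch = tuple_of_tuples[start:start + BATCH_SIZE]
    ibm_db.execute_many(insert_stmt, batch)
    ibm_db.commit(conn)  # frees the client-side queue for this batch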

This would still be significantly faster than one trip to the Db2 server per tuple, which seems to be your current workaround.

Obviously, regardless of whatever limits may be in play, you don't want to get an exception with blank text. That remains to be reproduced, identified, and fixed.

It would help if you could respond to earlier requests for clarification.

imavo commented 2 days ago

@leighawilliamson

With no RAM constraints, I can use ibm_db.execute_many() successfully with 400,000 rows without exceptions.

I used a dataframe of 400,000 rows x 16 columns. The target table columns are a mix of char, varchar, integer, bigint, decimal, and date, and the average row size is 437 bytes in Db2. I've also successfully tested with 1.1 million rows.

This is on linux x64, with python 3.9 and pandas 2.0.2.

If I deliberately constrain the python process to restrict the amount of virtual memory that it can allocate, then I get a MemoryError exception (which is the expected result, I guess), with no further information given by default.
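
For anyone trying to reproduce that constrained case on Linux, the limit can be approximated with the stdlib resource module before running the insert (the cap value is illustrative):

import resource

# Cap this process's virtual memory at roughly 256 MB (Linux only).
LIMIT_BYTES = 256 * 1024 * 1024
resource.setrlimit(resource.RLIMIT_AS, (LIMIT_BYTES, LIMIT_BYTES))

# A large enough execute_many() after this point can then raise
# MemoryError on the client side, before anything reaches Db2.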

Your original post in this issue does not show any details of the exception, e.g. the exception name. Can you give details?

Without further meaningful input from you, it's unclear how to progress this if it is not easily reproducible outside of a Watson Studio Jupyter notebook environment.