nv-morpheus / Morpheus

Morpheus SDK
Apache License 2.0

[BUG]: vdb_upload example pipeline error on inserting large strings #1650

Closed dagardner-nv closed 4 months ago

dagardner-nv commented 4 months ago

Version

24.03

Which installation method(s) does this occur on?

Source

Describe the bug.

Occurs intermittently, presumably based on the content fetched via the RSS feeds.

Minimum reproducible example

python examples/llm/main.py vdb_upload pipeline --stop_after=1024

Relevant log output


Unable to insert into collection: VDBUploadExample due to 
RPC error: [batch_insert], , 

dagardner-nv commented 4 months ago

The issue is that Milvus limits strings to 65535 bytes, not characters. Multi-byte characters such as ñ take two bytes in UTF-8, so a string shorter than 65535 characters can still exceed the limit.
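To illustrate the gap between character count and byte count, here is a minimal sketch in plain Python. The helper name `utf8_len` and the constant `MAX_BYTES` are made up for this example and are not part of Morpheus or pymilvus.

```python
# Limit quoted in the comment above; the name is hypothetical.
MAX_BYTES = 65535


def utf8_len(text: str) -> int:
    """Return the number of bytes the string occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# 40000 characters, each of which needs two bytes in UTF-8.
sample = "ñ" * 40000

char_count = len(sample)
byte_count = utf8_len(sample)

# The character count is under the limit, but the byte count is not.
print(char_count <= MAX_BYTES, byte_count <= MAX_BYTES)
```

A check like this on the `content` column before inserting would show which rows are at risk, since `len()` alone undercounts any text containing non-ASCII characters.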

dagardner-nv commented 4 months ago

This appears to be in part a bug on the Milvus side as well. If I create a string of multi-byte characters that is two characters longer than the max of 65535, I receive this exception, which reports the character length of 65537:

RPC error: [batch_insert], <ParamError: (code=1, message=invalid input, length of string exceeds max length. length: 65537, max length: 65535)>, <Time:{'RPC start': '2024-04-23 08:46:27.520470', 'RPC error': '2024-04-23 08:46:27.520651'}>

If I then truncate the data to 65535 characters and retry, I receive a new exception, this time reporting the byte length of 196605 (3 × 65535, consistent with three bytes per character):

df['content'] = df['content'].str.slice(0, MAX_STRING_LENGTH)
RPC error: [batch_insert], <MilvusException: (code=1100, message=the length (196605) of 0th string exceeds max length (65535): invalid parameter[expected=valid length string][actual=string length exceeds max length])>, <Time:{'RPC start': '2024-04-23 08:48:08.730043', 'RPC error': '2024-04-23 08:48:08.733671'}>
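Since `str.slice` counts characters, one possible workaround is to truncate by encoded byte length instead. The following is only a sketch under that assumption, not necessarily the fix adopted in Morpheus; the helper `truncate_utf8` is a hypothetical name.

```python
def truncate_utf8(text: str, max_bytes: int = 65535) -> str:
    """Truncate text so its UTF-8 encoding is at most max_bytes long."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Slicing bytes may cut a multi-byte character in half;
    # errors="ignore" drops that incomplete trailing sequence.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


# Three-byte characters, two characters over the 65535 limit,
# mirroring the case described above.
long_text = "€" * 65537
short_text = truncate_utf8(long_text)

print(len(short_text), len(short_text.encode("utf-8")))
```

With a pandas DataFrame, this could be applied as `df['content'] = df['content'].map(truncate_utf8)`, assuming the column holds only Python strings.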