nv-morpheus / Morpheus

Morpheus SDK
Apache License 2.0

[BUG]: Poor performance when uploading small batches to Milvus in VDB Upload example #1667

Open mdemoret-nv opened 5 months ago


Version

24.03

Which installation method(s) does this occur on?

Docker, Conda, Source

Describe the bug.

When using the Milvus service to write to a vector database, performance drops with small batch sizes or infrequent writes. This is because the service reindexes the database after each message, or after a set time has elapsed (hard-coded to 3 seconds). This is inefficient for a few reasons:

Ideally, we would use something similar to a debounce to update the index, so that reindexing only occurs after a set period during which no new messages have been added.
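The debounce idea can be sketched in plain Python with `threading.Timer`. This is a minimal illustration under stated assumptions, not Morpheus code: `Debouncer`, `trigger`, `cancel`, and `quiet_period` are hypothetical names, and the `action` callback stands in for whatever reindex call the Milvus service would actually make.

```python
import threading
import time


class Debouncer:
    """Hypothetical helper (not part of Morpheus or pymilvus).

    Calls `action` once, only after `quiet_period` seconds have passed
    without another call to `trigger()`.
    """

    def __init__(self, action, quiet_period: float):
        self._action = action
        self._quiet_period = quiet_period
        self._lock = threading.Lock()
        self._timer = None

    def trigger(self):
        # Every new write cancels any pending reindex and restarts the countdown.
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self._quiet_period, self._action)
            self._timer.daemon = True  # do not block interpreter shutdown
            self._timer.start()

    def cancel(self):
        # Drop any pending reindex (e.g. when shutting the service down).
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None


# Usage: a stand-in "reindex" that just records when it ran.
reindex_calls = []
debouncer = Debouncer(lambda: reindex_calls.append(time.monotonic()), quiet_period=0.5)

# Ten small writes arriving 0.05 s apart, i.e. faster than the quiet period.
for _ in range(10):
    debouncer.trigger()
    time.sleep(0.05)

# Once the writes stop, wait past the quiet period so the pending reindex can fire.
time.sleep(1.0)
print(f"reindex ran {len(reindex_calls)} time(s) for 10 writes")
```

One design caveat: a pure debounce never fires while writes keep arriving faster than the quiet period, so a real implementation would likely also cap the maximum delay before a reindex is forced.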

Minimum reproducible example

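import random
import time

import cudf

from morpheus.service.vdb.milvus_vector_db_service import MilvusVectorDBService

# Assumed values: a local Milvus server on its default port and a placeholder collection name
milvus_server_uri = "http://localhost:19530"
collection_name = "test_collection"
num_input_rows = 5

milvus_service = MilvusVectorDBService(uri=milvus_server_uri)

# Create the collection
...

# Make a small dataframe with 5 rows
df = cudf.DataFrame({
    "id": list(range(num_input_rows)),
    "age": [random.randint(20, 40) for _ in range(num_input_rows)],
    "embedding": [[random.random() for _ in range(3)] for _ in range(num_input_rows)]
})

# Insert the same small batch repeatedly to simulate many small writes
for _ in range(10000):
    milvus_service.insert_dataframe(collection_name, df)

    # Sleep between inserts to allow the data to be inserted (this may need to be tweaked to trigger the bug)
    time.sleep(0.1)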


Other/Misc.

No response
