mem0ai / mem0

The Memory layer for your AI apps
https://mem0.ai
Apache License 2.0

Improve logs to show when the same text/content already exists and its chunking and embedding generation is skipped. #1070

Open taranjeet opened 10 months ago

taranjeet commented 10 months ago

🐛 Describe the bug

I am trying to add the same text to the app twice, but the logs are unclear. I checked the number of chunks and it is still one, yet the logs make it look like the whole process is happening again.

Here is the full code to reproduce this:

In [2]: from embedchain import Pipeline as App

In [3]: app = App()
In [4]: app.add("Hello world")
2023-12-28 20:18:58,223 - root - ERROR - Insert valid string format of JSON.             Check the docs to see the supported formats - `https://docs.embedchain.ai/data-sources/json`
Inserting batches in chromadb: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.62it/s]
Successfully saved Hello world (DataType.TEXT). New chunks count: 1
Out[4]: '3e25960a79dbc69b674cd4ec67a72c62'

In [5]: app.add("Hello world")
2023-12-28 20:19:05,516 - root - ERROR - Insert valid string format of JSON.             Check the docs to see the supported formats - `https://docs.embedchain.ai/data-sources/json`
Inserting batches in chromadb:   0%|                                                                                                                        | 0/1 [00:00<?, ?it/s]2023-12-28 20:19:06,016 - chromadb.segment.impl.vector.local_persistent_hnsw - WARNING - Add of existing embedding ID: bb8c7577-2af3-4fd6-8221-e607cf96396e--34193c37dfc2c2e7a8d7c3391bf59eb497ae5ec7e0362a3865dec9ae2af63c05
2023-12-28 20:19:06,017 - chromadb.segment.impl.metadata.sqlite - WARNING - Insert of existing embedding ID: bb8c7577-2af3-4fd6-8221-e607cf96396e--34193c37dfc2c2e7a8d7c3391bf59eb497ae5ec7e0362a3865dec9ae2af63c05
Inserting batches in chromadb: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.99it/s]
Successfully saved Hello world (DataType.TEXT). New chunks count: 0
Out[5]: '3e25960a79dbc69b674cd4ec67a72c62'

In [6]: app.db.count()
Out[6]: 1
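
For illustration, the kind of log message being asked for could look like the sketch below. This is not embedchain's or chromadb's actual code: `InMemoryStore`, `add_text`, and the logger name are hypothetical, and the sketch assumes duplicates can be detected by hashing the raw text before any chunking or embedding is attempted.

```python
import hashlib
import logging

logger = logging.getLogger("dedup_sketch")  # hypothetical logger name


class InMemoryStore:
    """Hypothetical stand-in for a vector store (not the chromadb API)."""

    def __init__(self):
        self._chunks = {}

    def has(self, chunk_id):
        return chunk_id in self._chunks

    def put(self, chunk_id, text):
        self._chunks[chunk_id] = text

    def count(self):
        return len(self._chunks)


def add_text(store, text):
    """Store text unless identical content exists; return (content_hash, new_chunks)."""
    content_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if store.has(content_hash):
        # Say explicitly that nothing was re-processed.
        logger.info(
            "Content %r already exists (hash %s); skipping chunking and embedding generation.",
            text,
            content_hash[:12],
        )
        return content_hash, 0
    store.put(content_hash, text)
    logger.info("Successfully saved %r. New chunks count: 1", text)
    return content_hash, 1


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    store = InMemoryStore()
    add_text(store, "Hello world")  # logs the "saved" message
    add_text(store, "Hello world")  # logs the "already exists" message
```

With a message like this, the second `add` call would state that the content was skipped instead of printing a progress bar and a "Successfully saved" line with a new chunk count of 0.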

Esparon1 commented 6 months ago

Hi @taranjeet, can I try working on this?