Hi @rlancemartin,
First of all, thanks a lot for this series of lessons!
This is probably a known fact, but it was not clear to me when I first ran into it: if we run the corresponding cell from your Jupyter notebook for Lessons 1-4 multiple (for example, k) times, there will be k copies of each original record, because the method used in that cell adds documents even if the collection already exists. We can check this by counting the stored records.
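A minimal sketch of such a check in plain Python. It assumes the stored texts have already been pulled out of the vector store into a list of strings; with the LangChain Chroma wrapper that would presumably be `vectorstore.get()["documents"]`, but that call and the name `vectorstore` are assumptions here, not taken from the notebook. The helper name `count_duplicates` is made up for illustration.

```python
from collections import Counter


def count_duplicates(texts):
    """Return {text: number_of_copies} for every text stored more than once."""
    counts = Counter(texts)
    return {text: n for text, n in counts.items() if n > 1}


# Stand-in data: two original chunks, as if the indexing cell had been run k = 3 times.
stored = ["chunk A", "chunk B"] * 3
duplicates = count_duplicates(stored)
print(len(stored), "records,", len(duplicates), "distinct texts stored more than once")
print(duplicates)
```

An empty result means every stored text is unique; otherwise the counts show how many times the cell was effectively run.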
As far as I remember, I saw similar behavior with the LangChain wrapper for the Weaviate database.
So, as a quick workaround, we can remove the default collection (which is named "langchain") before adding documents.
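A sketch of that workaround, assuming the vector store is the LangChain `Chroma` wrapper (this post only names the default collection, "langchain", so that is a guess). The helper `rebuild_collection` and its parameters are hypothetical; the store class is passed in as an argument so the snippet does not hard-code an import path. As far as I know, `delete_collection()` and `from_documents(...)` exist on that wrapper, but please check them against your installed version.

```python
def rebuild_collection(store_cls, splits, embedding, collection_name="langchain"):
    """Drop the collection if it exists, then index `splits` into a fresh one.

    `store_cls` is assumed to behave like LangChain's Chroma wrapper:
    constructing it opens (or creates) the collection, `delete_collection()`
    removes it, and `from_documents(...)` builds a new store.
    """
    # Open the possibly already populated collection and delete it.
    store_cls(
        collection_name=collection_name,
        embedding_function=embedding,
    ).delete_collection()
    # Re-create the collection from scratch, so re-running the cell adds no copies.
    return store_cls.from_documents(
        documents=splits,
        embedding=embedding,
        collection_name=collection_name,
    )
```

In the notebook this could be called as `vectorstore = rebuild_collection(Chroma, splits, OpenAIEmbeddings())`, where `Chroma`, `splits`, and `OpenAIEmbeddings` are assumed to be the names already defined there.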
Since there are no warnings or errors about an existing collection, this behavior may not be noticed immediately, so I hope this will be useful to someone.
P.S. I also noticed that in Part 4 here, 4 documents are retrieved, and 2 of them are duplicates of the other two.
Thank you.