llm-tools / embedJs

A NodeJS RAG framework to easily work with LLMs and embeddings
https://llm-tools.mintlify.app/get-started/introduction
Apache License 2.0

Adding Excel (maybe other attachment types) creates duplicate Loader IDs #63

Closed converseKarl closed 6 months ago

converseKarl commented 6 months ago
  1. Add Excel file to loader
  2. Query loaders
  3. Add same Excel file to loader (repeat process)
  4. Query loaders

In v0.77, this produces two entries with duplicate IDs.

I would expect it to at the very least create a new, different Loader ID or, if it is able to determine that the spreadsheet is the same as before (filename matched), to not add it and return a "could not add, as named resource already exists" message.
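The second behaviour described above (refusing a resource that is already registered) can be sketched as follows. This is only an illustration under stated assumptions: `StrictLoaderRegistry`, `LoaderEntry`, and the ID scheme of loader type plus file name are all hypothetical and are not part of the embedJs API.

```typescript
// Hypothetical types for illustration only; not the embedJs API.
interface LoaderEntry {
  uniqueId: string;
  type: string;
  source: string;
}

class StrictLoaderRegistry {
  private readonly entries = new Map<string, LoaderEntry>();

  // Assumption: the ID is derived from the loader type and the file name.
  static idFor(type: string, source: string): string {
    return `${type}:${source}`;
  }

  // Registers a loader, or throws if the same resource was already added.
  add(type: string, source: string): LoaderEntry {
    const uniqueId = StrictLoaderRegistry.idFor(type, source);
    if (this.entries.has(uniqueId)) {
      throw new Error("could not add, as named resource already exists");
    }
    const entry: LoaderEntry = { uniqueId, type, source };
    this.entries.set(uniqueId, entry);
    return entry;
  }

  // Returns the current list of registered loaders.
  list(): LoaderEntry[] {
    return Array.from(this.entries.values());
  }
}
```

With a registry like this, adding the same file name twice fails on the second call and the list keeps a single entry.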

adhityan commented 6 months ago

When you say query loader, what do you mean?

converseKarl commented 6 months ago

When you query the loaders this way, via ragApplication.loaders, you get a JSON list of the types loaded. I use this to build my front-end list, which is when I noticed that uploading the same Excel file twice created two entries when querying the ragApplication object and using loaders to get the latest list. Hope that helps.

adhityan commented 6 months ago

There was never an intent (originally) to maintain a single instance of a unique loader, as the unique identifier was only used to delete the vector values from the database if the loader had previously been seen.

I reflected on this issue and decided not to maintain duplicates (by removing older versions of loaded loaders, or newer versions of loaders being loaded in parallel). I have published a new version, 0.0.79, with these changes.
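As a rough illustration of the idea described above (one unique identifier per loader, used both to clear old vector values and to drop the older loader instance), here is a minimal sketch. It is not the actual 0.0.79 implementation: `DedupingLoaderRegistry`, `LoaderRecord`, `Chunk`, the in-memory array standing in for the vector database, and the ID derived from the source name are all assumptions made for the example.

```typescript
// Hypothetical types for illustration only; not the embedJs implementation.
interface LoaderRecord {
  uniqueId: string;
  source: string;
}

interface Chunk {
  loaderId: string;
  text: string;
}

class DedupingLoaderRegistry {
  private readonly loaders = new Map<string, LoaderRecord>();
  // Stand-in for the vector database: one entry per stored chunk.
  private chunks: Chunk[] = [];

  // Adds a loader; if the same source was seen before, the older
  // instance and its stored chunks are removed first.
  add(source: string, texts: string[]): string {
    // Assumption: the unique ID depends only on the source name.
    const uniqueId = `loader:${source}`;
    if (this.loaders.has(uniqueId)) {
      this.chunks = this.chunks.filter((c) => c.loaderId !== uniqueId);
      this.loaders.delete(uniqueId);
    }
    this.loaders.set(uniqueId, { uniqueId, source });
    for (const text of texts) {
      this.chunks.push({ loaderId: uniqueId, text });
    }
    return uniqueId;
  }

  // Current loaders, one per unique source.
  get loaderList(): LoaderRecord[] {
    return Array.from(this.loaders.values());
  }

  // Number of chunks currently held in the stand-in store.
  get chunkCount(): number {
    return this.chunks.length;
  }
}
```

Adding the same source twice leaves a single loader in `loaderList`, and only the chunks from the most recent add remain in the store.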

adhityan commented 6 months ago

Now ragApplication.loaders should always have only one instance of a loader for the same data.

converseKarl commented 6 months ago

I can confirm that on adding an Excel file, it indexes and is listed by the get loaders method. Adding it again, only one entry appears in the list from loaders.

I also observed this with the web loaders, so it appears to be fixed.

Job well done!