microsoft / rag-experiment-accelerator

The RAG Experiment Accelerator is a versatile tool designed to expedite and facilitate the process of conducting experiments and evaluations using Azure Cognitive Search and the RAG pattern.
https://github.com/microsoft/rag-experiment-accelerator

textLoader log messages are confusing #128

Open joshuaphelpsms opened 10 months ago

joshuaphelpsms commented 10 months ago

The textLoader uses the load_structured_files function to split the documents but passes markdown as the language. Based on the resolved discussion in this PR https://github.com/microsoft/rag-experiment-accelerator/pull/88, this is fine.

However, the log messages are confusing because they say that markdown files are being loaded instead of text files. We should discuss how we would like to fix this.

I see two options, but I'm open to other ideas:

  1. Change the language to text and follow the pattern that jsonLoader.py uses, skipping the from_language call.
  2. Continue to use markdown as the argument to from_language and just adjust the log messages.

It doesn't seem right to be using markdown as the language on a txt document but let's discuss the best path forward.
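
To make option 2 concrete, here is a minimal sketch of how the log message could name the file format actually being loaded while still mentioning the splitter language. The helper `describe_load` and its parameters are hypothetical and are not taken from the repository; the real signature of load_structured_files may differ, so treat this as an illustration of the idea rather than a proposed patch.

```python
import logging
from typing import Optional

logger = logging.getLogger("text_loader_sketch")  # hypothetical logger name


def describe_load(file_format: str, splitter_language: Optional[str] = None) -> str:
    """Build a log message that names the file format being loaded.

    The splitter language is mentioned only when it differs from the
    file format, so a text load no longer reads as a markdown load.
    """
    message = f"Loading {file_format} files"
    if splitter_language and splitter_language != file_format:
        message += f" (split with the {splitter_language} splitter)"
    return message


# Assumed usage: the text loader keeps passing markdown to the splitter
# but reports "text" as what it is loading.
logger.info(describe_load("text", "markdown"))
```

With something like this, the text loader would log that it is loading text files, and the markdown detail would only show up as a note about the splitter.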

rasavant-ms commented 9 months ago

@ritesh-modi - Any thoughts here?