run-llama / LlamaIndexTS

Data framework for your LLM applications, with a focus on server-side solutions.
https://ts.llamaindex.ai
MIT License

Output one-dimensional lists for CSVReader #902

Open: SpeedoPasanen opened this issue 5 months ago

SpeedoPasanen commented 5 months ago

Use case: read PDF, DOC, CSV, etc. from a buffer or string without a file system (fs).

I think there's also a need for a new CSVReader that can output one-dimensional lists like this:

A1: A2
B1: B2

A1: A3
B1: B3

and so on, where A1 and B1 are the header cells. The reader should still support either one document per row or all rows joined into a single document.
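To make the requested format concrete, here is a minimal sketch of how parsed CSV rows could be rendered as "header: value" lines. The function name `toKeyValueTexts` and its `concatRows` flag are hypothetical (not part of LlamaIndexTS), and it assumes the CSV has already been parsed into a `string[][]` whose first row is the header.

```typescript
// Hypothetical helper, not part of LlamaIndexTS: renders parsed CSV rows as
// one-dimensional "header: value" lines, one line per column.
// Assumes rows[0] is the header row and all later rows are data rows.
function toKeyValueTexts(rows: string[][], concatRows: boolean): string[] {
  const header = rows[0];
  if (!header) return []; // empty input: nothing to render

  const records = rows.slice(1).map((row) =>
    header.map((key, i) => `${key}: ${row[i] ?? ""}`).join("\n"),
  );

  // Either one text per data row, or every row joined into a single text.
  return concatRows ? [records.join("\n\n")] : records;
}

// Usage with the cell names from the example in the issue.
const sheet = [
  ["A1", "B1"],
  ["A2", "B2"],
  ["A3", "B3"],
];
const perRow = toKeyValueTexts(sheet, false);
const joined = toKeyValueTexts(sheet, true);
```

With `concatRows` set to `false`, each data row becomes its own text (`"A1: A2\nB1: B2"` and `"A1: A3\nB1: B3"`); with `true`, the rows are joined into one text separated by a blank line.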

I don't think PapaCSVReader can reasonably be modified for this: its constructor already takes too many boolean arguments, and adding another would make it even more confusing. It would need to take a single config object with clear overloads instead, which would be a breaking change.
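A rough sketch of what a single config object could look like in place of positional booleans. Everything here (`CsvReaderOptions`, `SketchCsvReader`, and the field names) is hypothetical and illustrative; it is not the real PapaCSVReader constructor, and it again assumes already-parsed rows with the header first.

```typescript
// Hypothetical options object for a CSV reader. Field names are illustrative
// and are NOT the real PapaCSVReader constructor parameters.
interface CsvReaderOptions {
  concatRows?: boolean; // join all rows into a single document
  keyValue?: boolean; // emit "header: value" lines instead of joined cells
  colJoiner?: string; // separator between cells when keyValue is false
  rowJoiner?: string; // separator between rows when concatRows is true
}

class SketchCsvReader {
  private readonly opts: Required<CsvReaderOptions>;

  constructor(options: CsvReaderOptions = {}) {
    // `??` keeps the default when a field is omitted or explicitly undefined.
    this.opts = {
      concatRows: options.concatRows ?? true,
      keyValue: options.keyValue ?? false,
      colJoiner: options.colJoiner ?? ", ",
      rowJoiner: options.rowJoiner ?? "\n",
    };
  }

  // Formats already-parsed rows (rows[0] is the header) into document texts.
  format(rows: string[][]): string[] {
    const header = rows[0] ?? [];
    // In key-value mode the header labels each value, so it is not emitted as a row.
    const body = this.opts.keyValue ? rows.slice(1) : rows;
    const texts = body.map((row) =>
      this.opts.keyValue
        ? header.map((key, i) => `${key}: ${row[i] ?? ""}`).join("\n")
        : row.join(this.opts.colJoiner),
    );
    return this.opts.concatRows ? [texts.join(this.opts.rowJoiner)] : texts;
  }
}

// Call sites name each setting instead of passing positional booleans.
const reader = new SketchCsvReader({ concatRows: false, keyValue: true });
```

The design point is that adding a new mode later means adding one optional field with a default, so existing call sites keep working and new ones stay readable.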

Off topic, but something to think about: it's confusing that the PDF reader assigns id_ to the documents it produces while other readers don't. Either all readers should do it (when configured to, in some clear manner) or none should. Hidden behavior like that is potentially dangerous.
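One way to make ID assignment explicit is a single shared step that every reader calls, with IDs only assigned when the caller opts in. This is a hypothetical sketch: `SketchDoc`, `IdOptions`, and `finalizeDocs` are illustrative names, not LlamaIndexTS types, and the `<fileName>_<index>` ID scheme is an assumption chosen for determinism.

```typescript
// Hypothetical sketch, not LlamaIndexTS code: one post-processing step shared by
// all readers, so ID assignment is explicit and opt-in instead of hidden in one reader.
interface SketchDoc {
  text: string;
  metadata: Record<string, string>;
  id_?: string;
}

interface IdOptions {
  assignIds: boolean; // IDs are only assigned when the caller asks for them
  prefix?: string; // defaults to the file name
}

function finalizeDocs(texts: string[], fileName: string, ids: IdOptions): SketchDoc[] {
  return texts.map((text, i) => {
    // Every reader attaches the same metadata shape.
    const doc: SketchDoc = { text, metadata: { file_name: fileName } };
    if (ids.assignIds) {
      // Deterministic IDs: the same file and order always yield the same IDs.
      doc.id_ = `${ids.prefix ?? fileName}_${i}`;
    }
    return doc;
  });
}

const withoutIds = finalizeDocs(["x", "y"], "data.csv", { assignIds: false });
const withIds = finalizeDocs(["x", "y"], "data.csv", { assignIds: true });
```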

marcusschiesser commented 5 months ago

Thanks for your good feedback @SpeedoPasanen.

Agreed. I just unified ID and metadata handling across readers and added support for reading a file's content from a Buffer (see https://github.com/run-llama/LlamaIndexTS/commit/73819bf19d63d7a7169afc7a1628535bbdd10fd9). Luckily, this is also a non-breaking change.
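For the original use case (no fs), here is a hedged sketch of a reader that takes raw bytes instead of a file path. The `BufferReader` interface and `loadFromBuffer` method are hypothetical and do not reproduce the API added in the linked commit. The CSV handling is deliberately naive: it does not handle quoted fields, escaped commas, or embedded newlines, which is what a real parser such as Papa Parse is for.

```typescript
// Hypothetical interface, not the actual LlamaIndexTS reader API: readers accept
// raw bytes, so content can come from an upload or memory instead of a file path.
interface BufferReader {
  loadFromBuffer(content: Buffer): string[];
}

// Naive CSV reader for illustration only: splits on newlines and commas and does
// NOT handle quoted fields, escaped commas, or embedded newlines.
class NaiveCsvBufferReader implements BufferReader {
  loadFromBuffer(content: Buffer): string[] {
    const text = content.toString("utf8"); // decode bytes; no file system involved
    return text
      .split(/\r?\n/)
      .filter((line) => line.trim() !== "") // drop blank lines, e.g. a trailing newline
      .map((line) => line.split(",").map((cell) => cell.trim()).join(", "));
  }
}

// The bytes could just as well come from an HTTP upload or a database column.
const csvBytes = Buffer.from("name,age\nAda,36\n", "utf8");
const rows = new NaiveCsvBufferReader().loadFromBuffer(csvBytes);
```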

About your other request: agreed, PapaCSVReader has too many parameters, and a single config object would help. I think that's an acceptable breaking change. You're welcome to send a PR adding your feature.