marqo-ai / marqo

Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai
https://www.marqo.ai/
Apache License 2.0
4.57k stars 188 forks

[ENHANCEMENT] read pdfs, txt, csv files from pointers #58

Open jn2clark opened 2 years ago

jn2clark commented 2 years ago

Is your feature request related to a problem? Please describe. Yes. It would be good to support reading CSV, TXT, and PDF files from pointers. Scanned documents are out of scope: the text would be read directly, with no OCR.

Describe the solution you'd like Have a reader for these files in the same way we do for images, so that a pointer to a file means the file can be read.
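To make the request concrete, here is a minimal sketch of what such a reader might look like. It is an illustration only, not Marqo's implementation: `load_bytes` and `read_text_from_pointer` are hypothetical names, a "pointer" is assumed to be a local path or an http(s) URL, and only the standard library is used. PDF is left unsupported because it would need a third-party text-extraction library.

```python
import csv
import io
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def load_bytes(pointer: str) -> bytes:
    """Fetch raw bytes from an http(s) URL or a local path (hypothetical helper)."""
    if urlparse(pointer).scheme in ("http", "https"):
        with urlopen(pointer) as response:
            return response.read()
    return Path(pointer).read_bytes()


def read_text_from_pointer(pointer: str, encoding: str = "utf-8") -> str:
    """Return plain text for a .txt or .csv pointer (hypothetical helper)."""
    is_url = urlparse(pointer).scheme in ("http", "https")
    # For URLs, take the extension from the URL path so query strings are ignored.
    name = urlparse(pointer).path if is_url else pointer
    suffix = Path(name).suffix.lower()

    if suffix == ".txt":
        return load_bytes(pointer).decode(encoding)

    if suffix == ".csv":
        text = load_bytes(pointer).decode(encoding)
        rows = csv.reader(io.StringIO(text))
        # Flatten each row into one line of space-separated cell values.
        return "\n".join(" ".join(row) for row in rows)

    # Assumption: PDFs would be handled here by a text-extraction library (no OCR).
    raise ValueError(f"unsupported file type: {suffix or '(none)'}")
```

The returned string could then be placed in a document field before indexing, in the same way an image pointer is resolved before it is embedded.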

Describe alternatives you've considered The alternative is for the user to do this processing before sending data to Marqo. That will always be an option, but for less complex use cases built-in support would be very convenient.


bharathgs commented 2 years ago

Hey, this is a great project. We were working on integrating this into our library, ocrpy, as a backend for enabling semantic search over PDF and image docs.

tomhamer commented 2 years ago

Sounds great @bharathgs - also feel free to ping me on slack or email me at tom@s2search.Io to discuss further. More than happy to have a chat and see where we might be able to assist!