voxel51 / fiftyone-docs-search

Search docs.voxel51.com with an LLM!
Apache License 2.0

Tables Preprocessing and Split #13

Closed: shulkx closed this issue 8 months ago

shulkx commented 8 months ago

Hi Jacob, thanks for sharing your valuable experience. I recently worked on a similar project: building a chatbot over a company knowledge base (mostly in HTML format). In your write-up, you mentioned that you ultimately chose to convert the docs into markdown. About this, I have a few questions:

jacobmarks commented 8 months ago

Hi @shulkx ,

Thanks for the note and the insightful questions. You have indeed hit on something!

This documentation search experiment used only the textual data (in markdown), which definitely has limitations when working with tables and other complex content. Images and GIFs are another example of content I had to strip out entirely!
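One common workaround for the table problem in a text-only pipeline is to strip images and split the markdown on block boundaries, so that a table is never cut in the middle of its rows. The sketch below illustrates that idea under stated assumptions. It is not how fiftyone-docs-search actually preprocesses its docs. The function name `split_markdown` and the `max_chars` budget are hypothetical, table rows are assumed to be lines starting with `|`, and a table larger than the budget is kept whole rather than split.

```python
import re

# Markdown image syntax: ![alt](url). Assumption: images are dropped
# entirely, matching a text-only retrieval setup.
IMAGE_RE = re.compile(r"!\[[^\]]*\]\([^)]*\)")


def split_markdown(text: str, max_chars: int = 1000) -> list[str]:
    """Hypothetical chunker: pack whole blocks into chunks of ~max_chars.

    A block is either a run of consecutive table rows (lines starting
    with '|') or a run of non-table lines separated by blank lines.
    Tables are never split, even if they exceed max_chars.
    """
    blocks: list[str] = []
    current: list[str] = []
    in_table = False

    for line in IMAGE_RE.sub("", text).splitlines():
        is_table_row = line.lstrip().startswith("|")
        # Close the current block on a blank line or at a table/text boundary.
        if not line.strip() or (current and is_table_row != in_table):
            if current:
                blocks.append("\n".join(current))
                current = []
        if line.strip():
            current.append(line)
            in_table = is_table_row
    if current:
        blocks.append("\n".join(current))

    # Greedily pack whole blocks into chunks without exceeding the budget
    # (unless a single block is already larger than it).
    chunks: list[str] = []
    buf = ""
    for block in blocks:
        if buf and len(buf) + len(block) + 2 > max_chars:
            chunks.append(buf)
            buf = block
        else:
            buf = f"{buf}\n\n{block}" if buf else block
    if buf:
        chunks.append(buf)
    return chunks


if __name__ == "__main__":
    doc = (
        "Intro text.\n\n"
        "| a | b |\n|---|---|\n| 1 | 2 |\n\n"
        "![fig](img.png)\nMore text."
    )
    for chunk in split_markdown(doc, max_chars=20):
        print(repr(chunk))
```

With `max_chars=20`, the sample document yields three chunks: the intro sentence, the entire table, and the trailing sentence. The image reference is removed before splitting.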

If you're interested in more advanced retrieval approaches, here are some resources that I hope will be helpful for now:

And stay tuned... exciting stuff is coming in the not-too-distant future 😉