mongodben / mongodb-oracle

The MongoDB Oracle 🧙‍♀️🔮🌱
https://mongodb-oracle.vercel.app

improve the parsing when indexing #10

Closed mongodben closed 1 year ago

mongodben commented 1 year ago

nothing specific in mind here, but i imagine better parsing at indexing time would improve answer quality.

couple of ideas:

  1. right now it parses the text into some 'pseudo-markdown'. it works pretty well, but i think the results would be better if the html were parsed into real markdown, especially for code examples.
  2. explore using some langchain utils for parsing. they have some more advanced text splitters: https://langchain.readthedocs.io/en/latest/modules/indexes/examples/textsplitter.html?highlight=markdown
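
to make idea 1 concrete, here's a rough sketch of html-to-real-markdown parsing using only the python standard library. this is not the project's current parser; the names `HtmlToMarkdown` and `html_to_markdown` are made up for illustration, and it only handles headings, paragraphs, inline code, and `<pre>` blocks. the point is that code examples keep their line breaks and end up in proper fenced blocks.

```python
import re
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
BLOCK_TAGS = HEADINGS | {"p", "pre"}
FENCE = "`" * 3  # markdown code fence


class HtmlToMarkdown(HTMLParser):
    """Hypothetical minimal converter: headings, paragraphs, inline code, <pre>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []     # finished markdown blocks
        self.buf = []        # text collected for the current block
        self.prefix = ""     # e.g. "## " while inside an <h2>
        self.in_pre = False  # True while inside a <pre> block

    def flush(self):
        """Turn the buffered text into one markdown block."""
        text = "".join(self.buf)
        self.buf = []
        if self.in_pre:
            # Keep whitespace inside code, only trim surrounding newlines.
            code = text.strip("\n")
            if code:
                self.blocks.append(FENCE + "\n" + code + "\n" + FENCE)
        else:
            # Collapse runs of whitespace in ordinary prose.
            text = re.sub(r"\s+", " ", text).strip()
            if text:
                self.blocks.append(self.prefix + text)
        self.prefix = ""

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()  # close whatever text came before this block
            self.in_pre = tag == "pre"
            if tag in HEADINGS:
                self.prefix = "#" * int(tag[1]) + " "
        elif tag == "code" and not self.in_pre:
            self.buf.append("`")  # inline code span

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()
            self.in_pre = False
        elif tag == "code" and not self.in_pre:
            self.buf.append("`")

    def handle_data(self, data):
        self.buf.append(data)


def html_to_markdown(html):
    """Convert an HTML fragment to markdown blocks separated by blank lines."""
    parser = HtmlToMarkdown()
    parser.feed(html)
    parser.close()
    parser.flush()  # emit any trailing text outside a block tag
    return "\n\n".join(parser.blocks)


if __name__ == "__main__":
    page = (
        "<h1>Find Documents</h1>\n"
        "<p>Use <code>find()</code> to query a collection.</p>\n"
        "<pre><code>db.users.find({ age: { $gt: 21 } })\n</code></pre>"
    )
    print(html_to_markdown(page))
```

a real version would also need lists, links, tables, and language hints on code fences, so an existing html-to-markdown library is probably the better route; this just shows the shape of the output we'd want to index.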
mongodben commented 1 year ago

blog posts i mentioned in meeting: