yukiisbored / patchouli-x

Search your darlings ✨
ISC License

Add embeddings / Use hybrid search #11

Open yukiisbored opened 2 days ago

yukiisbored commented 2 days ago

We're currently only using the full-text search capabilities of Orama.

However, from my experience building the original Patchouli, embeddings improve the ranking considerably, because they can surface content that is semantically similar to the query even when it doesn't share the same keywords.
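To make the ranking idea concrete, here is a minimal sketch of how a hybrid score could blend a full-text relevance score with embedding similarity. It assumes every candidate already has a full-text score and an embedding vector. The `Candidate` type, the min-max normalization, the `alpha` weight and the `hybridRank` helper are all hypothetical choices for illustration. This is not Orama's API or its built-in ranking.

```typescript
// Hypothetical shape of a search candidate: a full-text relevance score
// (e.g. from Orama's full-text search) plus the document's embedding.
interface Candidate {
  id: string;
  textScore: number;
  embedding: number[];
}

// Cosine similarity: 1 = same direction, 0 = unrelated, -1 = opposite.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rescale to [0, 1] so the two signals are on comparable scales.
function minMax(values: number[]): number[] {
  const min = Math.min(...values);
  const max = Math.max(...values);
  if (max === min) return values.map(() => 0); // no spread: signal can't discriminate
  return values.map((v) => (v - min) / (max - min));
}

// alpha weights the full-text side; (1 - alpha) weights the embedding side.
function hybridRank(
  query: number[],
  candidates: Candidate[],
  alpha = 0.5,
): { id: string; score: number }[] {
  const text = minMax(candidates.map((c) => c.textScore));
  const vector = minMax(candidates.map((c) => cosineSimilarity(query, c.embedding)));
  return candidates
    .map((c, i) => ({ id: c.id, score: alpha * text[i] + (1 - alpha) * vector[i] }))
    .sort((x, y) => y.score - x.score);
}

// Toy 2-dimensional embeddings, for illustration only.
const query = [1, 0];
const candidates: Candidate[] = [
  { id: "a", textScore: 3, embedding: [0, 1] }, // strong keyword match, unrelated meaning
  { id: "b", textScore: 1, embedding: [1, 0] }, // weak keyword match, same meaning
  { id: "c", textScore: 0, embedding: [-1, 0] }, // neither
];
console.log(hybridRank(query, candidates, 0.25).map((r) => r.id)); // order: b, a, c
```

With `alpha = 1` this reduces to pure full-text ranking ("a" wins). Lowering it lets the semantically closer "b" overtake "a". Reciprocal rank fusion, which merges the two result lists by rank instead of by score, is a common alternative to weighted score blending.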

Since we want Patchouli to work offline, we should be using open-source embedding/sentence similarity models and run them locally.

Luckily, thanks to the ONNX Runtime and transformers.js, we can run and use these models easily.
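For reference, transformers.js exposes sentence embeddings through its `feature-extraction` pipeline. There, options such as `pooling: 'mean'` and `normalize: true` turn a model's per-token outputs into a single sentence vector. The sketch below reimplements that pooling step by hand on plain arrays, so it runs without downloading a model. The `meanPool` and `l2Normalize` names are hypothetical. Ignoring padding via the attention mask is an assumption modeled on how sentence-transformers-style models are commonly pooled.

```typescript
// Average the token vectors of real (unmasked) tokens into one sentence vector.
// tokenEmbeddings[t] is the model's output for token t; attentionMask[t] is 1
// for real tokens and 0 for padding.
function meanPool(tokenEmbeddings: number[][], attentionMask: number[]): number[] {
  if (tokenEmbeddings.length !== attentionMask.length) {
    throw new Error("one mask entry per token is required");
  }
  if (tokenEmbeddings.length === 0) throw new Error("no tokens");
  const dim = tokenEmbeddings[0].length;
  const sum = new Array<number>(dim).fill(0);
  let count = 0;
  for (let t = 0; t < tokenEmbeddings.length; t++) {
    if (attentionMask[t] === 0) continue; // padding does not contribute
    for (let d = 0; d < dim; d++) sum[d] += tokenEmbeddings[t][d];
    count++;
  }
  if (count === 0) throw new Error("every token is masked");
  return sum.map((s) => s / count);
}

// Scale to unit length so that a plain dot product equals cosine similarity.
function l2Normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((acc, x) => acc + x * x, 0));
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

// Two real tokens and one padding token, with 2-dimensional outputs.
const pooled = meanPool([[1, 2], [3, 4], [100, 100]], [1, 1, 0]); // [2, 3]
const sentenceVector = l2Normalize(pooled);
console.log(sentenceVector);
```

Normalizing at indexing time means every stored vector has unit length. Similarity at query time then reduces to a dot product.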

The only open question is which model to use. Here are the candidates I've gathered so far:

There are multilingual models, but I'm unsure about their quality. I don't want Patchouli to work only with English, but unfortunately that seems to be the case for now, because Orama's stemming only supports a single language out of the box.

yukiisbored commented 2 days ago

Of course, it goes without saying that the choice of model isn't the be-all and end-all. Open-source language models evolve quickly, so whatever we build should make it easy to swap models and "upgrade" existing metas.
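One way to keep models swappable is to hide them behind a small interface and record which model produced each stored vector. Entries embedded with an older model can then be found and re-embedded after an upgrade. The sketch below is hypothetical throughout: the `Embedder` interface, the `StoredEmbedding` record and the `findStale`/`upgrade` helpers do not exist in Patchouli or Orama. The `ToyEmbedder` is only a stand-in for a real model. It is also synchronous for brevity, whereas model calls through transformers.js are asynchronous.

```typescript
// Hypothetical abstraction over any local embedding model.
interface Embedder {
  readonly modelId: string; // identifies the model (and version) that produced a vector
  readonly dimensions: number;
  embed(text: string): number[];
}

// Each stored vector remembers which model produced it.
interface StoredEmbedding {
  docId: string;
  modelId: string;
  vector: number[];
}

// Ids of entries whose vectors came from a different model than the current one.
function findStale(entries: StoredEmbedding[], current: Embedder): string[] {
  return entries.filter((e) => e.modelId !== current.modelId).map((e) => e.docId);
}

// Re-embed stale entries with the current model; up-to-date entries are kept as-is.
function upgrade(
  entries: StoredEmbedding[],
  texts: Map<string, string>,
  current: Embedder,
): StoredEmbedding[] {
  return entries.map((e) => {
    if (e.modelId === current.modelId) return e;
    const text = texts.get(e.docId);
    if (text === undefined) throw new Error(`missing source text for ${e.docId}`);
    return { docId: e.docId, modelId: current.modelId, vector: current.embed(text) };
  });
}

// Toy stand-in for a real model: a character-code histogram, for illustration only.
class ToyEmbedder implements Embedder {
  readonly modelId: string;
  readonly dimensions = 4;
  constructor(modelId: string) {
    this.modelId = modelId;
  }
  embed(text: string): number[] {
    const v = new Array<number>(this.dimensions).fill(0);
    for (let i = 0; i < text.length; i++) v[text.charCodeAt(i) % this.dimensions] += 1;
    return v;
  }
}
```

Vectors from different models are generally not comparable, even when their dimensions match. Mixing them in one index would silently degrade ranking, which is why the sketch keys staleness on the model id rather than on vector length.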