philpax / paper-organizer

Implement semantic search #1

Closed by philpax 3 months ago

philpax commented 3 months ago

I was noodling around with this but didn't get it working due to Python dependency resolution conflicts and Pylance shenanigans.

I think this is relatively straightforward; I need to introduce a FAISS index, embed all of the relevant text, associate it with an integer ID (I can set up a bijective mapping between the arXiv ID and an integer ID), and then wire everything up correctly.
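A minimal sketch of the ID mapping described above, assuming dense integer IDs assigned in insertion order. `ArxivIdMap` and `search` are hypothetical names, the vectors are toy values, and the brute-force inner-product search is only a stand-in for the FAISS index so the example runs without extra dependencies. With FAISS itself, the integers would be the IDs passed to something like `add_with_ids` on a `faiss.IndexIDMap`.

```python
from dataclasses import dataclass, field


@dataclass
class ArxivIdMap:
    """Bijective mapping between arXiv IDs (strings) and dense integer IDs."""

    to_int: dict = field(default_factory=dict)    # arXiv ID -> integer ID
    to_arxiv: list = field(default_factory=list)  # integer ID -> arXiv ID

    def add(self, arxiv_id: str) -> int:
        # Reuse the existing integer if this paper was already registered.
        if arxiv_id in self.to_int:
            return self.to_int[arxiv_id]
        int_id = len(self.to_arxiv)
        self.to_int[arxiv_id] = int_id
        self.to_arxiv.append(arxiv_id)
        return int_id

    def arxiv_id(self, int_id: int) -> str:
        return self.to_arxiv[int_id]


def search(vectors: dict, query: list, k: int) -> list:
    """Brute-force inner-product search; a stand-in for a FAISS index lookup."""

    def score(int_id: int) -> float:
        return sum(a * b for a, b in zip(vectors[int_id], query))

    return sorted(vectors, key=score, reverse=True)[:k]


ids = ArxivIdMap()
# Hypothetical toy embeddings; real ones would come from the embedding model.
vectors = {
    ids.add("2401.00001"): [1.0, 0.0],
    ids.add("2401.00002"): [0.0, 1.0],
    ids.add("2401.00003"): [0.7, 0.7],
}

# Search returns integer IDs; map them back to arXiv IDs for display.
hits = [ids.arxiv_id(i) for i in search(vectors, [0.9, 0.1], k=2)]
```

The mapping would need to be persisted alongside the index, since the integers only mean something relative to the order in which papers were added.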

philpax commented 3 months ago

See https://github.com/flowpoint/ai_indexer

philpax commented 3 months ago

Thinking about using https://huggingface.co/Snowflake/snowflake-arctic-embed-l, but it's worth noting that embedding a query works slightly differently from embedding a passage, and this needs to be accounted for.
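A rough sketch of how the query/passage asymmetry could be handled, assuming the difference is a fixed instruction prefix on queries only. The prefix string below is what I recall from the model card and should be checked against it before use; `prepare_query` and `prepare_passage` are hypothetical helper names, and the actual embedding call is left out.

```python
# Assumption: query prefix as recalled from the model card; verify before relying on it.
QUERY_PREFIX = "Represent this sentence for searching relevant passages: "


def prepare_passage(text: str) -> str:
    """Passages (titles, abstracts) are embedded as-is, without a prefix."""
    return text.strip()


def prepare_query(text: str) -> str:
    """Search queries get the instruction prefix before being embedded."""
    return QUERY_PREFIX + text.strip()


passage_input = prepare_passage("  We propose a new attention mechanism.  ")
query_input = prepare_query("efficient attention")
```

Keeping the two paths as separate functions makes it harder to accidentally embed a query like a passage at search time.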