manisnesan / fastchai

Repository capturing deep learning & nlp experiments using fastai & pytorch
Apache License 2.0

RAGatouille exploration #63

Open manisnesan opened 11 months ago

manisnesan commented 11 months ago

https://github.com/bclavie/RAGatouille

Announcement tweet by bclavie

https://news.ycombinator.com/item?id=38869223

See the ColBERT issue: https://github.com/manisnesan/AISC-WG-Search-Recsys/issues/23

manisnesan commented 10 months ago

Both LangChain and LlamaIndex integrations are available.

manisnesan commented 10 months ago

Short guide to ColBERTv2: https://x.com/anmolsj/status/1744499524113158207?s=46&t=aOEVGBVv9ICQLUYL4fQHlQ

Ideas

manisnesan commented 10 months ago

image


manisnesan commented 10 months ago

See the exploration here https://github.com/manisnesan/fastchai/tree/master/ragatouille

manisnesan commented 10 months ago

retrieval model

manisnesan commented 10 months ago

Doc exploration

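Late-interaction retrievers in zero-shot tasks (compared apples to apples): they are very easy to adapt to new domains thanks to their bag-of-embeddings approach, where each query and document is represented by one embedding per token rather than a single pooled vector.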
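To make the bag-of-embeddings idea concrete, here is a minimal sketch of the MaxSim scoring used by late-interaction models such as ColBERT: each query token is matched to its most similar document token, and those maxima are summed. The 2-d vectors below are made-up toy values, not model output, and the helper names (`normalize`, `maxsim_score`) are hypothetical; this is not RAGatouille's implementation.

```python
from math import sqrt


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def maxsim_score(query_embs, doc_embs):
    """Late-interaction score: for each query token embedding, take the
    highest cosine similarity against any document token embedding,
    then sum those maxima over all query tokens."""
    q = [normalize(v) for v in query_embs]
    d = [normalize(v) for v in doc_embs]
    return sum(
        max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
        for qv in q
    )


# Toy per-token embeddings (assumed values, for illustration only).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # has a close match for both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.0]]              # matches only the first query token

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
print(score_a, score_b)
```

Because scoring operates on individual token embeddings, a document only needs good matches for the query's tokens, which is one intuition for why this family of models transfers to new domains without a single pooled vector having to capture everything.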

constraints

end outcomes

Next Steps

manisnesan commented 10 months ago

| retrieval | pros | cons |
| --- | --- | --- |
| BM25 / keyword-based sparse retrieval | fast, consistent performance, intuitive & well understood, no training required | exact match required, no semantic info & hits a hard perf ceiling |
| cross-encoder | very strong perf, leverages semantic info to a large extent, especially negation understanding* | major scalability issues: scores are retrieved by query-doc comparison (commonly used in a reranking setting) |
| dense retrieval / embeddings | fast, decent performance overall, pre-trained, leverages semantic information | semantic but lacks contrastive info, i.e. no negation understanding; finicky fine-tuning; requires billions of params (e.g. e5-mistral) and billions of pre-training samples for top perf; poor generalisation |

negation understanding* - I love apples vs I hate apples

Source: https://ben.clavie.eu/ragatouille/#longer-might-read
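
The exact-match limitation of keyword retrieval can be seen in a small sketch of Okapi BM25 scoring. This is an illustrative pure-Python version under stated assumptions: a lowercase whitespace tokenizer, the `log(1 + ...)` IDF variant, and the commonly used defaults `k1=1.5`, `b=0.75`. The function names are hypothetical and the example sentences extend the "I love apples" pair above.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive lowercase whitespace tokenizer (an assumption for this sketch)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document for the given query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # only exact term matches contribute
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


docs = ["I love apples", "I hate apples", "Fruit is my favourite food"]

scores = bm25_scores("love apples", docs)
no_match = bm25_scores("apple", docs)  # singular form never appears verbatim
print(scores, no_match)
```

Here "I hate apples" still receives a positive score for the query "love apples" because it shares the term "apples", while the query "apple" scores zero everywhere since no document contains that exact token.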

manisnesan commented 8 months ago

https://gist.github.com/JoshuaPurtell/c1182551fa609736d47df4af82f7c5ab

manisnesan commented 8 months ago

Goal: use RAGatouille without building an index, keeping everything in memory, for small datasets and rapid prototyping.

Created a reproducer for issue 66 in RAGatouille here. Potential future improvement: example notebooks could be validated as part of CI.

manisnesan commented 8 months ago

Contextual.ai work on RAG 2.0

Related

https://contextual.ai/training-with-grit/

manisnesan commented 8 months ago

YouTube: Supercharge RAG with late interactions