manisnesan / fastchai

Repository capturing deep learning & nlp experiments using fastai & pytorch
Apache License 2.0

RAGatouille exploration #63

Open manisnesan opened 9 months ago

manisnesan commented 9 months ago

https://github.com/bclavie/RAGatouille

Announcement tweet by bclavie

https://news.ycombinator.com/item?id=38869223

See the ColBERT issue: https://github.com/manisnesan/AISC-WG-Search-Recsys/issues/23

manisnesan commented 9 months ago

Both LangChain and LlamaIndex integrations are available.

manisnesan commented 8 months ago

Short guide on ColBERTv2: https://x.com/anmolsj/status/1744499524113158207?s=46&t=aOEVGBVv9ICQLUYL4fQHlQ

Ideas

manisnesan commented 8 months ago

[3 images]

manisnesan commented 8 months ago

See the exploration here https://github.com/manisnesan/fastchai/tree/master/ragatouille

manisnesan commented 8 months ago

retrieval model

manisnesan commented 8 months ago

Doc exploration

Late-interaction retrievers perform well on zero-shot tasks (when compared apples to apples), and they are very easy to adapt to new domains due to their bag-of-embeddings approach.
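To make the bag-of-embeddings idea concrete, here is a minimal sketch of late-interaction (MaxSim) scoring in plain Python. The 2-d token embeddings and the names `maxsim_score`, `doc_a`, and `doc_b` are made up for illustration; a real ColBERT model produces one learned vector per token and uses optimized kernels rather than nested loops.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_embs, doc_embs):
    """For each query token, take its best-matching document token, then sum."""
    q = [normalize(v) for v in query_embs]
    d = [normalize(v) for v in doc_embs]
    return sum(max(dot(qv, dv) for dv in d) for qv in q)


# Hypothetical toy embeddings: one small vector per token.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # has a close match for both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.1]]              # only matches the first query token well

scores = {
    "doc_a": maxsim_score(query, doc_a),
    "doc_b": maxsim_score(query, doc_b),
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

Because each document is kept as a set of token vectors rather than pooled into one vector, a query token only needs one good match somewhere in the document to contribute to the score.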

constraints

end outcomes

Next Steps

manisnesan commented 8 months ago
| retrieval | pros | cons |
| --- | --- | --- |
| bm25/keyword-based sparse retrieval | fast, consistent performance, intuitive & well understood, no training required | exact match required, no semantic info & hits a hard perf ceiling |
| cross-encoder | very strong perf, leverages semantic info to a large extent, especially negation understanding* | major scalability issues: scores are computed by query-doc comparison (commonly used in a reranking setting) |
| dense retrieval/embeddings | fast, decent performance overall, pre-trained, leverages semantic information | semantic but lacks contrastive info, i.e. no negation understanding; finicky fine-tuning; requires billions of params (e.g. e5-mistral) and billions of pre-training samples for top perf; poor generalisation |

\* negation understanding: telling "I love apples" apart from "I hate apples"

Source: https://ben.clavie.eu/ragatouille/#longer-might-read
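The exact-match limitation of BM25 noted above can be seen in a small sketch. This is a simplified BM25 with whitespace tokenization and a Lucene-style IDF (an assumption, not taken from the source); `bm25_scores` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with a simplified BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # no exact match, so no contribution at all
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


docs = ["I love apples", "I hate apples", "bananas are yellow"]

exact = bm25_scores("love apples", docs)    # shares literal terms with the docs
synonym = bm25_scores("adore fruit", docs)  # same intent, no shared terms
print(exact, synonym)
```

The second query scores zero against every document even though it is semantically close to the first, which is the "exact match required, no semantic info" drawback from the table.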

manisnesan commented 6 months ago

https://gist.github.com/JoshuaPurtell/c1182551fa609736d47df4af82f7c5ab

manisnesan commented 6 months ago

Goal: Use RAGatouille without building an index on disk, keeping everything in memory instead, for small datasets and rapid prototyping.

Created a reproducer for issue 66 in RAGatouille here. Potential future improvement: the example notebooks could be validated as part of CI.

manisnesan commented 6 months ago

Contextual.ai work on RAG 2.0

Related

https://contextual.ai/training-with-grit/

manisnesan commented 6 months ago

Youtube - Supercharge RAG with late interactions