jasonjmcghee opened 10 months ago
Logged an issue over in the SQLite.swift repo, but that doesn't mean we can't fork / add support / open a PR there to fulfill the issue! https://github.com/stephencelis/SQLite.swift/issues/1232
Exploring embedding generation from Swift… a good candidate seems to be using candle (Rust) with a sentence transformer, and building a binary that takes in text and outputs embeddings.
Or explore CoreML and look into transformer or ONNX conversion.
I'm really bad at C bindings stuff, but I tried to put together a candle text -> embeddings binary that we can talk to via FFI.
(from rust_embedding_lib README.md)
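Not from the README, but for illustration: a rough sketch of what the Swift side of that FFI bridge could look like. The symbol names (`embed_text`, `free_embedding`) and the fixed 384-dim output (gte-small's size) are assumptions, not the actual rust_embedding_lib interface:

```swift
import Foundation

// Hypothetical C-ABI symbols exported by the rust lib, e.g.
//   #[no_mangle] pub extern "C" fn embed_text(..) -> *mut f32
// In a real build these would come from a bridging header / module map.
@_silgen_name("embed_text")
func embed_text(_ text: UnsafePointer<CChar>) -> UnsafeMutablePointer<Float>?

@_silgen_name("free_embedding")
func free_embedding(_ ptr: UnsafeMutablePointer<Float>?)

let embeddingDim = 384 // gte-small produces 384-dimensional vectors

func embedding(for text: String) -> [Float]? {
    text.withCString { cString in
        guard let ptr = embed_text(cString) else { return nil }
        defer { free_embedding(ptr) } // rust side owns the allocation
        return Array(UnsafeBufferPointer(start: ptr, count: embeddingDim))
    }
}
```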
Am I crazy not to use https://github.com/huggingface/swift-transformers?
You might be. :) (edit: although it doesn't seem like there's a ton actually present in that library right now) I was noodling on this and was prepared to try and embed a Python interpreter into this binary to get access to the whole ecosystem of Python modules; I didn't realize Swift was an option there. (Also, the idea of embedding a Python interpreter into something seems kind of insane, so I just wanted to try it.)
Do you have an idea of which model embeddings you want to use for search? I've played with a couple of other projects that defaulted to bge-small-en-v1.5 (#15) or all-mpnet-base-v2 (#45) on the HF leaderboard: https://huggingface.co/spaces/mteb/leaderboard
Both are pretty small, and "seem" good for RAG based on the limited poking I've done with them. I've never tried to use them outside of python though.
edit: n/m, I see gte-small in the rust project. That's #22 on the leaderboard!
gte-small feels like a good balance between quality and size from manual experimentation, but totally open to suggestion and / or making it so people can use whatever they want
It looks like somebody already posted a coreml conversion of gte-small: https://huggingface.co/thenlper/gte-small/tree/main/coreml/feature-extraction/float32_model.mlpackage
I have no experience w/ this, so I don't know if that's a format we can use but I found it while researching conversion options.
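In case it helps anyone poking at this: loading an .mlpackage like that at runtime would look roughly like the sketch below. The input/output feature names here are guesses (check `modelDescription`), and tokenization (gte-small uses BERT-style WordPiece) is left out entirely:

```swift
import Foundation
import CoreML

// Compile the package, then load it. Both calls are standard CoreML.
let packageURL = URL(fileURLWithPath: "float32_model.mlpackage")
let compiledURL = try MLModel.compileModel(at: packageURL)
let model = try MLModel(contentsOf: compiledURL)

// First thing to check: what the converted model actually expects.
print(model.modelDescription.inputDescriptionsByName)
print(model.modelDescription.outputDescriptionsByName)

// Guessed feature names / shapes -- adjust to match the printout above.
let inputIDs = try MLMultiArray(shape: [1, 128], dataType: .int32)
let attentionMask = try MLMultiArray(shape: [1, 128], dataType: .int32)
// ... fill both from a WordPiece tokenizer ...

let features = try MLDictionaryFeatureProvider(dictionary: [
    "input_ids": MLFeatureValue(multiArray: inputIDs),
    "attention_mask": MLFeatureValue(multiArray: attentionMask),
])
let output = try model.prediction(from: features)
```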
I also found https://github.com/huggingface/exporters, but they appear to not support embedding models (plus I tried to do the conversion using their tool and it fails a validation step because some math is coming up with NaN.)
Theoretically, what I built should work; we just need to build the Swift framework.
I guess that's a question I should have asked initially -- is the FFI bridge + rust lib the way you'd prefer to go? Or something more native like CoreML?
😅 The rust embeddings approach means any safetensors model with config and tokenizers should work, which feels like a very good thing. But if you can get CoreML working, that's awesome. I did notice they were strangely large, like double the size for gte-small.
> rust embeddings approach means any safetensors model with config and tokenizers should work
Agreed. The "run anything on the internet" was one of the reasons I felt like my awful embed-Python approach could almost be justifiable. I'm agnostic either way re: rust lib vs coreml, just having fun soaking all this stuff up. For my own entertainment I'll probably throw up a branch on my fork illustrating the coreml approach, but I've got no attachment to it. I've just never played w/ CoreML before.
Please! That would be awesome! Thank you- I can't wait.
Not having great luck with the prebuilt coreml model. Will post more later on that.
re: rust/candle - I did notice that candle doesn't support Metal acceleration yet, only the 'accelerate' framework. I'm not sure if that's a concern with the embedding part, but I could imagine it will be with local LLMs.
> Not having great luck with the prebuilt coreml model. Will post more later on that.
You got this!
> candle doesn't support Metal acceleration yet
Problem for another day. Don't need the best solution, just need one that works for now.
Hi @jasonjmcghee, I am making RAGchain, a framework specialized for RAG. I think you are interested in building RAG in a local Apple Silicon environment. I think it would be super cool to get data from rem, ingest it through RAGchain, and talk with an LLM about my memories. What do you think about this? Do you prefer "no internet connection" for this project?
Update (repo here: https://github.com/jasonjmcghee/ragpipe):

This script takes text from the rem db and runs `ollama run openhermes2.5-mistral` with a prompt and the text:

```
$ ./askRem "Which GitHub issues have I read recently?" <(sqlite3 db 'select text from allText order by frameId desc limit 1000')
```

Output:

```
Batches: 100%|███████████████████████████████| 19/19 [00:11<00:00, 1.65it/s]

You have recently read issues: #3 (dark mode icons), #9 (login item - Rem will run on boot), and #11 (icon looks kinda weird when active in dark mode).

total duration:       26.622822625s
load duration:        5.327591125s
prompt eval count:    1933 token(s)
prompt eval duration: 17.73078s
prompt eval rate:     109.02 tokens/s
eval count:           41 token(s)
eval duration:        3.554184s
eval rate:            11.54 tokens/s
```
@vkehfdl1 - definitely want to make it easy to ingest from rem. You can query the sqlite file right now, which will give you the path to the ffmpeg file + frame offset too, so you can get the text and image.
I'd love to simplify this though / make it easy to just ask rem somehow / use it as a datasource.
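To make the "query the sqlite file" part concrete, here's a minimal SQLite.swift sketch. The allText table and its text/frameId columns come from the sqlite3 one-liner above; everything else (e.g. where the video path and frame offset live) should be checked against the real schema:

```swift
import SQLite

let db = try Connection("path/to/rem.sqlite3") // path is illustrative

let allText = Table("allText")
let frameId = Expression<Int64>("frameId")
let text = Expression<String>("text")

// Same query as the shell one-liner: most recent 1000 text captures.
for row in try db.prepare(allText.order(frameId.desc).limit(1000)) {
    print(row[frameId], row[text])
}
```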
@jasonjmcghee Great! I'd love to make a data loader from rem for RAGchain. Use rem as a datasource. I'll let you know my progress.
@jasonjmcghee I made a loader for RAGchain and Langchain (compatible with Langchain). It loads texts from the sqlite3 file and turns them into the Langchain Document schema. You can see the PR here.
Now, I'll try to make some kind of demo using rem and RAGchain together.
@vkehfdl1 that looks very cool! Not knowing too much about RAGChain, how would the data extractor pipeline be run? Would it be beneficial if the extractor pipeline is triggered by REM at some fixed intervals?
@seletz I just made a simple example running RAGchain and rem (repo here: https://github.com/vkehfdl1/rem-RAGchain).
I think it would be super cool to trigger the ingest pipeline when a new rem record is added. For now, you can run `ingest.py` with `crontab`. It can run my ingest python script every x minutes; then new records will be automatically ingested, new embeddings made, and used for talking with the LLM!
@jasonjmcghee @seletz
Plus, here is a sample image from running RAGchain with rem. I was viewing this issue tab while rem recording was turned on 😁
Cool!
> However, answer quality is not good enough.
Did you try writing a custom prompt for the use-case?
> Would it be beneficial if the extractor pipeline is triggered by REM at some fixed intervals?
Could be reading into this the wrong way, but I'd want to make sure it's a client-agnostic approach and, ideally, rem isn't facilitating outside applications consuming its data.
One of my concerns right now though is network access related stuff. Seems like the smart way (from an eng arch perspective) is to have an API for providing access to data and for talking to agents.
But that unlocks "network access" stuff in App Sandbox, which... idk, I feel many folks would feel better with an "absolutely no network access" approach.
Maybe there could be 2 builds? One with network access entitlements and one without?
@jasonjmcghee @vkehfdl1 I think a "no network connection" policy is very cool. We could use triggers as mentioned in #14 for this. Maybe it would be OK for now to just call a user-provided script which gets the path to the SQLite DB as argument? The DB tables would be the API, then ...
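Something like the sketch below is all rem would need on its side; the script path would come from user settings, and the script itself could be vkehfdl1's `ingest.py` or anything else. (Sandboxing of spawned processes is its own can of worms, so treat this as a shape, not a final design.)

```swift
import Foundation

/// Call a user-provided hook script, passing the SQLite DB path as
/// its only argument. The DB tables are the API.
func runUserHook(script: URL, dbPath: String) throws {
    let process = Process()
    process.executableURL = script
    process.arguments = [dbPath]
    try process.run()
    process.waitUntilExit()
}

// e.g. fired from a new-record trigger:
// try runUserHook(script: URL(fileURLWithPath: "/Users/me/ingest.sh"),
//                 dbPath: "/path/to/rem.sqlite3")
```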
@jasonjmcghee
> Did you try writing a custom prompt for the use-case?
I will try your great prompt! Plus, I will run some experiments to improve answer quality.
First, it would be good to use hybrid retrieval, which means using a vector DB and BM25 together. I think it might be common to search for a specific word, like a person's name. (See the sketch after this list.)
Second, I want to delete duplicated texts. rem captures the screen often, so it has many duplicated texts. So it needs to compress information somehow. I plan to try various strategies for this.
Third, use a custom prompt.
Fourth, use a multi-modal model. Maybe it will take some time to build...
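A toy sketch of the hybrid idea in the first item, assuming you already have BM25 and vector-similarity scores for the same candidate passages: min-max normalize each list, then blend with a weight. (Reciprocal rank fusion is another common choice.)

```swift
// Min-max normalize so the two score scales are comparable.
func normalized(_ scores: [Double]) -> [Double] {
    guard let lo = scores.min(), let hi = scores.max(), hi > lo else {
        return scores.map { _ in 0 }
    }
    return scores.map { ($0 - lo) / (hi - lo) }
}

/// Blend BM25 (keyword) and vector (semantic) scores per candidate.
/// alpha = 1 is pure BM25, alpha = 0 is pure vector search.
func hybridScores(bm25: [Double], vector: [Double],
                  alpha: Double = 0.5) -> [Double] {
    zip(normalized(bm25), normalized(vector))
        .map { alpha * $0.0 + (1 - alpha) * $0.1 }
}
```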
@seletz That would be cool! I agree rem should keep "no network connection" as the default, and users can always access their data easily with hooks or triggers. That looks like the fastest way to build RAG with rem for now.
However, in the future, it would be cool for rem to have its own RAG pipeline, totally local, using a local embedding model and LLM.
@jasonjmcghee
I tried your custom prompt here and the result is actually promising.
Here are some examples I tried. (I recorded the rem issue and repo pages.)
Question : Where rem should index all data?
Answer : Rem should index all data in the "allText_content" table in the "main" database.
Question : What is the rem approach for building embedding search and RAG?
Answer : The rem approach for building embedding search and RAG involves indexing all text via an embedding store and using a SQLite extension like sqlite-vss.
But I tried this with only about 2 minutes of recording. I'm recording a few hours for real use-cases.
Update: now it ingests documents without duplicates. I used token F1 score to calculate similarity. And I use hybrid retrieval and WeightedTimeReranker to favor the latest information. This is my PR here. Try it!
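For anyone curious, token F1 between two passages is simple enough to sketch in a few lines (whitespace tokenization here; the real implementation in the PR may differ):

```swift
/// Token-level F1 between two OCR'd passages, for near-duplicate detection.
func tokenF1(_ a: String, _ b: String) -> Double {
    let tokensA = a.lowercased().split(separator: " ").map(String.init)
    let tokensB = b.lowercased().split(separator: " ").map(String.init)
    guard !tokensA.isEmpty, !tokensB.isEmpty else { return 0 }

    // Multiset overlap: count shared tokens with multiplicity.
    var counts: [String: Int] = [:]
    for t in tokensA { counts[t, default: 0] += 1 }
    var overlap = 0
    for t in tokensB where counts[t, default: 0] > 0 {
        counts[t]! -= 1
        overlap += 1
    }

    let precision = Double(overlap) / Double(tokensB.count)
    let recall = Double(overlap) / Double(tokensA.count)
    guard precision + recall > 0 else { return 0 }
    return 2 * precision * recall / (precision + recall)
}
```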
However, the raw passages (OCR results) are pretty unprocessed, so the LLM can't recognize and extract information easily.
That can be a real challenge for high-quality embedding search and QA with rem.
There is no silver bullet for now. Hopefully OCR quality will improve, or we can use multi-modal models, some models that truly understand GUIs.
I think this looks super promising:
rem should index all text via embedding store.
We could use something like https://github.com/asg017/sqlite-vss
If we go this route we should fork / open a PR to add the extension https://github.com/stephencelis/SQLite.swift/tree/3d25271a74098d30f3936d84ec1004d6b785d6cd/Sources/SQLite/Extensions
This way we can search without needing verbatim matches.
We'll need to see what the RAM footprint and insertion time are.
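If that extension support lands, the flow from Swift could look roughly like this. The SQL follows sqlite-vss's README, but the table/column names are placeholders, `loadExtension` is the hypothetical API the PR would add, and 384 matches gte-small's dimension:

```swift
import Foundation
import SQLite

let db = try Connection("path/to/rem.sqlite3")
// try db.loadExtension("vss0") // hypothetical -- the PR discussed above

// One row per frame's embedding; 384 dims for gte-small.
try db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS vss_allText USING vss0(embedding(384))")

// sqlite-vss accepts vectors serialized as JSON arrays.
let vector: [Float] = Array(repeating: 0, count: 384) // placeholder embedding
let vectorJSON = String(data: try JSONEncoder().encode(vector), encoding: .utf8)!

try db.run("INSERT INTO vss_allText(rowid, embedding) VALUES (?, json(?))",
           1, vectorJSON)

// Nearest-neighbor search -- no verbatim text match required.
for row in try db.prepare(
    "SELECT rowid, distance FROM vss_allText WHERE vss_search(embedding, json(?)) LIMIT 10",
    vectorJSON
) {
    print(row[0] ?? "nil", row[1] ?? "nil")
}
```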
More out of the box solutions appear to be available now:
https://github.com/ashvardanian/SwiftSemanticSearch
We'd need to see how long insertion / index updates take, but seems super promising.
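For the insertion-time question, a quick benchmark against USearch (the index underlying SwiftSemanticSearch) could look like the sketch below; the API names follow the usearch Swift binding's README and should be double-checked:

```swift
import Foundation
import USearch

// Cosine index sized for gte-small's 384-dim embeddings.
let index = USearchIndex.make(metric: .cos, dimensions: 384,
                              connectivity: 16, quantization: .F32)
index.reserve(100_000)

// Time inserting 10k random vectors to gauge index-update cost.
let start = Date()
for key in 0..<10_000 {
    let vector = (0..<384).map { _ in Float.random(in: -1...1) }
    index.add(key: USearchKey(key), vector: vector)
}
print("inserted 10k vectors in \(Date().timeIntervalSince(start))s")

// Top-10 nearest neighbors for a query vector.
let query = [Float](repeating: 0.1, count: 384)
let (keys, distances) = index.search(vector: query, count: 10)
print(keys, distances)
```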