Open contrebande-labs opened 2 months ago
No, there isn't, but that's pretty negligible compared to the cost of everything else.
I'm exploring how to make it faster without giving up accuracy in the latest commits. So far it looks like in the two stages (n_ann_docs and n_colbert_candidates) it is MUCH faster to have a high value for the former and a smaller value for the latter. E.g. for top 10 results the sweet spot is n_ann_docs=200, n_colbert_candidates=20.
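To make the two stages concrete, here is a minimal in-memory sketch of one plausible reading of the funnel. It is not the repository's actual code: the function names (`dot`, `maxsim`, `two_stage_search`) and the `corpus` layout are hypothetical, and a brute-force scan stands in for the database's ANN query. The defaults mirror the sweet spot mentioned above.

```python
from collections import defaultdict


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_vecs, doc_vecs):
    """Late-interaction score: for each query token take the best-matching
    document token, then sum over query tokens."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def two_stage_search(query_vecs, corpus, k=10, n_ann_docs=200, n_colbert_candidates=20):
    """corpus maps doc_id -> list of token vectors (hypothetical layout)."""
    # Flatten to one row per token embedding, like rows in a vector table.
    all_tokens = [(doc_id, v) for doc_id, vecs in corpus.items() for v in vecs]

    # Stage 1: one cheap lookup per query token. A brute-force sort stands in
    # for "ORDER BY ... ANN OF ... LIMIT n_ann_docs" (the limit counts rows here).
    approx = defaultdict(float)
    for q in query_vecs:
        hits = sorted(((dot(q, v), doc_id) for doc_id, v in all_tokens), reverse=True)
        best = {}
        for score, doc_id in hits[:n_ann_docs]:
            best[doc_id] = max(score, best.get(doc_id, float("-inf")))
        for doc_id, score in best.items():
            approx[doc_id] += score

    # Keep only the most promising documents for the expensive stage.
    candidates = sorted(approx, key=approx.get, reverse=True)[:n_colbert_candidates]

    # Stage 2: full MaxSim over every token of each surviving candidate.
    scored = sorted(((maxsim(query_vecs, corpus[d]), d) for d in candidates), reverse=True)
    return [d for _, d in scored[:k]]
```

In this reading, stage 1 costs one lookup per query token regardless of `n_ann_docs`, while stage 2 has to load every token embedding of every candidate, which would be consistent with a wide first stage and a narrow second stage being the faster combination.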
Hi!
I'm trying the undocumented Zstd compression for the Cassandra Python client and it didn't complain, so I guess it works. It's how I could sleep at night for the past few days with respect to bandwidth costs. I will look at your new code later today and also consider moving the retriever within the same data center as the cluster to minimize latency and cost as much as possible. I'm also working on a Java version with ONNXRuntime. Thank you for publishing this code!
Very cool, please link your Java version when it's ready!
I will ask the Python driver team if we can add named bind variables to solve this.
Yes, I will post something on Hugging Face for the benchmark data, the ONNX models and the Python prototype (and cite this code). And if it gets to that, I will somehow publish something about the production Java code that I'm optimizing for the Graviton4 platform (but that will work with any platform supported by the JVM >=22 and ONNXRuntime).
Also, thanks for asking the Python driver team! Named or indexed values (like \1, \2 and so on) would be a great addition for this use case.
I'm keeping you posted via this Issue!
good news: there is a partial solution. this works:
query_ada_cql = f"""
SELECT id, title, body, similarity_dot_product(ada002_embedding, :v) as similarity
FROM {keyspace}.chunks
ORDER BY ada002_embedding ANN OF :v
LIMIT 10
"""
self.query_ada_stmt = self.session.prepare(query_ada_cql)
...
rows = db.session.execute(db.query_ada_stmt, {'v': qv})
bad news: when I tried making LIMIT a named bind var as well it broke, i think this is a bug.
good news: you can work around that fairly easily by just doing string substitution for the LIMIT parameter if necessary.
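A minimal sketch of that string-substitution workaround, assuming the same table layout as the query above. The helper names (`build_ann_query`, `StatementCache`) are hypothetical and not part of the driver. Since LIMIT is spliced into the CQL text, the sketch validates it as a positive integer, and it keeps one prepared statement per distinct LIMIT so each variant is prepared only once.

```python
import re

# Plain unquoted identifiers only, since these are spliced into the CQL text.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_ann_query(keyspace, table, limit):
    """Return CQL with LIMIT inlined and the query vector as named marker :v."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int) or limit <= 0:
        raise ValueError(f"limit must be a positive integer, got {limit!r}")
    for name in (keyspace, table):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return (
        "SELECT id, title, body, "
        "similarity_dot_product(ada002_embedding, :v) AS similarity "
        f"FROM {keyspace}.{table} "
        "ORDER BY ada002_embedding ANN OF :v "
        f"LIMIT {limit}"
    )


class StatementCache:
    """Prepare one statement per distinct LIMIT value and reuse it."""

    def __init__(self, prepare, keyspace, table):
        self._prepare = prepare  # any callable taking CQL text, e.g. session.prepare
        self._keyspace = keyspace
        self._table = table
        self._statements = {}

    def get(self, limit):
        if limit not in self._statements:
            cql = build_ann_query(self._keyspace, self._table, limit)
            self._statements[limit] = self._prepare(cql)
        return self._statements[limit]
```

With a live session this would be used roughly as `stmt = StatementCache(session.prepare, keyspace, "chunks").get(10)` followed by `session.execute(stmt, {'v': qv})`, so the vector is still bound only once per request.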
The first ANN query looks like this for me:
SELECT page, similarity_dot_product(embedding, ?) as similarity
FROM {db_keyspace}.{db_table_name}
WHERE collection = ? AND document = ?
ORDER BY embedding ANN OF ?
LIMIT ?
Since I have other parameters besides the LIMIT, I will wait until the bug is fixed in the Python driver.
Also, I guess that because these parameters are in the WHERE clause, this query is slower for me. I'm regularly getting coordinator timeouts, usually after just 20 query token iterations or so. I will send an email to my account rep and CC you, because I don't think this is code related.
Hi,
I'm currently evaluating AstraDB with ColBERT-type vector search. I'm using ColPali as the model and took a lot of inspiration from your code. I'm facing many performance problems with ColBERT on AstraDB, mainly because the ColBERT strategy generates a lot of requests to the vector store and needs a lot more storage than simple dense embeddings. Anyway, one of them is that the vector value has to be sent twice in the request if one also wants the similarity value on top of the ANN ORDER BY clause. Is there a way to refer to the same value twice in the prepared statement and avoid sending it twice over the wire?
Thanks!
Vincent