Closed danielfleischer closed 1 year ago
Do indexing and search work for you? What index are you using for the codec above?
We don't generally expose the decompression API directly, so I'm trying to see what you are trying to achieve.
Indexing works. Searching works too, but the failure is subtle. Candidates are collected using the centroids as proxies. Next, the full vectors have to be reconstructed for ranking and top-k selection. Here I get zero vectors (see issue), which become NaN after normalization, so ranking has no effect. The top-k documents I get look relevant (the heuristic works), but their scores are NaN. That is what started the debugging and led me to pinpoint what seems to be the issue.
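To make the failure mode concrete, here is a minimal sketch with toy numbers. It assumes a vector is approximated as its assigned centroid plus a residual, and that normalization divides by the L2 norm without an epsilon. The function names are hypothetical and are not the library's API; NumPy is used only to keep the example small.

```python
import numpy as np

def reconstruct(centroids, codes, residuals):
    """Approximate each vector as its assigned centroid plus a residual."""
    return centroids[codes] + residuals

def l2_normalize(vectors):
    """Divide each row by its L2 norm (no epsilon, so zero rows become NaN)."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    with np.errstate(invalid="ignore", divide="ignore"):
        return vectors / norms

# Toy index: two centroids, two encoded vectors (all values are made up).
centroids = np.array([[3.0, 4.0], [1.0, 0.0]])
codes = np.array([0, 1])
residuals = np.array([[0.0, 0.0], [0.0, 1.0]])

good = l2_normalize(reconstruct(centroids, codes, residuals))
bad = l2_normalize(np.zeros((2, 2)))  # stand-in for a decompression that returns zeros

query = np.array([1.0, 0.0])
good_scores = good @ query  # finite similarity scores
bad_scores = bad @ query    # NaN for every row, so ranking cannot order anything

print(good_scores, bad_scores)
```

Because comparisons against NaN are always false, sorting by such scores leaves the candidate order effectively untouched, which would match relevant-looking results with NaN scores.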
Are you doing this to debug the standard Searcher, or are you trying to re-implement your own search? That wasn't clear to me.
I want to use the library and am debugging the current code. I got NaN scores when searching, and from there I saw that the decompression returns zero vectors.
Thanks. I've never seen that before. Are you using the checkpoint we provide? What information can you provide about your collection: how many passages, passage language, lengths, etc.?
Information about the hardware will also be helpful.
Also, did you try the provided example in the Jupyter notebook? Does that work and give you non-NaN scores?
Hi, here is the information you requested:
Maybe something is wrong with the index. Are there any sanity checks we can do to make sure the index is fine? Thanks!
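As one possible sanity check, the sketch below summarizes an array of stored or reconstructed vectors: whether all values are finite and what fraction of rows are entirely zero. It is a generic helper, not part of the library; the name `summarize` is hypothetical, and it assumes the tensors loaded from the index files have first been converted to NumPy arrays.

```python
import numpy as np

def summarize(name, array):
    """Report basic health indicators for a non-empty 2-D array of vectors."""
    array = np.atleast_2d(np.asarray(array, dtype=np.float64))
    rows = array.reshape(array.shape[0], -1)
    return {
        "name": name,
        "shape": array.shape,
        # False if any entry is NaN or infinite.
        "all_finite": bool(np.isfinite(rows).all()),
        # Fraction of rows whose entries are all exactly zero.
        "zero_row_fraction": float((~rows.any(axis=1)).mean()),
    }

# Made-up inputs: one healthy-looking array and one that is all zeros.
healthy = summarize("healthy", [[0.6, 0.8], [1.0, 0.0]])
zeros = summarize("zeros", np.zeros((4, 3)))
print(healthy)
print(zeros)
```

A `zero_row_fraction` of 1.0 on the stored data would point at the index itself, while healthy stored data combined with all-zero reconstructed vectors would point at the decompression step.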
This collection is heavily tested, so the issue isn't with the dataset.
Have you tried indexing another time?
We have access to a Titan X somewhere. Let me see if someone can test on it.
Hi, we were able to successfully create a searcher object (as in the demo Jupyter code) that returns non-NaN scores on an RTX 3090.
Could there be some CUDA dependencies that can only run on newer cards?
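One thing that may be worth comparing between the two machines is which GPU architectures the installed PyTorch build targets versus the compute capability of the card. The sketch below gathers that with standard `torch` calls; the function name `describe_cuda_environment` is hypothetical, and it does not cover extensions that are compiled separately from PyTorch.

```python
def describe_cuda_environment():
    """Collect PyTorch/CUDA details that are useful in a bug report."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,  # CUDA version PyTorch was built with
        "cuda_available": torch.cuda.is_available(),
        "compiled_archs": [],
        "devices": [],
    }
    if info["cuda_available"]:
        # Architectures this PyTorch build was compiled for, e.g. "sm_86".
        info["compiled_archs"] = torch.cuda.get_arch_list()
        for i in range(torch.cuda.device_count()):
            major, minor = torch.cuda.get_device_capability(i)
            info["devices"].append(
                {"name": torch.cuda.get_device_name(i),
                 "capability": f"sm_{major}{minor}"}
            )
    return info

report = describe_cuda_environment()
for key, value in report.items():
    print(f"{key}: {value}")
```

PyTorch also ships `python -m torch.utils.collect_env`, which prints a fuller environment report suitable for pasting into an issue.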
We were able to get normal behavior on a Titan X. I have to assume your specific CUDA setup on the Titan X is different in some important way.
Ok, thanks!
Hi, when reconstructing vectors from codes and residuals I always get zero vectors. The relevant code is related to the torch extensions. See minimal example:
This returns zero vectors. The index has content (examine the pt files). Did anyone encounter this? Is this a bug or an issue with GPU drivers?
Thanks!