Closed volkerha closed 2 years ago
Hi @volkerha, thank you for your interest in our work!
Yes, this is definitely possible.
You can call the add_with_ids function of the FAISS index (https://github.com/neulab/knn-transformers/blob/master/knnlm.py#L397) at any point, not only when building the initial index.
Later, you can assign the new entries to the existing clusters, for example by placing each new entry in the same cluster as its nearest neighbor. Alternatively, you can keep the intermediate index of cluster centroids: https://github.com/neulab/knn-transformers/blob/master/retomaton.py#L208 (we do not keep it after preprocessing, but you can save it if you need it later) and use it to find the cluster of a new example.
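A rough sketch of the centroid-based option, in plain Python. The function names, the `cluster_members` layout, and the toy 2-D vectors are hypothetical and do not come from retomaton.py; the sketch only assumes that the centroids were saved after preprocessing.

```python
import math

def nearest_cluster(vector, centroids):
    """Return the index of the centroid closest to `vector` (Euclidean)."""
    return min(range(len(centroids)),
               key=lambda c: math.dist(vector, centroids[c]))

def add_to_clusters(new_entries, centroids, cluster_members):
    """Append each new entry id to the cluster of its nearest centroid."""
    for entry_id, vector in new_entries.items():
        cluster_members[nearest_cluster(vector, centroids)].append(entry_id)
    return cluster_members

# Toy data: two saved centroids and the entry ids already in each cluster.
centroids = [(0.0, 0.0), (10.0, 10.0)]
cluster_members = {0: [0, 1], 1: [2]}

# New entries collected at inference time: id -> key vector.
new_entries = {3: (0.5, -0.2), 4: (9.0, 11.0)}
add_to_clusters(new_entries, centroids, cluster_members)
print(cluster_members)
```

With a saved FAISS centroid index, the linear scan in `nearest_cluster` would be replaced by a search against that index.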
Let me know if you have any questions! Uri
Also, if you get this to work and want to send a pull request with results and more detailed directions on how to do it, I think this would be a nice feature to add to the library!
Thanks for the great explanation!
I don't have an immediate plan to follow up on this, but in case I do, I will make sure to contribute.
Is it possible to modify the database for the automaton during inference time, i.e. add new data on-the-fly? Or do we need to reconstruct the automaton whenever the database changes?
I was wondering about the scenario, where additional information is collected while interacting with the LM.