neulab / knn-transformers

PyTorch + HuggingFace code for RetoMaton: "Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval" (ICML 2022), including an implementation of kNN-LM and kNN-MT
MIT License

automaton: modify database #3

Closed volkerha closed 2 years ago

volkerha commented 2 years ago

Is it possible to modify the database for the automaton during inference time, i.e. add new data on-the-fly? Or do we need to reconstruct the automaton whenever the database changes?

I was wondering about the scenario, where additional information is collected while interacting with the LM.

urialon commented 2 years ago

Hi @volkerha, thank you for your interest in our work!

Yes, this is definitely possible. You can call the add_with_ids function of the FAISS index (https://github.com/neulab/knn-transformers/blob/master/knnlm.py#L397) at any point, not only when building the initial index.

After that, you can assign the new entries to the existing clusters, for example by placing each new entry in the same cluster as its nearest neighbor. Alternatively, you can keep the intermediate index of cluster centroids: https://github.com/neulab/knn-transformers/blob/master/retomaton.py#L208 (we do not keep it after preprocessing, but you can save it if you need it later), and use it to find the cluster of a new example.
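The two options above can be sketched in plain Python. This is only an illustration of the idea, not the code in retomaton.py: the helper names (nearest, assign_by_neighbor, assign_by_centroid) and the toy 2-D data are hypothetical, and a real datastore would use a FAISS search instead of the brute-force loop.

```python
import math


def nearest(query, vectors):
    """Return the position of the vector closest to `query` (Euclidean distance)."""
    return min(range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))


def assign_by_neighbor(new_key, keys, key_to_cluster):
    """Option 1: reuse the cluster of the new key's nearest existing neighbor."""
    return key_to_cluster[nearest(new_key, keys)]


def assign_by_centroid(new_key, centroids):
    """Option 2: pick the cluster whose saved centroid is closest to the new key."""
    return nearest(new_key, centroids)


# Toy datastore: four keys in two clusters (hypothetical values).
keys = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
key_to_cluster = [0, 0, 1, 1]
centroids = [(0.05, 0.1), (5.1, 4.95)]

new_key = (4.8, 5.1)
cluster = assign_by_neighbor(new_key, keys, key_to_cluster)

# Record the new entry so later lookups can see it.
keys.append(new_key)
key_to_cluster.append(cluster)
```

The neighbor-based option needs no extra state beyond the datastore, while the centroid-based option requires saving the centroid index at preprocessing time but only compares against one vector per cluster.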

Let me know if you have any questions!
Uri

neubig commented 2 years ago

Also, if you get this to work and want to send a pull request with results and more detailed directions on how to do so, I think this would be a nice feature to add to the library!

volkerha commented 2 years ago

Thanks for the great explanation!

I don't have an immediate plan to follow up on this, but in case I do, I will make sure to contribute.