neulab / knn-transformers

PyTorch + HuggingFace code for RetoMaton: "Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval" (ICML 2022), including an implementation of kNN-LM and kNN-MT
MIT License

Could KNNSaver support Multi-GPU strategies like DDP? #5

Closed (xszheng2020 closed this issue 2 years ago)

xszheng2020 commented 2 years ago

Hi @urialon, I am trying to evaluate a model using the DDP strategy, but I hit an error because with such strategies the processes try to write the datastore asynchronously.

With a single GPU everything works well, but it is rather slow.

Any idea? Thanks!

urialon commented 2 years ago

Hi @xszheng2020, thank you for your interest in our work!

Can you please elaborate on what you are trying to do? Are you trying to write the datastore in a distributed way, or to read it in a distributed way?

Uri

xszheng2020 commented 2 years ago

Hi, @urialon

Sorry for the ambiguity. I want to write the datastore in a distributed way while evaluating a language model on the training corpus.

Thanks!

urialon commented 2 years ago

Hi @xszheng2020, it's currently not implemented, but I think it should be possible: each distributed process gets a distinct part of the training set, performs a forward pass on that chunk, and writes its own datastore. At the end, you only need to concatenate all the mini-datastores.
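The idea above can be sketched roughly as follows. This is only an illustration with NumPy, not code from the repository: the helper names (`shard_bounds`, `write_mini_datastore`, `concat_shards`), the `.npy` file layout, and the dtypes are all assumptions, and the random arrays stand in for the hidden states and next-token ids that a real forward pass would produce. The loop over ranks simulates what each DDP process would do on its own.

```python
import os
import tempfile

import numpy as np


def shard_bounds(num_items, world_size, rank):
    """Return the [start, end) slice of the data owned by `rank`."""
    per_rank = (num_items + world_size - 1) // world_size  # ceiling division
    start = min(rank * per_rank, num_items)
    end = min(start + per_rank, num_items)
    return start, end


def write_mini_datastore(out_dir, rank, keys, vals):
    """Save one rank's keys and values to its own files (no shared writes)."""
    keys_path = os.path.join(out_dir, f"keys_rank{rank}.npy")
    vals_path = os.path.join(out_dir, f"vals_rank{rank}.npy")
    np.save(keys_path, keys.astype(np.float16))
    np.save(vals_path, vals.astype(np.int64))
    return keys_path, vals_path


def concat_shards(paths):
    """Concatenate per-rank arrays in rank order."""
    return np.concatenate([np.load(p) for p in paths], axis=0)


def demo(num_tokens=10, dim=4, world_size=3):
    # Stand-ins for real hidden states (keys) and next-token ids (values).
    rng = np.random.default_rng(0)
    all_keys = rng.standard_normal((num_tokens, dim)).astype(np.float32)
    all_vals = np.arange(num_tokens)
    with tempfile.TemporaryDirectory() as out_dir:
        key_paths, val_paths = [], []
        # Under DDP, each process would run only the iteration for its own rank.
        for rank in range(world_size):
            start, end = shard_bounds(num_tokens, world_size, rank)
            kp, vp = write_mini_datastore(
                out_dir, rank, all_keys[start:end], all_vals[start:end]
            )
            key_paths.append(kp)
            val_paths.append(vp)
        return concat_shards(key_paths), concat_shards(val_paths)
```

Because every rank writes only to files that carry its own rank in the name, no two processes touch the same file, which is the property that avoids the asynchronous-write error. Concatenating in rank order keeps keys and values aligned with the original token order.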

Are you asking about it because you are dealing with a huge datastore?

xszheng2020 commented 2 years ago

Hi @urialon, yes, I am dealing with a huge datastore. I think your idea of splitting the training set into distinct parts should work. I will give it a try. Thanks.

urialon commented 2 years ago

If the datastore is huge and there is not enough disk space, you might be able to avoid concatenating all the keys. When you build the FAISS index, you can iterate over all the mini-datastores, insert their keys into the FAISS index, and finally delete all the mini-datastore keys, without ever needing to concatenate them. You will only need to concatenate the values, because they are used at test time, but they are much more lightweight than the keys.
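A minimal sketch of that shard-by-shard insertion is below. It is an assumption-laden illustration, not the repository's index-building code: `add_shards_to_index` is a hypothetical helper, the keys are assumed to be stored as one `.npy` file per mini-datastore, and `CountingIndex` is a stand-in used only so the example runs without FAISS. The function calls only `index.add(batch)` with a float32 array, which is how vectors are added to a FAISS index from Python; an index type that needs training would have to be trained before this step.

```python
import os

import numpy as np


def add_shards_to_index(index, key_paths, batch_size=1024, delete_after=False):
    """Feed each mini-datastore's keys into `index`, one shard at a time.

    `index` is any object with an `add(float32_array)` method.
    Returns the total number of keys added.
    """
    total = 0
    for path in key_paths:
        # Memory-map the shard so it is never loaded into RAM all at once.
        keys = np.load(path, mmap_mode="r")
        for start in range(0, keys.shape[0], batch_size):
            # Copy the batch into a contiguous float32 array before adding.
            batch = np.array(
                keys[start:start + batch_size], dtype=np.float32, order="C"
            )
            index.add(batch)
            total += batch.shape[0]
        del keys  # release the memory map before removing the file
        if delete_after:
            os.remove(path)  # the keys now live only inside the index
    return total


class CountingIndex:
    """Hypothetical stand-in for a real index; it only counts added vectors."""

    def __init__(self):
        self.ntotal = 0

    def add(self, x):
        assert x.dtype == np.float32
        self.ntotal += x.shape[0]
```

Deleting each key file right after it has been added keeps the peak disk usage at roughly the index plus the shards not yet processed, rather than the index plus a full concatenated copy of the keys.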

Good luck! Let me know if you have any questions.