The preprocessing consists of the following steps:
Sure. Do you find anything incorrect in the code?
Thanks a lot!
I am closing the issue for now. I shall reopen if needed.
Hi,
I have a few more questions:
I noticed that in the multi-domain dataset, the dictionary
for the source and target languages is the same. Is that a requirement for (Adaptive) kNN-MT? What if we have a base model with different source and target dictionaries?
How do you calculate the datastore size for a dataset? Since the datastore is created from training data, it seems that the number of unique tokens in the training set for the target language should be the datastore size. I just want to know how you compute it from the binarized data.
Your detailed steps above have been of much help. Thanks a lot!
Sorry for the late reply.
It's certainly OK to use a base model with different source and target dictionaries; it depends on how you trained the base model. Since recent studies show that a shared dictionary (or shared BPE) helps translation quality, especially for European language pairs, we usually use a shared dictionary for De<->En, Fr<->En, etc. But that is not required for NMT or for (Adaptive) kNN-MT.
The DSTORE_SIZE depends on the number of tokens in the target-language training data. You can get it in two ways:
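As an illustration only, here is a minimal sketch of counting the target-side tokens directly from the fairseq-binarized training data; the data-bin path, language pair, and file names below are placeholders, and fairseq-preprocess also reports the same token count in preprocess.log:

```python
# Minimal sketch (placeholder paths): count target-side tokens in the
# fairseq-binarized training data to pick a DSTORE_SIZE.
from fairseq.data import Dictionary, data_utils

# Target dictionary and the prefix of the binarized train.*.bin/.idx files;
# "data-bin/it" and the de-en/en names are examples, adjust to your data.
tgt_dict = Dictionary.load("data-bin/it/dict.en.txt")
dataset = data_utils.load_indexed_dataset("data-bin/it/train.de-en.en", tgt_dict)

# Each sentence contributes its length (including the appended EOS) to the datastore.
dstore_size = int(dataset.sizes.sum())
print(f"DSTORE_SIZE should be at least {dstore_size}")
```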
Thanks a lot for the details. It's very helpful.
I noticed that the DSTORE_SIZE is set to a nearby multiple of 10 above the actual number of target tokens reported in the preprocess.log file. Any sufficiently large number would work, right? (Since the purpose is just to have all target tokens in the datastore.)
To use any new pretrained base model with (Adaptive) kNN-MT, we need to register the model in adaptive-knn-mt/fairseq/models/transformer.py
(here we can override any default values of the fairseq transformer to match the base NMT model's architecture, like layer sizes, etc.). Then we can pass this newly registered model as the --arch
argument in the inference scripts. Is there any other step that I missed?
I think a value a little larger than, or exactly equal to, the number of target tokens is OK ... because empty items may have an impact on the training and retrieval of the Faiss index.
Yep, if you just change the hidden size, the number of layers, or similar settings of the standard Transformer arch, you can just register the new arch in transformer.py, like the "transformer_wmt19_de_en" at the end of that file.
Thank you very much!
Thanks a lot for this awesome work, and for releasing the code for the same!
I used your repository and was able to reproduce results for vanilla kNN-MT (K=8) and Adaptive kNN-MT (K=4) on the provided preprocessed data for the IT domain.
I have two queries:
I would like to run your model on other datasets (e.g. WMT'19, as mentioned in Section 4.1 of the kNN-MT paper). Could you please point me to the preprocessing scripts that I could use for this?
Could you also confirm that the K mentioned in Table 2 of your paper is actually the max-k that the Meta-k network was trained on?
Thanks a lot!