Closed zihanlalala closed 1 year ago
I assume you are referring to the code in 2022-lsh-attention?
Maybe @Zettelkasten can help.
Hi and thanks for your interest! I added an example config here: https://github.com/rwth-i6/returnn-experiments/blob/master/2022-lsh-attention/complete-returnn-example.config Let me know if you need any help.
@Zettelkasten Thank you so much! Now I have trouble generating the vocab. My data is in plain-text form and was BPE-encoded using subword-nmt. I would like to use TranslationDataset to load it, but I don't know how to generate the vocab in the form RETURNN expects. Here is an example of my vocab:
{'push@@': 110, 'utmost': 100, 'erated': 100, 'Eleven': 100}
It is stored as a dict, where the key is the token and the value is its frequency (as in fairseq), and serialized using pickle. But an assertion error occurred. I wonder what the correct vocab format is, or what the recommended way to generate the vocab is?
The vocab is supposed to be the repr of a Python dict, where the keys are the tokens (as you have it) and the values are the indices (i.e. values from 0 to num_tokens - 1).
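As a sketch, a token-to-frequency dict like the one above could be converted into the token-to-index dict that RETURNN expects along these lines (the file name `vocab.returnn` is just a hypothetical example):

```python
# Convert a token->frequency dict (as used for fairseq) into a
# token->index dict with indices 0..num_tokens-1, and write its repr
# to a text file. Sorting by descending frequency is one arbitrary but
# reproducible way to assign the indices.
freq_vocab = {'push@@': 110, 'utmost': 100, 'erated': 100, 'Eleven': 100}

tokens = sorted(freq_vocab, key=freq_vocab.get, reverse=True)
returnn_vocab = {token: idx for idx, token in enumerate(tokens)}

# Write the plain repr of the dict (not a pickle) to the vocab file.
with open('vocab.returnn', 'w') as f:
    f.write(repr(returnn_vocab))
```

The resulting file contains a single Python dict literal, which can be read back with `eval` on its contents.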
I am interested in LSH attention and want to reproduce the results, possibly on a different dataset, but it is difficult and time-consuming for me to re-implement, since I am not familiar with RETURNN. Could you please provide a script with the same config as in your LSH-attention paper, so that I can run it by simply replacing the dataset?