Giuseppe-Della-Corte closed this issue 4 years ago.
Digging into the parameter list, I found out this is not actually an issue.
The `--audio-input` parameter must be passed when performing speech translation; it prevents FBK-Fairseq-ST from attempting to load a source-language dictionary.
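The error paths in the original report suggest that the source dictionary path is built as `dict.<source-lang>.txt`. With `npz` (or `h5`) given as the source "language", the task therefore looks for `dict.npz.txt` (or `dict.h5.txt`). Below is a minimal sketch of the decision the flag controls. It assumes a fairseq-style `dict.{lang}.txt` naming scheme. The function `dictionaries_to_load` is hypothetical and is not FBK-Fairseq-ST's actual code.

```python
import os
from typing import List


def dictionaries_to_load(data_dir: str, source_lang: str, target_lang: str,
                         audio_input: bool) -> List[str]:
    """Return the dictionary files a fairseq-style task would try to open.

    Hypothetical illustration: with audio input the source side is a sequence
    of feature vectors (stored as .h5/.npz), so there is no source vocabulary
    and only the target dictionary is needed.
    """
    paths = []
    if not audio_input:
        # Text-to-text translation: a source vocabulary file is required.
        paths.append(os.path.join(data_dir, f"dict.{source_lang}.txt"))
    # The target side is always text, so its dictionary is always loaded.
    paths.append(os.path.join(data_dir, f"dict.{target_lang}.txt"))
    return paths


if __name__ == "__main__":
    data = "path/to/binarized/data"
    # Without the flag, a 'dict.npz.txt' path is requested.
    print(dictionaries_to_load(data, "npz", "it", audio_input=False))
    # With the flag, only the target dictionary is requested.
    print(dictionaries_to_load(data, "npz", "it", audio_input=True))
```

Without the flag, the first path in the list is exactly the file the traceback complains about. With it, only `dict.it.txt` is requested.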
I tried to train a model with the following parameters:
This results in:
FileNotFoundError: [Errno 2] No such file or directory: 'path/to/binarized/data/dict.npz.txt'
However, the output of the binarization process does not include a source-language dictionary.
Files in `path/to/binarized/data/`:
This seems correct, as I understood from Mattia Di Gangi's article on Medium: "we have a dictionary for the target language (dict.it.txt), and for each split of the data, an index and a content file for the source side (.h5.idx and .h5.bin) and the same for the target side (.it.idx and .it.bin)".
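The layout described in the quoted article can be checked mechanically. Below is a minimal sketch, assuming only what the quote states: one target dictionary plus `.idx`/`.bin` pairs per split for the `h5` source side and the `it` target side. The function `missing_binarized_files` and the sample file names, such as `train.h5-it.h5.bin`, are hypothetical. Exact prefixes depend on the preprocessing script, so each split is matched by prefix and suffix only.

```python
from typing import Iterable, List


def missing_binarized_files(filenames: Iterable[str], splits: Iterable[str],
                            src_ext: str = "h5", tgt_lang: str = "it") -> List[str]:
    """List expected-but-absent entries in a speech-translation data dir.

    Note that no source dictionary (e.g. dict.h5.txt) is expected: the
    source side is audio features, not tokens.
    """
    names = list(filenames)
    missing = []
    if f"dict.{tgt_lang}.txt" not in names:
        missing.append(f"dict.{tgt_lang}.txt")
    for split in splits:
        # Index and content files for the source (audio) and target (text) sides.
        for suffix in (f".{src_ext}.idx", f".{src_ext}.bin",
                       f".{tgt_lang}.idx", f".{tgt_lang}.bin"):
            if not any(n.startswith(split + ".") and n.endswith(suffix)
                       for n in names):
                missing.append(f"{split}*{suffix}")
    return missing


if __name__ == "__main__":
    # Hypothetical listing; in practice use os.listdir("path/to/binarized/data").
    sample = ["dict.it.txt", "train.h5-it.h5.bin", "train.h5-it.h5.idx",
              "train.h5-it.it.bin", "train.h5-it.it.idx"]
    print(missing_binarized_files(sample, ["train", "valid"]))
```

An empty result for all splits means the binarization output matches the described layout, even though no `dict.h5.txt` or `dict.npz.txt` exists.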
Then why does the script `FBK-Fairseq-ST/fairseq/data/dictionary.py` attempt to open `dict.npz.txt` (a source-language dictionary)?
The problem also arises when using the MuST-C English-Italian dataset (h5 instead of npz):
FileNotFoundError: [Errno 2] No such file or directory: 'path/to/binarized/data/dict.h5.txt'