mattiadg / FBK-Fairseq-ST

An adaptation of Fairseq to (End-to-end) speech translation.

train.py, FileNotFoundError, asking for dict.source.txt #6

Closed: Giuseppe-Della-Corte closed this issue 4 years ago

Giuseppe-Della-Corte commented 4 years ago

I have tried to train a model with the following parameters:

python FBK-Fairseq-ST/train.py path/to/binarized/data \
    --clip-norm 5 --max-sentences 32 --max-tokens 100000 --save-dir model/ --max-epoch 150 \
    --lr 0.001 --lr-shrink 1.0 --min-lr 1e-08 --dropout 0.2 --lr-schedule fixed --optimizer adam \
    --arch ast_seq2seq --decoder-attention True --seed 666 --task translation \
    --skip-invalid-size-inputs-valid-test --sentence-avg --attention-type general \
    --learn-initial-state --criterion label_smoothed_cross_entropy --label-smoothing 0.1

This results in: FileNotFoundError: [Errno 2] No such file or directory: 'path/to/binarized/data/dict.npz.txt'

The output of the binarization process, however, does not include the source language dictionary:

python FBK-Fairseq-ST/preprocess.py -s npz -t tok --format npz --inputtype audio \
--trainpref /path/non-binarized/data \
--destdir /path/binarized/data

Files in path/to/binarized/data/

---------------> train.npz-tok.idx
---------------> train.npz-tok.bin
---------------> train.npz-tok.npz.idx
---------------> train.npz-tok.npz.bin
---------------> dict.tok.txt    

This seems correct, as far as I understand from Mattia Di Gangi's article on Medium: "we have a dictionary for the target language (dict.it.txt), and for each split of the data, an index and a content file for the source side (.h5.idx and .h5.bin) and the same for the target side (.it.idx and .it.bin)".

Then why does the script FBK-Fairseq-ST/fairseq/data/dictionary.py attempt to open dict.npz.txt (the source language dictionary)?

The problem also arises when using the MuST-C English-Italian dataset (h5 instead of npz): FileNotFoundError: [Errno 2] No such file or directory: 'path/to/binarized/data/dict.h5.txt'

Giuseppe-Della-Corte commented 4 years ago

Digging into the parameter list, I found that this is not actually an issue. The --audio-input parameter has to be used when performing speech translation; it prevents FBK-Fairseq-ST from attempting to load the source language dictionary.
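
For anyone hitting the same error, here is a minimal sketch of a pre-flight check. The missing_dicts helper is hypothetical (it is not part of FBK-Fairseq-ST) and only encodes the behavior described in this thread: the target dictionary is always needed, while the source dictionary is only looked up when --audio-input is not passed. The trailing comment assumes --audio-input is a plain switch that takes no value.

```shell
# Hypothetical helper, not part of FBK-Fairseq-ST.
# Prints every dictionary file that training would look for but that is
# missing from the binarized data directory.
# Usage: missing_dicts DATA_DIR SRC_EXT TGT_EXT AUDIO_INPUT   (AUDIO_INPUT: yes|no)
missing_dicts() {
    data_dir=$1
    src=$2
    tgt=$3
    audio_input=$4

    # The target dictionary (e.g. dict.tok.txt) is always required.
    needed="dict.$tgt.txt"

    # Assumption taken from this thread: the source dictionary is only
    # loaded when --audio-input is NOT given.
    if [ "$audio_input" != "yes" ]; then
        needed="dict.$src.txt $needed"
    fi

    for f in $needed; do
        [ -e "$data_dir/$f" ] || echo "$f"
    done
}

# The fixed invocation is then the original command plus the switch, e.g.:
#   python FBK-Fairseq-ST/train.py path/to/binarized/data --audio-input \
#       (remaining options as in the original command)
```

With the file listing from the first comment, missing_dicts path/to/binarized/data npz tok no would report dict.npz.txt, while passing yes as the last argument reports nothing, which matches the error going away once --audio-input is added.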