mt-upc / iwslt-2021

Systems submitted to IWSLT 2021 by the MT-UPC group.
MIT License

The IWSLT test sets have moved, yaml files no longer available. #4

Open bhaddow opened 1 year ago

bhaddow commented 1 year ago

Hi

The IWSLT test sets are no longer in the location given in the README, and they no longer have a yaml file. I presume that this has been replaced by the xml files containing the transcription/translation.

The IWSLT test sets are at http://i13pc106.ira.uka.de/~jniehues/IWSLT-SLT/data/eval/en-de/

best Barry

bhaddow commented 1 year ago

Hi

Actually, the segmented versions contain the yaml files, and they are available for 2019 and 2020. To prepare the tsv, run

python $IWSLT_ROOT/scripts/prepare_iwslt_tst.py --test-dir-root $IWSLT_TEST_ROOT/IWSLT.tst2020 

The arguments are slightly different from those in the README.
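Since only the segmented versions of the test sets ship yaml files, it may help to confirm that a downloaded directory contains them before running the preparation script. The following is a minimal sketch, not part of the repository: the helper names `find_yaml_files` and `has_segmentation` are hypothetical, and it assumes only that the yaml files sit somewhere under the directory passed as `--test-dir-root`.

```python
from pathlib import Path


def find_yaml_files(test_dir_root):
    """Return a sorted list of all .yaml files found under test_dir_root.

    Hypothetical helper: the actual directory layout of the IWSLT test sets
    is not assumed, so the search is recursive.
    """
    root = Path(test_dir_root)
    if not root.is_dir():
        # A missing directory simply yields no yaml files.
        return []
    return sorted(root.rglob("*.yaml"))


def has_segmentation(test_dir_root):
    """Return True if at least one yaml file exists under test_dir_root."""
    return len(find_yaml_files(test_dir_root)) > 0
```

For example, `has_segmentation(os.path.join(os.environ["IWSLT_TEST_ROOT"], "IWSLT.tst2020"))` returning False would suggest that the unsegmented (xml-only) version was downloaded.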

best Barry

gegallego commented 1 year ago

Hi Barry,

Thanks for taking the time to submit this issue. I have solved it in https://github.com/mt-upc/iwslt-2021/commit/3f03765e64b41d2ef3d203656154c6c2a91bfcdb.

By the way, what happened to the segmented version of tst2021? It was available on the previous website, right?

Best, Gerard

bhaddow commented 1 year ago

Hi Gerard

Thanks! Yes, I noticed that tst2021 was missing, but I do not know what has happened to it. Jan Niehues may know, since it looks like he hosts the test sets.

best Barry

gegallego commented 1 year ago

Thanks, Barry!

gegallego commented 1 year ago

Hi @jniehues-kit,

Do you have the segmented version of tst2021? Could you make it available on the new website?

Thanks!