Hi, thank you for providing the repository.
Could you please guide me on how to prepare my dataset so that I can run the experiment?
My current dataset structure is as follows:
Source language:
source1.wav
source1.txt (transcript of source1.wav)
source2.wav
source2.txt
....
Target language:
target1.txt (translation of source1.txt)
target2.txt
....
I have also gone through the tutorial "Getting Started with End-to-End Speech Translation", but I could not understand how to prepare or arrange my dataset to meet the FBK-Fairseq-ST requirements. Should I create a CSV file with the wav file names (source language) in the first column and the text (target language) in the next column, or some other JSON/CSV file that maps each audio file to its text file?
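To make the question concrete, here is a minimal sketch of the kind of mapping file I have in mind. It assumes the naming pattern above (sourceN.wav, sourceN.txt, targetN.txt) and that the source and target files sit in two separate folders. The function name build_manifest, the column names, and the folder arguments are my own placeholders; this is only my guess at a layout, not the format FBK-Fairseq-ST actually expects.

```python
import csv
import re
from pathlib import Path


def build_manifest(source_dir, target_dir, out_csv):
    """Write one CSV row per utterance: wav path, transcript, translation.

    Hypothetical helper: the column names are placeholders, not a format
    required by any toolkit.
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)

    # Collect (index, wav path) pairs from names like "source12.wav".
    indexed = []
    for wav in source_dir.glob("source*.wav"):
        match = re.fullmatch(r"source(\d+)", wav.stem)
        if match is not None:
            indexed.append((int(match.group(1)), wav))

    rows = []
    # Sort numerically so that source10 comes after source2.
    for idx, wav in sorted(indexed):
        transcript = source_dir / f"source{idx}.txt"
        translation = target_dir / f"target{idx}.txt"
        if not (transcript.exists() and translation.exists()):
            continue  # skip utterances with a missing text file
        rows.append({
            "audio": str(wav),
            "transcript": transcript.read_text(encoding="utf-8").strip(),
            "translation": translation.read_text(encoding="utf-8").strip(),
        })

    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(
            f, fieldnames=["audio", "transcript", "translation"]
        )
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Calling build_manifest("source_lang", "target_lang", "manifest.csv") (folder names are again placeholders) would give one row per complete audio/transcript/translation triple. Is something along these lines the right starting point, or does the toolkit need a different arrangement?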
As per the tutorial, I first have to prepare a pre-trained ASR model for FBK-Fairseq-ST.
I am new to this field and would be thankful for any guidance.
Thank you.