Training with custom dataset

Hi, thank you for providing the repository.

Could you please guide me on how to prepare my dataset so that I can run the experiments?

My current dataset structure is as follows:

Source language:
source1.wav
source1.txt (transcript of source1.wav)
source2.wav
source2.txt
....

Target language:
target1.txt (translation of source1.txt)
target2.txt
....

I have also gone through the tutorial "Getting Started with End-to-End Speech Translation", but I could not understand how I should prepare or arrange my dataset to meet the FBK-Fairseq-ST requirements. Should I create a CSV file with the wav file names (source language) in the first column and the text (target language) in the next column, or some other JSON/CSV file that maps each audio file to its text file?
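To make the question concrete, here is a minimal sketch of the kind of mapping file I have in mind. It is only an illustration of my idea, not the format FBK-Fairseq-ST expects (that is exactly what I am unsure about). The function name `build_manifest`, the column names, and the assumption that files follow the `sourceN.wav` / `sourceN.txt` / `targetN.txt` naming above are all my own.

```python
import csv
import re
from pathlib import Path


def build_manifest(source_dir, target_dir, manifest_path):
    """Write one CSV row per utterance: wav path, transcript, translation.

    Hypothetical layout (my assumption, not an FBK-Fairseq-ST requirement):
    source_dir holds sourceN.wav and sourceN.txt, target_dir holds targetN.txt.
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)

    # Collect the numeric index N of every sourceN.wav file.
    indices = []
    for wav in source_dir.glob("source*.wav"):
        match = re.fullmatch(r"source(\d+)", wav.stem)
        if match is not None:
            indices.append(int(match.group(1)))

    rows = []
    for idx in sorted(indices):
        wav = source_dir / f"source{idx}.wav"
        transcript = source_dir / f"source{idx}.txt"
        translation = target_dir / f"target{idx}.txt"
        # Skip utterances that lack a transcript or a translation.
        if not (transcript.exists() and translation.exists()):
            continue
        rows.append({
            "audio": str(wav),
            "transcript": transcript.read_text(encoding="utf-8").strip(),
            "translation": translation.read_text(encoding="utf-8").strip(),
        })

    # Column names are illustrative only.
    with open(manifest_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(
            f, fieldnames=["audio", "transcript", "translation"]
        )
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

For example, `build_manifest("data/source", "data/target", "manifest.csv")` (hypothetical paths) would give one row per complete audio/transcript/translation triple. Is something like this useful as a starting point, or does the toolkit need a different arrangement?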

According to the tutorial, I first have to prepare a pre-trained ASR model for FBK-Fairseq-ST.

I am new to this field and would be thankful for any guidance.

Thank you.
