topel / listen-attend-tell

Audio captioning system based on LAS, used in the DCASE2020 challenge

Can you please describe what the **clotho-dataset** directory looks like? #1

Open jiminbot20 opened 4 years ago

jiminbot20 commented 4 years ago

Can you please describe what the clotho-dataset directory looks like?

Is the screenshot below correct? [image]

And in your code, isn't the data split being used?

topel commented 4 years ago

In ../clotho-dataset/data, there are these files:

characters_frequencies.p
words_list.p
words_frequencies.p
characters_list.p         

clotho_captions_evaluation.csv   
clotho_captions_development.csv  
clotho_metadata_development.csv  
clotho_metadata_evaluation.csv

and these directories (among others):

clotho_dataset_dev/
clotho_dataset_eva/  

Those directories contain the .npy files with the log-mel features. To generate them, you need to process the WAV files with the scripts provided in this repository:

https://github.com/audio-captioning/clotho-dataset
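For intuition, here is a minimal NumPy-only sketch of how log-mel features like those in the .npy files can be computed from a waveform. The parameter values (44.1 kHz sample rate, 64 mel bands, 1024-sample window, 512-sample hop) are assumptions for illustration; the actual clotho-dataset scripts may use different settings.

```python
# Hedged sketch: log-mel feature extraction with plain NumPy.
# All hyperparameters here are assumed, not taken from the repo.
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        if mid > lo:
            fb[i, lo:mid] = (np.arange(lo, mid) - lo) / (mid - lo)
        if hi > mid:
            fb[i, mid:hi] = (hi - np.arange(mid, hi)) / (hi - mid)
    return fb

def log_mel(wav, sr=44100, n_fft=1024, hop=512, n_mels=64):
    # Frame the signal, apply a Hann window, take the power spectrum.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wav) - n_fft) // hop
    frames = np.stack([wav[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    mel = spec @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10)  # shape: (n_frames, n_mels)

# One second of noise -> a feature matrix saved the way the
# clotho_dataset_dev/ and clotho_dataset_eva/ files are stored.
features = log_mel(np.random.randn(44100))
np.save("example_logmel.npy", features)
```

In practice a library such as librosa would do this in a few calls; the point here is only the shape of the output: one row of 64 log-mel values per frame.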

In main_train.py, two splits are used:

In main_decode.py, you can use a trained model to generate captions on:
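The caption CSVs listed above (clotho_captions_development.csv, clotho_captions_evaluation.csv) pair each WAV file with five reference captions. A minimal sketch of parsing them, assuming the standard Clotho column layout (`file_name`, `caption_1` … `caption_5`) and using made-up inline data in place of a real file:

```python
# Hedged sketch: reading a Clotho-style captions CSV into a dict
# mapping each WAV file name to its five captions. The sample row
# below is invented for illustration.
import csv
import io

sample_csv = """file_name,caption_1,caption_2,caption_3,caption_4,caption_5
birds.wav,Birds chirp,Birds sing,Chirping outdoors,Small birds call,Birdsong fills the air
"""

def load_captions(fh):
    captions = {}
    for row in csv.DictReader(fh):
        captions[row["file_name"]] = [row[f"caption_{i}"] for i in range(1, 6)]
    return captions

# With a real file: load_captions(open("clotho_captions_development.csv"))
captions = load_captions(io.StringIO(sample_csv))
print(captions["birds.wav"][0])  # -> Birds chirp
```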