Identifying source of an utterance

raotnameh / End-to-end-E2E-Named-Entity-Recognition-from-English-Speech

Identifying source of an utterance #8

Closed awasthiabhijeet closed 4 years ago

awasthiabhijeet commented 4 years ago

Hi, nice work, and thanks for releasing the dataset. Is there an easy way to identify the source dataset (librispeech, common-voice, tedlium, etc.) of a given filename in the txt directory? It would be even more helpful to associate each utterance with its speaker-id, particularly for re-purposing this dataset for the task of speaker adaptation.

Thanks!

raotnameh commented 4 years ago

@awasthiabhijeet Yes, we can. I will release a .csv file for this.

raotnameh commented 4 years ago

@awasthiabhijeet Sorry for the late reply. We have added files that identify which source dataset each utterance comes from. The files are at E2E_NER/data/ner/source_given_filename.json and E2E_NER/data/ner/source.json

source_given_filename.json: Lists the file names present in each individual source dataset, for easy comparison.
source.json: Contains the links to download the source datasets.

Let me know if we can provide anything else. Thanks for your interest in the dataset.
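
For anyone landing here later, a rough sketch of how source_given_filename.json might be used to look up the source dataset of a file in the txt directory. The layout of the JSON is not described in this thread, so the code assumes a mapping from source dataset name to a list of file names; the helper names (build_filename_index, source_of, load_index) and the sample file names are made up for illustration. Please check the actual file and adjust accordingly.

```python
import json
from pathlib import Path


def build_filename_index(mapping):
    """Invert {source_name: [file names]} into {file stem: source_name}.

    ASSUMPTION: the JSON maps each source dataset name to a list of
    file names. The real layout may differ.
    """
    index = {}
    for source, filenames in mapping.items():
        for name in filenames:
            # Key on the stem so "abc.txt" and "abc.wav" map to the same utterance.
            index[Path(name).stem] = source
    return index


def source_of(filename, index):
    """Return the source dataset for a file name, or None if it is not listed."""
    return index.get(Path(filename).stem)


def load_index(json_path):
    """Load the JSON file and build the lookup index."""
    with open(json_path, encoding="utf-8") as f:
        return build_filename_index(json.load(f))


# Made-up sample data in the assumed layout, not taken from the repository.
example = {
    "librispeech": ["1089-134686-0000.txt"],
    "common-voice": ["sample-000123.txt"],
}
index = build_filename_index(example)
print(source_of("sample-000123.wav", index))  # common-voice
```

With the real file, something like load_index("E2E_NER/data/ner/source_given_filename.json") would replace the sample dictionary, provided the layout assumption holds.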