guynich closed this issue 3 months ago.
When pseudo-labelling the VoxPopuli dataset, the "raw_text" column (needed for the option `--text_column_name`) may be an empty string for some examples. The HF dataset card shows an example with an empty "raw_text".
Question: how do I check which text column ("raw_text" or "normalized_text") was used when creating the pseudo-labelled datasets on HF, such as https://huggingface.co/datasets/distil-whisper/voxpopuli?
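One possible workaround for the empty "raw_text" entries is to fall back to "normalized_text" whenever "raw_text" is empty or whitespace-only. The sketch below is a minimal illustration under that assumption. `select_transcript` and `count_empty` are hypothetical helpers, not part of distil-whisper's pseudo-labelling scripts, and the toy rows are placeholders, not real VoxPopuli examples.

```python
from typing import Iterable, Mapping, Optional


def select_transcript(
    example: Mapping[str, Optional[str]],
    primary: str = "raw_text",
    fallback: str = "normalized_text",
) -> str:
    """Hypothetical helper: return the primary text column, or the fallback
    column when the primary one is missing, None, empty, or whitespace-only."""
    text = example.get(primary) or ""
    if text.strip():
        return text
    return example.get(fallback) or ""


def count_empty(
    examples: Iterable[Mapping[str, Optional[str]]], column: str = "raw_text"
) -> int:
    """Hypothetical helper: count examples whose `column` is missing or blank."""
    return sum(1 for ex in examples if not (ex.get(column) or "").strip())


# Toy placeholder rows (not real dataset content).
toy_examples = [
    {"raw_text": "Hello world.", "normalized_text": "hello world"},
    {"raw_text": "", "normalized_text": "placeholder transcript"},
]

print(count_empty(toy_examples))                         # 1
print([select_transcript(ex) for ex in toy_examples])    # ['Hello world.', 'placeholder transcript']
```

With the HF `datasets` library, the same logic could be applied row by row through `Dataset.map`, or blank rows could be dropped with `Dataset.filter`. Falling back mixes cased/punctuated and normalized transcripts, so filtering may be preferable if label consistency matters.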
Closing and moving the above information to https://github.com/huggingface/distil-whisper/issues/97.