lhotse-speech / lhotse

Tools for handling speech data in machine learning projects.
https://lhotse.readthedocs.io/en/latest/
Apache License 2.0

'ascii' codec can't encode characters in position 219-247 when processing the WenetSpeech dataset #1346

Closed: wgfi110 closed this issue 1 month ago

wgfi110 commented 1 month ago

Hello,

I have encountered an issue while preparing the WenetSpeech dataset using Icefall:

Details:

- Version: lhotse 1.24.0.dev0+git.bbb3fccd.clean
- Working directory: icefall/tree/master/egs/wenetspeech/KWS/
- Step: executing bash prepare.sh
- Error location:

    compute_fbank_wenetspeech_dev_test(args)
  File "/slow_nfs/xxxx/icefall/egs/wenetspeech/ASR/./local/compute_fbank_wenetspeech_dev_test.py", line 114, in compute_fbank_wenetspeech_dev_test
    cut_set.to_file(cuts_path)
  File "/opt/conda/lib/python3.9/site-packages/lhotse/serialization.py", line 532, in to_file
    store_manifest(self, path)
  File "/opt/conda/lib/python3.9/site-packages/lhotse/serialization.py", line 517, in store_manifest
    manifest.to_jsonl(path)
  File "/opt/conda/lib/python3.9/site-packages/lhotse/serialization.py", line 300, in to_jsonl
    save_to_jsonl(self.to_dicts(), path)
  File "/opt/conda/lib/python3.9/site-packages/lhotse/serialization.py", line 126, in save_to_jsonl
    print(json.dumps(item, ensure_ascii=False), file=f)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 219-247: ordinal not in range(128)
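The failing frame writes `json.dumps(item, ensure_ascii=False)` to an open text file, so any non-ASCII characters (WenetSpeech transcripts are Mandarin) must be encodable by that file's text encoding. The sketch below reproduces the failure mode, assuming (not confirmed in this issue) that the manifest file ended up with an ASCII encoding, for example because the job ran under a C/POSIX locale. The record and the helper names `write_jsonl_line` and `try_write` are hypothetical, and the file encoding is passed explicitly only to simulate each case.

```python
import json
import os
import tempfile

# Hypothetical cut-like record with a Chinese transcript.
item = {"id": "demo_cut", "text": "今天天气很好"}


def write_jsonl_line(path: str, record: dict, encoding: str) -> None:
    # Same call as the traceback: ensure_ascii=False keeps non-ASCII characters
    # as-is, so the file's text encoding has to be able to represent them.
    with open(path, "w", encoding=encoding) as f:
        print(json.dumps(record, ensure_ascii=False), file=f)


def try_write(encoding: str) -> str:
    # Write to a throwaway temp file and report success or the encoder's reason.
    fd, path = tempfile.mkstemp(suffix=".jsonl")
    os.close(fd)
    try:
        write_jsonl_line(path, item, encoding)
        return "ok"
    except UnicodeEncodeError as e:
        return f"UnicodeEncodeError: {e.reason}"
    finally:
        os.remove(path)


print(try_write("ascii"))  # UnicodeEncodeError: ordinal not in range(128)
print(try_write("utf-8"))  # ok
```

With `ensure_ascii=True`, `json.dumps` would escape those characters as `\uXXXX` sequences and the ASCII write would succeed, but the lhotse call in the traceback explicitly passes `ensure_ascii=False`, so the output file's encoding is what decides whether it fails.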

TinaChen95 commented 3 weeks ago

May I know how you solved the problem?
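One way to test the locale hypothesis, sketched under the assumption that the default text encoding is the culprit (not confirmed in this thread): the snippet reports which encoding `open()` picks when no `encoding=` argument is given, and compares it with a fresh interpreter started in Python's UTF-8 mode via the `PYTHONUTF8=1` environment variable (available since Python 3.7; the traceback shows Python 3.9). The helper names `current_default_encoding` and `child_default_encoding` are hypothetical.

```python
import codecs
import os
import subprocess
import sys


def current_default_encoding() -> str:
    """Normalized name of the encoding open() uses here when encoding= is omitted."""
    with open(os.devnull, "w") as f:
        return codecs.lookup(f.encoding).name


def child_default_encoding(utf8_mode: bool) -> str:
    """Same check in a fresh interpreter with Python UTF-8 mode forced on or off."""
    env = dict(os.environ)
    env["PYTHONUTF8"] = "1" if utf8_mode else "0"
    code = (
        "import codecs, os\n"
        "with open(os.devnull, 'w') as f:\n"
        "    print(codecs.lookup(f.encoding).name)\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print("default encoding in this process:", current_default_encoding())
    print("default encoding with PYTHONUTF8=1:", child_default_encoding(True))
```

If the current process reports `ascii` while the UTF-8-mode child reports `utf-8`, rerunning `prepare.sh` with `PYTHONUTF8=1` exported (or under a UTF-8 locale such as `C.UTF-8`, if the container provides one) is a plausible workaround, though it has not been confirmed in this issue.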