lhotse-speech / lhotse

Tools for handling speech data in machine learning projects.
https://lhotse.readthedocs.io/en/latest/
Apache License 2.0

Default encoding change #1239

Open zzasdf opened 7 months ago

zzasdf commented 7 months ago

I ran into a change in Python's default encoding when using lhotse to extract fbank features. The default encoding is initially "UTF-8", but after calling the function computefeatures it becomes ANSI X3.4-1968 (that is, ASCII), which causes an error when saving non-English files. This happens the first time I run the code in a Docker container and does not happen when I run the code again. What could be the reason for this?
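To narrow down where the encoding changes, one option is to record Python's preferred encoding immediately before and after the suspect call. The following is only a minimal sketch: `encoding_before_and_after` is a hypothetical helper that is not part of lhotse, and it assumes the change is visible through `locale.getpreferredencoding(False)`.

```python
import locale


def encoding_before_and_after(func, *args, **kwargs):
    """Hypothetical helper: report the preferred encoding around a call.

    Returns (encoding_before, encoding_after, result_of_func).
    """
    # do_setlocale=False so the query itself does not modify the locale.
    before = locale.getpreferredencoding(False)
    result = func(*args, **kwargs)
    after = locale.getpreferredencoding(False)
    return before, after, result


if __name__ == "__main__":
    # Replace the lambda with the feature extraction call under suspicion.
    before, after, _ = encoding_before_and_after(lambda: None)
    print(f"before={before!r} after={after!r}")
```

If the two values differ, the wrapped call (or something it imports or launches) is the place to look.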

zzasdf commented 7 months ago

Setting the environment variable PYTHONUTF8=1 fixes this problem.
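Note that PYTHONUTF8 is read when the interpreter starts, so it has to be set before Python is launched (for example in the container's environment), not from inside an already running script. Where changing the environment is not possible, a complementary approach is to pass the encoding explicitly whenever text files are written, so the result does not depend on the default. The helper names below are hypothetical and only illustrate the idea.

```python
def save_text_utf8(path, text):
    """Hypothetical helper: write text as UTF-8 regardless of the default encoding."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def load_text_utf8(path):
    """Hypothetical helper: read text back as UTF-8."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()
```

This only covers files opened by your own code; files written internally by a library would still follow the default encoding unless PYTHONUTF8=1 is set.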

pzelasko commented 7 months ago

That's surprising. I don't think there's any code in lhotse itself that would change the encoding. If the env var works for you, that sounds good.

AkagawaTsurunaki commented 6 months ago

Thanks a lot. Without this, I thought I would have to edit the source code. The encoding change should be mentioned in the notes of the docs to avoid confusion.