mpc001 / auto_avsr

Auto-AVSR: Lip-Reading Sentences Project
Apache License 2.0

Unicode Decode Error when running the LRS2 data preparation #1

Closed: gak97 closed this issue 1 year ago

gak97 commented 1 year ago

Thank you for providing the training code for Auto-AVSR.

I am running into an issue when running preprocess_lrs2lrs3.py on the LRS2 dataset. I get the following error:

Traceback (most recent call last):
  File "preprocess_lrs2lrs3.py", line 77, in <module>
    text_transform = TextTransform()
  File "A:\Projects\auto_avsr\preparation\transforms.py", line 152, in __init__
    units = open(dict_path).read().splitlines()
  File "C:\Users\Girish\anaconda3\envs\autoavsr\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 4416: character maps to <undefined>

Any help to resolve this would be greatly appreciated!

mpc001 commented 1 year ago

Hi @gak97, the traceback shows the file being decoded with cp1252, so could you explicitly specify the encoding when opening it at line 152 of "A:\Projects\auto_avsr\preparation\transforms.py"? Specifically, please try changing line 152 to units = open(dict_path, encoding='utf8').read().splitlines() and see if it works.
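For anyone hitting the same error, here is a minimal, self-contained sketch of why the explicit encoding matters. It does not use the repository's code: the helper read_units and the file units_demo.txt are hypothetical stand-ins for the unit dictionary loaded in transforms.py. The sketch assumes the dictionary file is UTF-8 encoded, which the fix above suggests but the thread does not state outright.

```python
import os
import tempfile


def read_units(dict_path, encoding="utf-8"):
    """Read one unit per line, decoding with an explicit encoding.

    Hypothetical helper for illustration; not part of auto_avsr.
    """
    with open(dict_path, encoding=encoding) as f:
        return f.read().splitlines()


# Build a small stand-in dictionary file (hypothetical content).
# "Á" is encoded in UTF-8 as the bytes 0xC3 0x81, and 0x81 has no
# mapping in Python's cp1252 codec.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "units_demo.txt")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write("A\nÁ\nB\n")

# Decoding explicitly as UTF-8 does not depend on the platform default.
units = read_units(demo_path)

# Forcing cp1252 (the codec shown in the traceback) fails on byte 0x81.
try:
    read_units(demo_path, encoding="cp1252")
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True

print(units, cp1252_failed)
```

Passing encoding explicitly means the result no longer depends on the locale's preferred encoding, which is what open() falls back to when no encoding is given.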

gak97 commented 1 year ago

Hi @mpc001, that resolved the error! Closing this issue here.