labstreaminglayer / pylsl

Python bindings (pylsl) for liblsl
MIT License

Stream Decoding Issue on Windows Machines with Accented Hostnames #69

Open Serpeve opened 1 year ago

Serpeve commented 1 year ago

Currently, Windows machines can have hostnames that contain accented characters. However, when a stream is created on a machine with such a hostname, an error occurs in liblsl when accessing the stream's hostname, because the hostname bytes are not valid UTF-8.

To ensure compatibility and prevent errors, liblsl should be updated to handle Windows hostnames with accented characters.

Serpeve commented 1 year ago

This issue likely affects not only accented characters but any characters permitted in Windows hostnames that are not encoded as UTF-8. Ideally, the fix should cover these cases as well, so that all valid Windows hostnames are supported.

cboulay commented 1 year ago

Do you know if this is restricted to pylsl? If not, then this should be filed as an issue on sccn/liblsl. If it's just pylsl, then it can probably be fixed easily, but it's hard for me to test. Would you be able to propose a solution?

Serpeve commented 1 year ago

The error that I get is:

    File "...\lib\site-packages\pylsl\pylsl.py", line 336, in hostname
        return lib.lsl_get_hostname(self.obj).decode('utf-8')
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 1: invalid continuation byte
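For anyone who wants to reproduce this without a Windows machine or liblsl, here is a minimal sketch. The hostname bytes are hypothetical (an assumed "Pío-PC" encoded as Latin-1/cp1252, where "í" is the single byte 0xED); the real bytes on the affected machine may differ. In UTF-8, 0xED announces a three-byte sequence, so the plain ASCII "o" that follows is rejected as a continuation byte.

```python
# Hypothetical hostname bytes: "Pío-PC" as Latin-1/cp1252, not UTF-8.
raw_hostname = b"P\xedo-PC"

try:
    raw_hostname.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    # exc.start is the index of the offending byte, exc.reason the explanation.
    error = exc

print(error)

# The same bytes decode cleanly when read as Latin-1.
print(raw_hostname.decode("latin1"))
```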

The problem seems to be in how pylsl decodes the hostname string, so I tried a simple fix: changing .decode('utf-8') to .decode('latin1') on line 336 of pylsl.py. That worked for me while debugging. To keep the previous behavior and also handle accented hostnames, the line should become:

    try:
        return lib.lsl_get_hostname(self.obj).decode('utf-8')
    except UnicodeDecodeError:
        return lib.lsl_get_hostname(self.obj).decode('latin1')

Unless you see a way this change could interfere with other behavior, it worked for an accented hostname.

I do not know whether other unexpected problems would appear when a name mixes Latin characters with characters from other languages, but this solved the particular case I mentioned.
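To illustrate the concern about other languages, here is a small sketch. Latin-1 maps every byte value to a character, so the fallback never raises, but for bytes that came from a different code page it silently returns the wrong characters. The input below is hypothetical (a Cyrillic name encoded as Windows-1251) and is only meant to show the effect, not to describe what Windows actually reports.

```python
# Hypothetical hostname in a non-Latin code page (Windows-1251).
original = "Привет"
raw = original.encode("cp1251")

# Same try/fallback pattern as the proposed fix.
try:
    decoded = raw.decode("utf-8")
except UnicodeDecodeError:
    decoded = raw.decode("latin1")

# No exception is raised, but the text is not the original name.
print(decoded == original)

# The fallback is lossless at the byte level, so the raw bytes can be recovered.
print(decoded.encode("latin1") == raw)
```

So the fallback avoids the crash, but the returned name is only guaranteed to be correct when the bytes really are Latin-1.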

UPDATE: After this change I also had to change line 373 to solve the same problem. The new code was the same solution as before:

    try:
        return lib.lsl_get_xml(self.obj).decode('utf-8')
    except UnicodeDecodeError:
        return lib.lsl_get_xml(self.obj).decode('latin1')
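Since the same pattern is now needed in two places, one option is a small shared helper, sketched below. The name `decode_lsl_string` is hypothetical and not part of pylsl; the sketch also calls the C function only once instead of repeating the call in the `except` branch. The `fake_get_hostname` function is a stand-in for `lib.lsl_get_hostname`, so the example runs without liblsl.

```python
def decode_lsl_string(raw: bytes) -> str:
    """Decode bytes returned by liblsl, preferring UTF-8.

    Falls back to Latin-1, which accepts any byte sequence, so this
    function never raises UnicodeDecodeError.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin1")


def fake_get_hostname() -> bytes:
    # Stand-in for lib.lsl_get_hostname(self.obj); hypothetical bytes.
    return b"P\xedo-PC"


# Usage: fetch the bytes once, then decode them with the helper.
hostname = decode_lsl_string(fake_get_hostname())
print(hostname)
```

Both the hostname and the XML accessors could then pass their raw bytes through the same helper, which keeps the fallback policy in one place if it needs to change later.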