Closed myd7349 closed 1 year ago
that seems like a good fix - I'll review it once I'm back from vacation. Thanks a lot!
One thing: could you include the test for new behavior in a unit test?
Sure.
After some consideration, I have to admit that writing unit tests for the newly added code is a bit challenging. Take my PC as an example: it runs Windows 10 set to Simplified Chinese, the default local encoding is GBK/GB18030, and the console's default code page is CP936. With MSVC, when I call the C function fopen to open a file, the file name is interpreted as GBK/CP936. If the file name is passed from Python code and contains non-ASCII characters (such as CJK characters), the string from Python has to be encoded as CP936 before it is passed to fopen in C, otherwise fopen cannot open the file.
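To make the mismatch concrete, here is a minimal sketch in plain Python (no pyedflib involved). The file name is a made-up example; the point is only that the UTF-8 and CP936 byte sequences for the same name differ, so a C runtime that reads the bytes as CP936 sees a different name when it is handed UTF-8.

```python
# Hypothetical file name, used only for illustration.
name = "中文.edf"

utf8_bytes = name.encode("utf-8")
cp936_bytes = name.encode("cp936")

# The two encodings produce different byte sequences for the same name.
print(utf8_bytes)
print(cp936_bytes)

# Roughly what a CP936 runtime would "see" if it were handed the UTF-8 bytes:
# the bytes are reinterpreted as CP936, which yields a different (garbled) name.
misread = utf8_bytes.decode("cp936", errors="replace")
print(misread == name)
```

Only the CP936 bytes round-trip to the original name under a CP936 code page, which is why the encoding has to match what fopen expects.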
However, the default encoding in the CI environment might be neither GBK/CP936 nor UTF-8. If the local encoding isn't UTF-8, my newly added test test_read_bytes might fail. In fact, I should write test_read_bytes like this:
signals2, _, _ = highlevel.read_edf(self.test_unicode_at_start_2.encode(locale.getpreferredencoding()))
But this introduces another issue: in some regions, the default encoding cannot encode Chinese characters. For instance, where the default encoding is latin-1, "中文".encode('latin-1') would raise an error.
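One possible way around this in a test, sketched under assumptions: check first whether the locale encoding can represent the file name at all, and skip the test if it cannot. The helper name can_encode is hypothetical and not part of pyedflib.

```python
import locale


def can_encode(text, encoding):
    """Return True if `text` can be represented in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# latin-1 has no CJK characters, so encoding fails there.
print(can_encode("中文", "latin-1"))
print(can_encode("中文", "cp936"))

# A test could consult the local encoding before running:
preferred = locale.getpreferredencoding()
print(preferred, can_encode("中文", preferred))
```

In a unittest-based suite, a test could call self.skipTest(...) when can_encode returns False for the preferred encoding, so that CI machines with an incompatible locale skip the test rather than fail it.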
In fact, regarding this issue, I have another solution in mind. It involves patching edflib.c. We could create a function called fopen_utf8(const char *file), and then in the setup.py file, when using MSVC, we can replace the fopen function in edflib.c with our fopen_utf8 function. This approach offers an alternative solution.
https://github.com/myd7349/pyedflib/commit/477694d0a6a92df359c6d92236181c1fede132e3
that sounds like an interesting solution - feel free to open a PR with it
Hi! @skjerns I have just created a new PR.
I recently encountered a bug in the pyedflib library that affects Windows systems when file paths contain non-ASCII characters, especially in environments using CJK character sets. The issue arises when the system's default encoding is not UTF-8 but a locale-specific code page. Under such circumstances, pyedflib might fail to open files, and even workarounds like get_short_path_name become ineffective.

Test script:
In my case, the get_short_path_name workaround doesn't work anymore:

The original file path: C:\Users\myd\Desktop\带 Annotation 的 EDF 文件\S001R01.edf
File path returned by get_short_path_name: C:\Users\myd\Desktop\带ANNO~1\S001R01.edf

As we can see, there are still non-ASCII characters in the file path returned by get_short_path_name.

To address this problem, I created this PR:
If the input file path is of type bytes, it's passed directly to the C interface. This is based on the assumption that users are likely to have a better understanding of their local encoding.
Otherwise, first attempt to encode the file path using UTF-8.
If the UTF-8 encoded path fails to open, try the locale-specific code page associated with the user's environment. This can be obtained using locale.getpreferredencoding(). For example, on a Simplified Chinese Windows system, this function returns cp936. This approach resolves the majority of file opening failures on Windows systems.
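The three rules above can be sketched roughly as follows. This is an illustration, not the code in the PR: open_with_fallback, opener, and locale_encoding are hypothetical names, and opener stands in for whatever call hands the byte path to the C layer (it is assumed to raise OSError when the file cannot be opened).

```python
import codecs
import locale


def open_with_fallback(path, opener, locale_encoding=None):
    """Open `path` via `opener`, trying UTF-8 and then the locale encoding.

    `opener` is a stand-in callable that takes a bytes path and raises
    OSError on failure. `locale_encoding` exists only so the fallback can
    be exercised on machines with a different locale.
    """
    # Rule 1: bytes are passed through untouched; the caller chose the encoding.
    if isinstance(path, bytes):
        return opener(path)

    if locale_encoding is None:
        locale_encoding = locale.getpreferredencoding()

    # Rule 2 then rule 3: UTF-8 first, then the locale code page (if different).
    encodings = ["utf-8"]
    if codecs.lookup(locale_encoding).name != codecs.lookup("utf-8").name:
        encodings.append(locale_encoding)

    last_error = None
    for encoding in encodings:
        try:
            encoded = path.encode(encoding)
        except UnicodeEncodeError as exc:
            # This encoding cannot represent the path; try the next one.
            last_error = exc
            continue
        try:
            return opener(encoded)
        except OSError as exc:
            last_error = exc
    raise last_error


# Usage with a fake opener that only "finds" the CP936-encoded name.
target = "中文.edf".encode("cp936")


def fake_opener(raw_path):
    if raw_path != target:
        raise OSError("file not found")
    return "opened"


print(open_with_fallback("中文.edf", fake_opener, locale_encoding="cp936"))
```

Trying UTF-8 first keeps behavior unchanged on systems where it already works; the locale encoding is consulted only after that attempt fails.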