roebel / pysndfile

Python interface to libsndfile (http://www.mega-nerd.com/libsndfile/)
5 stars 4 forks source link

pysndfile is unable to access filenames with non-latin-1 chars on windows #3

Closed sveinse closed 5 years ago

sveinse commented 5 years ago

I'm having problems opening sound files with non-latin-1 characters in them on Windows:

import os, sys, pysndfile
print(pysndfile.PySndfile('Bublé.wav'))

This fails with:

Traceback (most recent call last):
  File ".\filename_test.py", line 6, in <module>
    print(pysndfile.PySndfile('Bublé.wav'))
  File "_pysndfile.pyx", line 676, in _pysndfile.PySndfile.__cinit__
OSError: PySndfile::error while opening b'Bubl\xc3\xa9'
        ->b'System error : The system cannot find the file specified.\r\n'

I've linked pysndfile with the official windows library release of libsndfile.

I think the cause is due to this code: https://github.com/roebel/pysndfile/blob/master/_pysndfile.pyx#L670-L673 which calls https://github.com/erikd/libsndfile/blob/master/src/sndfile.hh#L82 . From what I have figured out, Windows doesn't natively do utf-8, so the existing assumption of using utf-8 is wrong. I think it needs to call the LPCWSTR wpath variant on Windows. Correspondingly the define ENABLE_SNDFILE_WINDOWS_PROTOTYPES must be enabled and figuring out how to generate LPCWSTR strings from a python str object.

I am currently working on investigating these issues and hopefully find a solution. I'm not very well-versed in cython or in windows, but I'm hoping that I am able, since I depend on having this fixed. I am aware that @roebel has disclamed not being a Windows user.

roebel commented 5 years ago

Hello, I would be interested to hear more about how you go about usng pysndfile on windows and I will certainly be happy to make the necessary changes in pysndfile.

A few comments:

Clearly we need ENABLE_SNDFILE_WINDOWS_PROTOTYPES to be set, this could be a configuration in setup.py setting a compiler macro, but also a condition in pysndfile.hh (pysndfile does not use sndfile.hh directly).

You may try adding

#ifdef WIN32
#define ENABLE_SNDFILE_WINDOWS_PROTOTYPES 1
#endif

directly into pysndfile.hh

Further to generate the LPCWSTR in cython I think all is said here

https://stackoverflow.com/questions/19650195/cython-convert-unicode-string-to-wchar-array

one probably would also do this conditionally for all windows compilers in _pysndfile.pyx

Here a mechanism that would help you to have windows specific code in _pysndfile.pyx https://cython.readthedocs.io/en/latest/src/userguide/language_basics.html#conditional-compilation

I have made a proposal of the required changes as a starting point

_pysndfile_win32.pyx.txt

it would probably be better to set
ENABLE_SNDFILE_WINDOWS_PROTOTYPES in _pysndifle.pyx as well and to avoid changing two files that may at some point become inconsistent. May be this would do it all

IF UNAME_SYSNAME == "Windows":
    DEF ENABLE_SNDFILE_WINDOWS_PROTOTYPES=1
    from libc.stddef cimport wchar_t

    cdef extern from "Python.h":
       wchar_t* PyUnicode_AsWideCharString(object, Py_ssize_t *)
       void PyMem_Free(void *p)
roebel commented 5 years ago

I check that - the last point does not work, you would need to do it like this before the include of pysndfile.hh

IF UNAME_SYSNAME == "Windows":
   cdef extern from *:
         """
         #define  ENABLE_SNDFILE_WINDOWS_PROTOTYPES 1
         """
sveinse commented 5 years ago

Thanks!

With your directions and hints, I have successfully been able to build a running version on Windows. https://github.com/sveinse/pysndfile/commit/85d91cff5dd5e26d8d095e2e1a01b429364a271b

The only thing I wasn't able to fix was the declaration of the windows variant of the handle in L115. It should read:

SndfileHandle(LPCWSTR wpath, int mode, int format, int channels, int samplerate)

However I failed to insert a IF statement among that cdef cppclass SndfileHandle block. Perhaps you know a way to do this without having to redefine the whole block?

Also note the change on L690: I believe the os.path.expanduser() can be used to replace your two lines of tests and modifications.

PS! As we already have, you're free to use these suggestions freely under your existing copyright license.

roebel commented 5 years ago

Thanks for the info, that sounds really good.

Thanks as well for the suggestion with expanduser that's certainly much better.

Concerning the remaining problem, after somewhat more searching I think now that it is not possible to insert an IF in the middle of a cppclass declaration. Sorry, I was too optimistic. It appears the correct way of doing this is adding two include files (.xpi) one for linux and the other for windows and then include the correct one into the .pyx file.

I have created a win32 branch with these changes, this could become version 1.3.4, please let me now whether it works for you.

It would also interesing to know how you run the setup.py such that it finds your dll. I would add these explanations into the README for others to follow.

roebel commented 5 years ago

You find the experimental support of windows in pysndfile 1.3.4. I cannot test this, so don't hesitate to share your observations.

sveinse commented 5 years ago

Sorry for the delay, its summer vacation time around here.

When I check out 1.3.4 from master and build on windows I get the following error message:

+ winpty /c/Svein/Temp/pysndfile/venv-pysndfile/Scripts/pip wheel . --no-deps --global-option=build_ext --global-option=-L/c/Svein/Temp/pysndfile/venv-pysndfile/lib --global-option=-L/c/Svein/Temp/pysndfile/venv-pysndfile/include
c:\svein\temp\pysndfile\venv-pysndfile\lib\site-packages\pip\_internal\commands\wheel.py:107: UserWarning: Disabling all use of wheels due to the use of --build-options / --global-options /
--install-options.
  cmdoptions.check_install_build_global(options)
Processing c:\svein\temp\pysndfile\pysndfile
  Installing build dependencies ... done
    Complete output from command python setup.py egg_info:
    C:\Users\Svein\AppData\Local\Temp\pip-build-env-sf3dmy87\Lib\site-packages\Cython\Compiler\Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2).
This will change in a later release! File: C:\Users\Svein\AppData\Local\Temp\pip-req-build-k0uiixms\_pysndfile.pyx
      tree = Parsing.p_module(s, pxd, full_module_name)

    Error compiling Cython file:
    ------------------------------------------------------------
    ...
                    filename = bytes(filename, "UTF-8")
                self.filename = filename

                IF UNAME_SYSNAME == "Windows":
                    if length > 0:
                        self.thisPtr = new SndfileHandle(my_wchars, sfmode, format, channels, samplerate)
                                                        ^
    ------------------------------------------------------------

    _pysndfile.pyx:687:53: Cannot assign type 'wchar_t *' to 'const char *'
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\Svein\AppData\Local\Temp\pip-req-build-k0uiixms\setup.py", line 41, in <module>
        ext_modules = cythonize(ext_modules, force=compile_for_RTD )
      File "C:\Users\Svein\AppData\Local\Temp\pip-build-env-sf3dmy87\Lib\site-packages\Cython\Build\Dependencies.py", line 1096, in cythonize
        cythonize_one(*args)
      File "C:\Users\Svein\AppData\Local\Temp\pip-build-env-sf3dmy87\Lib\site-packages\Cython\Build\Dependencies.py", line 1219, in cythonize_one
        raise CompileError(None, pyx_file)
    Cython.Compiler.Errors.CompileError: _pysndfile.pyx
    Compiling _pysndfile.pyx because it changed.
    [1/1] Cythonizing _pysndfile.pyx

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in C:\Users\Svein\AppData\Local\Temp\pip-req-build-k0uiixms\

I haven't made any investigations into why yet, work in progress.

PS! I've created a gist of an extract of the scripts I use for building and bundling pysndfile and libsndfile on Windows. https://gist.github.com/sveinse/97411b95d36a6b8c430d4d381b620ecb -- this is admittedly a cumbersome procedure. For my application, I've chosen to pre-package the binary libsndfile (using pysndfile_library.sh) and make a pysndfile wheel (using pysndfile_build.sh) and collect them into a separate binary repo. This way it is much simpler to reinstall the python venv and install pysndfile from wheel without needing all of the infrastructure of building the library and modules. See pysndfile_install.sh for that.

roebel commented 5 years ago

Thanks for trying, in fact due to two problems with copy pasting 1.3.4 was completely broken, not only for windows but also for linux. I fixed a problem in the linux code and another problem in the windows part in 1.3.5. This should help for the overload problem you had in your built. In fact I simulated this under linux and had the same problem which is now gone in 1.3.5.

I also fixed my test script such that I can check directly the new build which will hopefully prevent thee kind of errors in the future.

Thanks for the gist, if you don't mind I would include that into the readme of pysndfile for others to follow.

sveinse commented 5 years ago

Thank you, this seems to work better. At least it does progress a little further, but I am currently running into the pesky unicode/codepage problems. Working on identifying the cause:

+ winpty /c/Svein/Temp/pysndfile/venv-pysndfile/Scripts/pip wheel . --no-deps --global-option=build_ext --global-option=-L/c/Svein/Temp/pysndfile/venv-pysndfile/dist/lib --global-option=-L/c/Svein/Temp/pysndfile/venv-pysndfile/dist/include
c:\svein\temp\pysndfile\venv-pysndfile\lib\site-packages\pip\_internal\commands\wheel.py:107: UserWarning: Disabling all use of wheels due to the use of --build-options / --global-options / --install-options.
  cmdoptions.check_install_build_global(options)
Processing c:\svein\temp\pysndfile\pysndfile
  Installing build dependencies ... done
    Complete output from command python setup.py egg_info:
    Compiling _pysndfile.pyx because it changed.
    [1/1] Cythonizing _pysndfile.pyx
    C:\Users\Svein\AppData\Local\Temp\pip-build-env-gg_gq_3t\Lib\site-packages\Cython\Compiler\Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\Svein\AppD
ata\Local\Temp\pip-req-build-ebmjjtlt\_pysndfile.pyx
      tree = Parsing.p_module(s, pxd, full_module_name)
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\Svein\AppData\Local\Temp\pip-req-build-ebmjjtlt\setup.py", line 181, in <module>
        long_description = update_long_descr(),
      File "C:\Users\Svein\AppData\Local\Temp\pip-req-build-ebmjjtlt\setup.py", line 148, in update_long_descr
        for ind, line in enumerate(infile):
      File "C:\Users\Svein\AppData\Local\Programs\Python\Python37-32\lib\encodings\cp1252.py", line 23, in decode
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 417: character maps to <undefined>
    setup.py::error:: pandoc command failed. Cannot update LONG_DESCR.txt from modified README.md[WinError 2] The system cannot find the file specified

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in C:\Users\Svein\AppData\Local\Temp\pip-req-build-ebmjjtlt\
sveinse commented 5 years ago

With regards to the gist, please go ahead. Please note that I have a similar procedure for Linux, albeit somewhat different. Windows always looks for depending libraries in the same dir as the file, so putting the libsndfile.dll next to the pysndfile site-packages dir works fine. On Linux however, the so loader doesn't by default look in the same directory, so for Linux I'm modifying rpath on the pysndfile binary to force it to find the injected libsndfile.so file.

roebel commented 5 years ago

Ok, the unicode probelm is not a real problem, this is only for generating the LONG description in case the README changed, normally I test the modification time and run this only in case README.md is newer than LONG_DESCR, but it seems the test is not working under windows. I will handle this more gracefully and update the git repos.

roebel commented 5 years ago

I updated the repos with a new setup.py that no longer tests file dates but uses some sort of checksum to test whether the reconstruction of the LONG_DESCR is necessary. That also prevents a possible build dependency on pandoc that was in fact not desired.

roebel commented 5 years ago

With regards to the gist I, will link the stuff from the README. But I don't exactly understand what you want to do with that on linux. Why would you not simply install libsndfile in the system and use that one?

sveinse commented 5 years ago

I update to master, and it still fails on unicode errors. Line 186 in setup.py. My temporary fix is to always use open(..., encoding='utf-8') in all places. I assume it is because the LONG_DESCR file has a utf-8 symbol in it somewhere, and for windows utf-8 isn't standard. py3 is quite pedantic about these things.

Despite this, I'm still having problems compiling the c code. I just started debugging why it occurs. My theory is that the c compiler isn't picking up the proper windows overloaded variant of SndFileHandle(). I am going to check if the extra windows define is set like it should. Work in progress.

As for the gist: The main reason is that I'd like to have the venv fully self contained. For windows I use pyinstaller to make distributable exe with the whole project, but I don't do that for Linux. I know this is beyond the concepts of what the venv intends to cover, but it still is useful. Very many different distros have very different libraries available, so that is a clutter too.

sveinse commented 5 years ago

The following patch seems to fix the utf-8 issue for me.

diff --git a/setup.py b/setup.py
index c93e2ca..6f479d4 100755
--- a/setup.py
+++ b/setup.py
@@ -21,6 +21,7 @@ try :
 except Exception:
     pass

+enc = 'utf-8'

 compile_for_RTD =  "READTHEDOCS" in os.environ

@@ -183,7 +184,7 @@ def update_long_descr():
                     outfile.write(line)

             write_readme_checksum(crdck[0], crdck[1])
-    return open(LONG_DESCR_path).read()
+    return open(LONG_DESCR_path, encoding=enc).read()

 with open('./requirements.txt') as f:

PS! Its really great that the module got a setup which is self contained on build dependencies, but having cython and numpy in pyproject.toml required=, makes my build spend 3 minutes just downloading and recompiling them each time the pysndfile is built. It doesn't even care if it is pre-installed in the venv beforehand, as it has to install it in its build environment. But this is how it should be, so I suppose this is just an effect of how slow Windows can be on certain ops. Its not your fault, so not a rant directed at you :D

sveinse commented 5 years ago

I think I found it by looking into the generated _pysndfile.cpp file. cython is apparently including pysndfile.hh before the ENABLE_SNDFILE_WINDOWS_PROTOTYPES is defined, and subsequently the windows function isn't declared properly.

image

I'm not sure how you can let cython place the define before including pysndfile.hh thou...

from libc.stddef cimport wchar_t

cdef extern from *:
    """
    #define ENABLE_SNDFILE_WINDOWS_PROTOTYPES 1
    """

cdef extern from "Windows.h":
    ctypedef const wchar_t *LPCWSTR

cdef extern from "Python.h":
    wchar_t* PyUnicode_AsWideCharString(object, Py_ssize_t *)
    void PyMem_Free(void *p)

cdef extern from "pysndfile.hh":
    cdef cppclass SndfileHandle :
roebel commented 5 years ago

That the LONG_DESCR is not ASCII seems to be a bug in pandoc. I don't think it is a good idea to have UTF8 chars in the RST file, I replace them in the setup now.

I also moved the define to the top of the _pysndfile.pyx, I had concentrated all windows stuff in the sndfile_win32.pxi, this was apparently not a good idea.

You find all this in the repos.

With respect to the compile time due to the dependency handling: I fully agree, I had the same problem when I tried this, however when I tried with preinstalled numpy and cython it did not search for dependencies, that's why I thought this would not be so bad. Which version of pip do you use ?

sveinse commented 5 years ago

I think I have a workable solution. See the below patch for a summary of all changes.

My comments to the patch:

Thank you.

diff --git a/_pysndfile.pyx b/_pysndfile.pyx
index e225f45..6fc59f2 100644
--- a/_pysndfile.pyx
+++ b/_pysndfile.pyx
@@ -56,6 +56,21 @@ max_supported_string_length = dict(_max_supported_string_length_tuple)
    we ensure these limits during writing to be able to read the strings back 
 """

+IF UNAME_SYSNAME == "Windows":
+    from libc.stddef cimport wchar_t
+
+    cdef extern from "Windows.h":
+        ctypedef const wchar_t *LPCWSTR
+
+    cdef extern from "Python.h":
+       wchar_t* PyUnicode_AsWideCharString(object, Py_ssize_t *)
+       void PyMem_Free(void *p)
+
+    cdef extern from *:
+        """
+        #define ENABLE_SNDFILE_WINDOWS_PROTOTYPES 1
+        """
+
 cdef extern from "numpy/arrayobject.h":
     void PyArray_ENABLEFLAGS(cnp.ndarray arr, int flags)

diff --git a/setup.cfg b/setup.cfg
index ce20cba..33af850 100644
--- a/setup.cfg
+++ b/setup.cfg
@@ -7,8 +7,8 @@ tag_svn_revision = 0
 # set these if your libsndfile is installed in custom directories
 # alternatively use --sndfile-libdir= and --sndfile-incdir= 
 # command line options for build_ext
-sndfile-libdir = /u/formes/share/packages/macports/lib
-sndfile-incdir = /u/formes/share/packages/macports/include
+#sndfile-libdir = /u/formes/share/packages/macports/lib
+#sndfile-incdir = /u/formes/share/packages/macports/include

 [build_sphinx]
 source-dir = doc/source
diff --git a/setup.py b/setup.py
index c93e2ca..6f479d4 100755
--- a/setup.py
+++ b/setup.py
@@ -21,6 +21,7 @@ try :
 except Exception:
     pass

+enc = 'utf-8'

 compile_for_RTD =  "READTHEDOCS" in os.environ

@@ -183,7 +184,7 @@ def update_long_descr():
                     outfile.write(line)

             write_readme_checksum(crdck[0], crdck[1])
-    return open(LONG_DESCR_path).read()
+    return open(LONG_DESCR_path, encoding=enc).read()

 with open('./requirements.txt') as f:
diff --git a/sndfile_linux.pxi b/sndfile_linux.pxi
index 95ef292..f38439f 100644
--- a/sndfile_linux.pxi
+++ b/sndfile_linux.pxi
@@ -1,4 +1,3 @@
-
 cdef extern from "pysndfile.hh":
     cdef cppclass SndfileHandle :
         SndfileHandle(const char *path, int mode, int format, int channels, int samplerate)
diff --git a/sndfile_win32.pxi b/sndfile_win32.pxi
index 4ee5230..df2fce0 100644
--- a/sndfile_win32.pxi
+++ b/sndfile_win32.pxi
@@ -1,18 +1,3 @@
-from libc.stddef cimport wchar_t
-
-cdef extern from *:
-    """
-    #define ENABLE_SNDFILE_WINDOWS_PROTOTYPES 1
-    """
-
-cdef extern from "Windows.h":
-    ctypedef const wchar_t *LPCWSTR
-
-cdef extern from "Python.h":
-    wchar_t* PyUnicode_AsWideCharString(object, Py_ssize_t *)
-    void PyMem_Free(void *p)
-
-
 cdef extern from "pysndfile.hh":
     cdef cppclass SndfileHandle :
         SndfileHandle(const char *path, int mode, int format, int channels, int samplerate)
roebel commented 5 years ago

Yes that's about how I did it, just not moving all the windows stuff to the pyx file, only the define.

sveinse commented 5 years ago

I'm not sure thats enough. The #include "Windows.h" must be located before any pysndfile.hh is included, so that needs to go to the top AFAICS.

roebel commented 5 years ago

Sorry, yes I see, I updated now to have your version.

I made a few more checks with the depencies. Indeed pip does not take into account installed versions, but if you use

pip install  --no-build-isolation pysndfile

then it did use the packages I had already installed.

sveinse commented 5 years ago

Great tip, thanks! I've built and tested it, and it works! Thank you. The only remaining blocker on Windows is the fix to setup.cfg. If it is left as-is the following error is given:

warning: I don't know what to do with 'runtime_library_dirs': ['/u/formes/share/packages/macports/lib']
error: don't know how to set runtime library search path for MSVC
roebel commented 5 years ago

Ok, that are old entries that I don't need anymore, I removed those now. If you don't come back with any problems I will close this issue and publish a new version. Thanks a lot for your feedback and help to make this run on windows, that's quite a big step.

sveinse commented 5 years ago

Everything is working. Closing the issue.

I know that windows is not a platform you have access to, and thus it is easy to neglect or ignore. I thank you for the willingness to get this up and running on windows. Thank you!