htrc / htrc-feature-reader

Tools for working with HTRC Feature Extraction files

Installation on Windows #38

Open younbaek opened 3 years ago

younbaek commented 3 years ago

I am having trouble installing this module on windows. I get the following error:

pip install htrc-feature-reader

Collecting htrc-feature-reader
Note: you may need to restart the kernel to use updated packages.
  Using cached htrc-feature-reader-2.0.7.tar.gz (58 kB)

 ERROR: Command errored out with exit status 1:
     command: 'C:\Users\syoun\anaconda3\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\syoun\\AppData\\Local\\Temp\\pip-install-hibyj9g0\\htrc-feature-reader\\setup.py'"'"'; __file__='"'"'C:\\Users\\syoun\\AppData\\Local\\Temp\\pip-install-hibyj9g0\\htrc-feature-reader\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\syoun\AppData\Local\Temp\pip-pip-egg-info-4kmgp8vi'
         cwd: C:\Users\syoun\AppData\Local\Temp\pip-install-hibyj9g0\htrc-feature-reader\
    Complete output (5 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\syoun\AppData\Local\Temp\pip-install-hibyj9g0\htrc-feature-reader\setup.py", line 4, in <module>
        long_description = fh.read()
    UnicodeDecodeError: 'cp949' codec can't decode byte 0xe2 in position 17387: illegal multibyte sequence
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

bmschmidt commented 3 years ago

Thanks @younbaek. This is a student in a class I'm teaching, and I couldn't immediately help.

Best I can tell, the problem stems from installing on Windows with a Korean character set as the system encoding (see https://en.wikipedia.org/wiki/Unified_Hangul_Code). I'm not sure whether this is a feature-reader-specific problem or a more general Python one, but I figured it would be worth having the issue on file.

pip install is being run from inside IPython; I don't know if that makes a difference.
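For anyone who wants to check which encoding Python falls back to on their own machine, here is a minimal sketch using only the standard library. The helper name `default_text_encoding` is made up for illustration. The printed value depends on the system; cp949 on a Korean-locale Windows install is what the traceback above suggests, not something verified here.

```python
import locale
import sys


def default_text_encoding() -> str:
    """Return the locale encoding that open() falls back to
    when no `encoding` argument is passed."""
    return locale.getpreferredencoding(False)


if __name__ == "__main__":
    # On the affected machines this would presumably print cp949 or gbk
    # (an assumption based on the tracebacks in this thread).
    print("open() default encoding:", default_text_encoding())
    # UTF-8 mode (Python 3.7+) overrides the locale default when enabled.
    print("UTF-8 mode enabled:", bool(sys.flags.utf8_mode))
```

If the first line prints something other than a UTF-8 variant, any `open()` call without an explicit `encoding` will decode files with that codec.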

drjiangyang commented 3 years ago

I encountered the same Unicode error on Windows 10 with a Chinese character set.

 ERROR: Command errored out with exit status 1:
     command: 'c:\program files\python38\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\Young\AppData\Local\Temp\pip-install-u50hlow1\htrc-feature-reader_5cad234b66b0451e946d35396fe7a955\setup.py'"'"'; __file__='"'"'C:\Users\Young\AppData\Local\Temp\pip-install-u50hlow1\htrc-feature-reader_5cad234b66b0451e946d35396fe7a955\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\Young\AppData\Local\Temp\pip-pip-egg-info-pwe81pwi'
         cwd: C:\Users\Young\AppData\Local\Temp\pip-install-u50hlow1\htrc-feature-reader_5cad234b66b0451e946d35396fe7a955\
    Complete output (5 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\Young\AppData\Local\Temp\pip-install-u50hlow1\htrc-feature-reader_5cad234b66b0451e946d35396fe7a955\setup.py", line 4, in <module>
        long_description = fh.read()
    UnicodeDecodeError: 'gbk' codec can't decode byte 0xa0 in position 17389: illegal multibyte sequence
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

organisciak commented 3 years ago

Hmm, interesting. I'm guessing the issue occurs when setup.py reads the README file, since I use open("README.md", "r") without specifying the encoding. I think adding encoding='utf8' to the open() call will fix it.

I'm on parental leave right now, so my bandwidth is tight; I'll push out that fix (if that is the problem) when I have a moment.

In the meantime, if anybody has the ability to test if that's the problem, I would appreciate it. Essentially, it would involve cloning the git repository and seeing if you can run the following code from the folder:

with open("README.md", mode="r", encoding='utf-8') as fh:
    long_description = fh.read()

If this is the issue, the code above will work, but it will crash if you remove encoding='utf-8' (on a machine whose default encoding is not UTF-8).
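The same hypothesis can also be checked on a machine that does not use a Korean or Chinese locale, by passing the suspect codec explicitly instead of relying on the system default. The sketch below is an illustration only: the sample text stands in for README.md (whose actual offending character is not known from this thread), and the helper name `can_read` is made up.

```python
import os
import tempfile


def can_read(path: str, encoding: str) -> bool:
    """Return True if the file at `path` decodes cleanly with `encoding`."""
    try:
        with open(path, mode="r", encoding=encoding) as fh:
            fh.read()
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical stand-in for README.md: UTF-8 text containing one non-ASCII
# character (a right single quotation mark, UTF-8 bytes E2 80 99).
sample = "Feature Reader\u2019s README\n"
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".md", delete=False
) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

utf8_ok = can_read(sample_path, "utf-8")   # explicit encoding, as in the fix
cp949_ok = can_read(sample_path, "cp949")  # simulates the Korean Windows default
os.remove(sample_path)

print("utf-8:", utf8_ok, "cp949:", cp949_ok)
```

Passing the codec by name keeps the check independent of the machine it runs on, so it shows the failure mode without anyone having to change their system locale.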

bmschmidt commented 3 years ago

Ah, @organisciak is right. See here:

For example, long_description = open("README.md").read() in setup.py is a common mistake. Many Windows users can not install the package if there is at least one non-ASCII character (e.g. emoji) in the README.md file which is encoded in UTF-8.