piskvorky / gensim

Topic Modelling for Humans
https://radimrehurek.com/gensim
GNU Lesser General Public License v2.1

python 3.5 support #554

Closed anupamme closed 8 years ago

anupamme commented 8 years ago

I downloaded the latest code and ran (after cd into the directory)

python3.5 setup.py test

and I get an error:

File "/home/mediratta/gensim/gensim/parsing/preprocessing.py", line 10, in <module>
    from gensim import utils
  File "/home/mediratta/gensim/gensim/utils.py", line 49, in <module>
    from smart_open import smart_open
  File "/usr/local/lib/python3.5/dist-packages/smart_open/__init__.py", line 1, in <module>
    from .smart_open_lib import *
  File "/usr/local/lib/python3.5/dist-packages/smart_open/smart_open_lib.py", line 35, in <module>
    from boto.compat import BytesIO, urlsplit, six
  File "/usr/local/lib/python3.5/dist-packages/boto/__init__.py", line 1216, in <module>
    boto.plugin.load_plugins(config)
  File "/usr/local/lib/python3.5/dist-packages/boto/plugin.py", line 92, in load_plugins
    for file in glob.glob(os.path.join(directory, '*.py')):
  File "/usr/lib/python3.5/posixpath.py", line 90, in join
    genericpath._check_arg_types('join', a, *p)
  File "/usr/lib/python3.5/genericpath.py", line 144, in _check_arg_types
    (funcname, s.__class__.__name__)) from None
TypeError: join() argument must be str or bytes, not 'NoneType'

Am I missing something here? Or is gensim not supported on Python 3.5?

piskvorky commented 8 years ago

Python 3.5 is supported and works (it is part of the Travis CI test suite).

This seems to be some issue with boto, a 3rd party lib brought in via gensim's dependency on smart_open.

What version of boto are you using? Anything unusual during its installation? What is the output of pip freeze?
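Since `import boto` itself fails here, one way to see which boto installation the interpreter picks up is to locate it without importing it. A minimal sketch using only the standard library; `locate_module` is a hypothetical helper name, not part of gensim, smart_open or boto:

```python
import importlib.util


def locate_module(name):
    """Return the file a top-level module would be loaded from, or None if it is not installed.

    For a top-level name, find_spec() looks the module up without executing it,
    so this works even when importing the module raises.
    """
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Prints the path of boto's __init__.py, or None if boto is not installed
print(locate_module("boto"))
```

Running this with the same interpreter that fails (here `python3.5`) shows whether boto comes from a pip install or from a distribution package.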

anupamme commented 8 years ago

boto==2.38.0

output of uname -a:

Linux instance-8 4.2.0-18-generic #22-Ubuntu SMP Fri Nov 6 18:25:50 UTC 2015 x86_64 x86_64 x86_64 GNU/Linu

output of pip freeze:

-lxc==0.1, beautifulsoup4==4.3.2, blinker==1.3, boto==2.38.0, bz2file==0.98, cffi==1.1.2, characteristic==14.3.0, chardet==2.3.0, cloud-init==0.7.7, command-not-found==0.3, configobj==5.0.6, cryptography==1.0.1, cssselect==0.9.1, decorator==4.0.2, googlemaps==2.4.1, html5lib==0.999, httplib2==0.9.2, httpretty==0.8.6, idna==2.0, Jinja2==2.8, jsonpatch==1.3, jsonpointer==1.0, language-selector==0.1, lxml==3.4.4, MarkupSafe==0.23, nltk==3.1, numpy==1.8.2, oauthlib==1.0.0, ply==3.7, prettytable==0.7.2, pyasn1==0.1.8, pyasn1-modules==0.0.8, pycparser==2.14, pycurl==7.19.5.1, pygobject==3.16.2, PyJWT==1.0.0, pyOpenSSL==0.15.1, pyserial==2.7, python-apt==1.0.1, python-debian==0.1.27, PyYAML==3.11, queuelib==1.4.2, requests==2.8.1, scipy==0.14.1, Scrapy==1.0.3, service-identity==14.0.0, six==1.9.0, smart-open==1.3.0, ssh-import-id==4.5, stanford-corenlp-pywrapper==0.1.0, Twisted==15.5.0, ufw==0.34, unattended-upgrades==0.1, urllib3==1.11, w3lib==1.11.0, wheel==0.26.0, zope.interface==4.1.3

anupamme commented 8 years ago

Forgot to add: Installation goes fine but when I do import gensim I get the same error:

Traceback (most recent call last):
  File "", line 1, in <module>
  File "/home/mediratta/gensim/gensim/__init__.py", line 6, in <module>
    from gensim import parsing, matutils, interfaces, corpora, models, similarities, summarization
  File "/home/mediratta/gensim/gensim/parsing/__init__.py", line 7, in <module>
    from .preprocessing import *
  File "/home/mediratta/gensim/gensim/parsing/preprocessing.py", line 10, in <module>
    from gensim import utils
  File "/home/mediratta/gensim/gensim/utils.py", line 49, in <module>
    from smart_open import smart_open
  File "/usr/local/lib/python3.5/dist-packages/smart_open/__init__.py", line 1, in <module>
    from .smart_open_lib import *
  File "/usr/local/lib/python3.5/dist-packages/smart_open/smart_open_lib.py", line 35, in <module>
    from boto.compat import BytesIO, urlsplit, six
  File "/usr/lib/python3/dist-packages/boto/__init__.py", line 1216, in <module>
    boto.plugin.load_plugins(config)
  File "/usr/lib/python3/dist-packages/boto/plugin.py", line 92, in load_plugins
    for file in glob.glob(os.path.join(directory, '*.py')):
  File "/usr/lib/python3.5/posixpath.py", line 90, in join
    genericpath._check_arg_types('join', a, *p)
  File "/usr/lib/python3.5/genericpath.py", line 144, in _check_arg_types
    (funcname, s.__class__.__name__)) from None
TypeError: join() argument must be str or bytes, not 'NoneType'

anupamme commented 8 years ago

I'm curious whether we can replace boto (https://github.com/boto/boto) with boto3 (https://github.com/boto/boto3).

I am able to use boto3 within python3.5 whereas I was not able to use boto.

I'd be happy to create a PR if someone can confirm that this is the right direction.

piskvorky commented 8 years ago

Hmm, interesting. The best way to debug is probably to inspect this line:

File "/usr/lib/python3/dist-packages/boto/plugin.py", line 92, in load_plugins
for file in glob.glob(os.path.join(directory, '*.py')):

and see why directory is None in Python 3.5.
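For reference, the failure in that line can be reproduced with the standard library alone. This is only a sketch of the symptom, not boto's actual code: `plugin_glob` is a hypothetical helper showing one defensive way to handle a missing directory, and it says nothing about why boto ends up with `None` in the first place.

```python
import glob
import os


def plugin_glob(directory):
    """Hypothetical helper (not boto's code): list *.py files in a directory.

    Returns an empty list when no directory is configured instead of
    passing None on to os.path.join().
    """
    if directory is None:
        return []
    return glob.glob(os.path.join(directory, '*.py'))


# Reproduce the TypeError from the traceback: os.path.join() rejects None.
try:
    os.path.join(None, '*.py')
except TypeError as err:
    # The exact wording of the message differs between Python versions.
    print("TypeError:", err)

print(plugin_glob(None))
```

Printing `directory` just before line 92 of `boto/plugin.py` would confirm whether it really is `None` at that point.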

But I still don't understand why it works on py3.5 on Travis, but fails on your Ubuntu. If there's a bug in boto, I'd expect it to fail on both.

Thanks for your offer to migrate to boto3 @anupamme ! Gensim doesn't use boto directly, so this is more relevant in the smart_open project. I opened a migration ticket there https://github.com/piskvorky/smart_open/issues/43 -- your help would be very welcome!

piskvorky commented 8 years ago

Did you get a chance to debug boto @anupamme ?

I'm closing this now, but if you find something that could be fixed on the gensim side, feel free to comment or reopen and I will look into it.