seomoz / simhash-py

Simhash and near-duplicate detection
MIT License
408 stars, 115 forks

Installation instructions not working #32

Closed rth closed 7 years ago

rth commented 7 years ago

The readme currently indicates that simhash-py can be installed with,

pip install simhash-py

however there is no such package at https://pypi.python.org/pypi and this command fails with "No matching distribution found for simhash-py".

Besides, the latest releases at https://github.com/seomoz/simhash-py/releases are from 2012.

I am currently trying to submit simhash-py to https://conda-forge.github.io/ so that it could be installed using,

conda install -c conda-forge simhash-py

without the need for a compiler.

I can't find any official URL from which a .tar.gz with simhash-py can be downloaded (except for cloning the GitHub repo)...

Would it be possible to upload it to PyPI, or at least push a tag to create a new release on GitHub? Thank you!

b4hand commented 7 years ago

Yes, the instructions definitely appear to be wrong. I believe we actually install directly from GitHub using a GitHub URL. Publishing on PyPI is complicated by the fact that there is already a package called simhash. Additionally, the module name for this project is currently simhash, so it seems like a bad idea to publish a package called simhash-py that is imported with import simhash.
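To illustrate the naming clash described above: the distribution name (what pip installs) and the module name (what import resolves) are independent, so two distributions can end up claiming the same module. The following is only a minimal sketch. The helper `find_import_collisions` and the hand-written mapping are hypothetical, and the assumption that the existing PyPI package simhash also installs a top-level module named simhash is not confirmed in this thread.

```python
from collections import defaultdict


def find_import_collisions(provides):
    """Return {module name: [distributions]} for modules claimed more than once.

    `provides` maps a distribution name to the top-level module names it
    installs. The mapping is supplied by the caller; nothing is looked up
    on PyPI or in the local environment.
    """
    owners = defaultdict(list)
    for distribution, modules in provides.items():
        for module in modules:
            owners[module].append(distribution)
    return {m: sorted(d) for m, d in owners.items() if len(d) > 1}


# Hypothetical data: both distributions are assumed to ship `simhash`.
provides = {
    "simhash": ["simhash"],
    "simhash-py": ["simhash"],
}
print(find_import_collisions(provides))
```

If both distributions were installed into the same environment, whichever was installed last would typically overwrite the other's files, which is why a distinct module name tends to be the safer choice.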

b4hand commented 7 years ago

I know this isn't what you asked for, but as a temporary workaround, I've updated the README instructions to point directly at GitHub for the installation process in #33.

rth commented 7 years ago

> There is some complication to publishing on PyPi since there is already a package called simhash. Additionally, the module name for this project is currently simhash, so it seems like it might be a bad idea to publish a package called simhash-py, but that uses import simhash

Yes, I understand that's not simple, and I'm not sure what the best way to address that issue is. Still, it would be good to have some package on PyPI so that people find both packages when searching on Google or on PyPI.

Also, would it be possible for you to create a new release (e.g. 0.2.0) on GitHub, for instance by pushing a new tag (v0.2.0)? That would create an official .tar.gz with a fixed version that I could then use for the conda-forge package. That would be much appreciated. Thanks.

b4hand commented 7 years ago

Yes, I was going to bump the release version since I'm not sure how old 0.2.0 is and whether a release of it was ever made (possibly internally only). Regardless, it seems likely that it's time for a new version number.

rth commented 7 years ago

Definitely, thanks a lot!

b4hand commented 7 years ago

Version 0.3.0 is released:

https://github.com/seomoz/simhash-py/releases/tag/v0.3.0

I'm going to create a separate ticket for the PyPI release package and close this one out.

rth commented 7 years ago

@b4hand Actually there is an issue with the v0.3.0 .tar.gz in the release section: it does not currently contain the C++ code and

pip install https://github.com/seomoz/simhash-py/archive/v0.3.0.tar.gz

would not work for that reason.

I imagine this has something to do with simhash-py's use of git submodules; it looks like tagging alone is not enough to make a complete release. Maybe manually attaching the source archive produced by python setup.py sdist could be a solution (also related to #36)? Thanks!

b4hand commented 7 years ago

This works for me on a cleanly created vagrant ubuntu image:

pip install 'git+git://github.com/seomoz/simhash-py.git@v0.3.0'

It appears that the GitHub-produced tarballs do not include the git submodule for simhash-cpp.
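One way to check a downloaded tarball for this problem is to look for directories that have no contents, on the assumption that an omitted submodule shows up in the archive as an empty directory. This is a rough sketch using only the standard library `tarfile` module; the function name `empty_directories` is made up for illustration.

```python
import tarfile


def empty_directories(archive_path):
    """Return the directory entries of a tar archive that contain no members."""
    with tarfile.open(archive_path) as tar:
        members = tar.getmembers()
    names = [m.name.rstrip("/") for m in members]
    dirs = [m.name.rstrip("/") for m in members if m.isdir()]
    # A directory is empty if no other member's path lies beneath it.
    return sorted(
        d for d in dirs
        if not any(n.startswith(d + "/") for n in names)
    )
```

Calling `empty_directories("v0.3.0.tar.gz")` on a locally saved copy of the release tarball would list any such directories; a path that corresponds to the simhash-cpp submodule would suggest the C++ sources are missing from the archive.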

Also, the https flavor works as well if that is preferred:

pip install git+https://github.com/seomoz/simhash-py.git@v0.3.0
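For scripted setups, the same pinned install can be assembled in Python. This is only a sketch: `pip_install_command` is a hypothetical helper that builds the argument list without running it, so the command can be inspected before it is handed to `subprocess.run`.

```python
import sys


def pip_install_command(repo_url, ref):
    """Build a `pip install` command for a git repository pinned to a tag or commit."""
    requirement = f"git+{repo_url}@{ref}"
    # Running pip as `python -m pip` targets the current interpreter's environment.
    return [sys.executable, "-m", "pip", "install", requirement]


command = pip_install_command("https://github.com/seomoz/simhash-py.git", "v0.3.0")
print(" ".join(command))
```

To actually perform the install, pass the list to `subprocess.run(command, check=True)`; as with the commands above, this needs git and a C++ compiler available on the machine.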

rth commented 7 years ago

@b4hand Thanks, that would work. GitHub doesn't seem to integrate submodules very well.