Closed flppgg closed 5 years ago
Hello, this is really great idea which we discussed but never had time to implement. However please make sure all checks pass and make it like an extra class not a complete replacement of the current network x algorithm.
I like the idea! I believe this might be an attractive option for making CW quick. I recommend paying attention to the following:
master
because the API has changed since 0.2
(and do not commit anything related to the versioning).nx.Graph
structure.Just out of curiosity: CW is a fast algorithm and the current implementation should work fine in most cases. How large are your graphs? What is the difference in running time?
Sorry, I will be on vacation next week far away from the keyboard.
Ok thank you for your comments! _I will make it an extra class no problem, I just have to figure out the best way of merging the setup file. _The graph is represented in memory as c arrays. First I use the networkx to_scipy_sparse_matrix method, using the 'lil' format. Then I build 5 arrays that hold the labels, edge and weight data in a convenient format. I will add some comments to the code to make it more clear. _I had graphs with a few million nodes and about 10mln edges, and I didn't want to use your java implementation. The python version was really just too slow. _there are still a few bits I want to improve, especially with the weighted graph function. Do you have suggestions on how to store the weight data for each edge? I am now using numpy.float32, but it takes a heavy toll on memory.
Finally managed to fix all the issues in the setup file. Basically it tries to build the Cython module (chinese_whispers.cyt) from the pyx file if Cython is installed, or from the precompiled C file otherwise. If numpy is not installed, or if the C file is not available, it installs your version of the module. As you can see in the README, you can run it with chinese_whispers.cyt.chinese_whispers(G). I hope this is ok?
Just arrived. I will look at the code tomorrow.
I have checked the code. I am looking positively at accepting this pull request. Before this I need to understand the following things:
setup.py
while those who want blazingly fast computation can manually build and install your code via setup-cython.py
?cyt.c
is included in the repository. I had no prior experience with Cython, but can it be generated automatically while building? If so, it can be in .gitignore
.print()
to standard output for logging. Could you please use sys.stderr
instead?(EDIT Sorry I replied directly to the email and the formatting messed up)
Thank you Dmitry,
The reason why the c file is included is that this is the suggested way to distribute Cython software as you can see in the docs: https://cython.readthedocs.io/en/latest/src/userguide/source_files_and_compilation.html#distributing-cython-modules
Basically it allows users who don't have Cython installed to use the pre-compiled c file.
Perhaps a good solution to your first point is to add a YES / NO input clause during installation, where the user is asked if they want to install the cyt module or just the Python implementation. What do you think? If you'd still rather do it the way you proposed it's ok for me.
I will fix the logging to sys.sterr this evening, no problem.
Thank you for pointing this out! We rely on Travis CI for uploading packages to PyPI, so as soon as we can produce the .c
file during the build, it is fine to avoid committing this file.
My primary concern is the further maintenance of this code. I need to be generally confident in understanding the pipeline. I hope to finish polishing this PR till the end of this week.
Ok it's no problem with me, let me know if you have other questions / suggestions and we can get it done in a few days.
After several hours of research, I still have no idea how to deploy this from Travis CI to PyPI properly. Please submit only the cyt.pyx
file. I will bring the rest of the wrapping code and configuration.
Ok I am sorry to hear that. I have very little experience with Travis CI but I can have a look at the docs and see if I find a useful example. In theory, large python projects (maintained on Github and deployed to PyPI) often have optional Cython components to speed up execution, so I think we should be able to find material to copy from.
Let's see what happens after merging your code :)
Ok so should I remove the .c file? and add a setup_cython file like it was at the very beginning?
Yes, please keep only cyt.pyx
. I will arrange everything needed to make it work.
Closing this because #3 was merged. Thank you!
This is a Cython implementation of the original module. Still a beta version, it only supports the "top" algorithm. I left the original files unchanged, a part from adding a new section to the README.