openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal
721 stars 38 forks source link

[REVIEW]: GEM: A Python package for graph embedding methods #876

Closed whedon closed 6 years ago

whedon commented 6 years ago

Submitting author: @palash1992 (Palash Goyal) Repository: https://github.com/palash1992/GEM Version: v1.0.0 Editor: @danielskatz Reviewer: @jsgalan, @rougier Archive: 10.5281/zenodo.1407178

Status

status

Status badge code:

HTML: <a href="http://joss.theoj.org/papers/efb9242db91adee8c8265f000f26ef5a"><img src="http://joss.theoj.org/papers/efb9242db91adee8c8265f000f26ef5a/status.svg"></a>
Markdown: [![status](http://joss.theoj.org/papers/efb9242db91adee8c8265f000f26ef5a/status.svg)](http://joss.theoj.org/papers/efb9242db91adee8c8265f000f26ef5a)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@jsgalan & @rougier, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.theoj.org/about#reviewer_guidelines. Any questions/concerns please let @danielskatz know.

Please try and complete your review in the next two weeks

Review checklist for @jsgalan

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Review checklist for @rougier

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

whedon commented 6 years ago

Hello human, I'm @whedon. I'm here to help you with some common editorial tasks. @jsgalan, it looks like you're currently assigned as the reviewer for this paper :tada:.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

watching

  1. You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands
whedon commented 6 years ago
Attempting PDF compilation. Reticulating splines etc...
whedon commented 6 years ago

--> Check article proof :page_facing_up: <--

danielskatz commented 6 years ago

👋 @jsgalan & @rougier - thanks for agreeing to review this - please do so in this issue as described in the checklist and the second comment above. Let me know if you have any questions or concerns.

danielskatz commented 6 years ago

@palash1992 - in paper.md, you reference [goyal2017graph] which doesn't compile, rather than the correct [@goyal2017graph] -- i.e. the @ is missing and should be added.

palash1992 commented 6 years ago

@whedon generate pdf

whedon commented 6 years ago
Attempting PDF compilation. Reticulating splines etc...
whedon commented 6 years ago

--> Check article proof :page_facing_up: <--

palash1992 commented 6 years ago

@danielskatz Thanks for pointing it out. I have modified the manuscript to reflect the change.

jsgalan commented 6 years ago

Hi all, I just finished reviewing this interesting and novel package.

After installing and revising, a few comments mostly Unrelated on the implementation

Even though is a Theano maybe should be added as a warning of the package

I just got the file and place it in the working directory. Should this dataset be placed inside the package itself under gem.datasets.karate?

Also, Ifound the following errors:

_[Errno 2] No such file or directory Exception Traceback (most recent call last) /JOSS/GEM/test1gem.py in () 42 t1 = time() 43 # Learn embedding - accepts a networkx graph or file with edge list ---> 44 Y, t = embedding.learn_embedding(graph=G, edge_f=None, is_weighted=True, no_python=True) 45 print (embedding._method_name+':\n\tTraining time: %f' % (time() - t1)) 46 # Evaluate on graph reconstruction

/Users/jsgalan/Library/Python/2.7/lib/python/site-packages/gem-1.0.0-py2.7.egg/gem/embedding/node2vec.pyc in learn_embedding(self, graph, edge_f, is_weighted, no_python) 81 except Exception as e: 82 print(str(e)) ---> 83 raise Exception('./node2vec not found. Please compile snap, place node2vec in the path and grant executable permission') 84 self._X = graph_util.loadEmbedding('tempGraph.emb') 85 t2 = time()

Exception: ./node2vec not found. Please compile snap, place node2vec in the path and grant executable permission_

---> 14 from gem.embedding.sdne import SDNE 15 16 # File that contains the edges. Format: source target

/Users/jsgalan/Library/Python/2.7/lib/python/site-packages/gem-1.0.0-py2.7.egg/gem/embedding/sdne.py in () 26 from keras import backend as KBack 27 ---> 28 from theano.printing import debugprint as dbprint, pprint 29 from time import time 30

/Users/jsgalan/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/theano/init.py in () 86 87 ---> 88 from theano.configdefaults import config 89 from theano.configparser import change_flags 90

/Users/jsgalan/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/theano/configdefaults.py in () 45 convert=floatX_convert,), 46 # TODO: see gh-4466 for how to remove it. ---> 47 in_c_key=True 48 ) 49

/Users/jsgalan/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/theano/configparser.pyc in AddConfigVar(name, doc, configparam, root, in_c_key) 279 if hasattr(root, name): 280 raise AttributeError('This name is already taken', --> 281 configparam.fullname) 282 configparam.doc = doc 283 configparam.in_c_key = in_c_key

AttributeError: ('This name is already taken', 'floatX')

_

@palash1992 can you give me pointers to circumvent this errors?

Best

palash1992 commented 6 years ago

@jsgalan , thanks for the review! I have revised the library to clarify the above errors. 1. I changed the specification to 'karate.edgelist' so that you can load it from the current directory. 2. Original instructions to run node2vec successfully were in gem/c_exe/readme. I have now merged them with readme on the front page. 3. I have removed the theano.printing import line making it compatible with Tensorflow as well.

Let me know if you have any more issues.

rougier commented 6 years ago

I'm trying to run the test file but without much success yet:

Using TensorFlow backend.
/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.6 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.7
  return f(*args, **kwds)
Traceback (most recent call last):
  File "test.py", line 23, in <module>
    G = graph_util.loadGraphFromEdgeListTxt(edge_f, directed=isDirected)
  File "/Users/rougier/tmp/GEM/gem/utils/graph_util.py", line 146, in loadGraphFromEdgeListTxt
    with open(file_name, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'gem/data/TEST_50M.edgelist'

Not sure what it means. Also, I find it rather difficult to understand what the test.py is supposed to do since there's no documentation (or did I miss the link?). Could you provide some documentation for the software as well as some API description (how to call the library) ?

Also, in order to run the test file, I had to install keras and tensorflow but it is not clear if those are dependencies of the test file or the actual library. If the latter is true, you will need to update your dependencies.

palash1992 commented 6 years ago

@rougier Thanks for taking the time to review! The readme clarifies some of your questions. On the repository documentation page (readme.md), under dependencies I have specified that the package in general doesn't require keras and tensorflow/theano, but if you need to run SDNE model, you will need them. Also, the purpose of test.py is mentioned in usage section of the readme. I have also now added it to the test.py file.

The error on the dataset can be overcome by copying the gem/data/karate.edgelist to the working directory. I have modified the readme and test.py to clarify this. I also changed TEST_50M.edgelist to karate.edgelist in test.py.

Let me know if there are still any issues.

rougier commented 6 years ago

I still have some issues but I'll move to your issuer tracker. But again, it is not clear what the test is doing and what kind of output is expected. I think you should definitely add a toy example to make it easier for people to rapidly get a sense of what the software is doing.

palash1992 commented 6 years ago

@rougier : I have replied to the issue on Github. I completely agree that a toy example would make it easier to understand the purpose. I will add it soon. Thanks for the suggestion!

rougier commented 6 years ago

And also a documentation (even a small one). Sorry to insist on that but it's very difficult to know what this software is all about when you land on the first page. An explanation, a toy example (with the expected output) would really help.

palash1992 commented 6 years ago

@rougier : I will add that by the end of the day. I can add more examples as well.

jsgalan commented 6 years ago

Hi all,

I agree with comments made by @rougier regarding the documentation, even though the referenced papers are explanatory of the methods, it is useful to have some documentation of the parameters of the methods.

Otherwise, I am satisfied with the review.

Best

palash1992 commented 6 years ago

@rougier and @jsgalan Thanks for your insightful reviews! I have added explanations of model hyperparameters and more toy examples with sample execution results in the library. Please have a look and let me know if anything is missing. The library certainly benefited from your suggestions!

rougier commented 6 years ago

I still got some problems with missing files (filed an issue on the repo). I think you might want to have an example directory where you can put the test_karate.py and test_sbm.py files with the relevant data. Is there any reason why you put some data under gem/data ? Overall, I think you might need to think of a better structure for your library. Everything's here but it is now mostly a matter of organization. If you look at the criterion for the review (at the start of this thread) it might help you in deciding what's the best structure.

palash1992 commented 6 years ago

@rougier I have now created a "tests" folder with the required files following the instructions. This should resolve the issues.

danielskatz commented 6 years ago

👋 @rougier - can we get an update on your status on this?

rougier commented 6 years ago

Sorry, I'm attending a conference until Friday and I don't have too much time to re-test right now. The latest change are still not working for me (I opened several issues on the repo). It is a bit frustrating since it is always minor bugs and the core of the software seems to be ok. However, I think also there are room for improvements concerning the documentation (usage + API) and the overall structure.

Also, there is not release yet but @palash1992 can make oneonce everything is ok.

palash1992 commented 6 years ago

@rougier : Based on my replies, were you able to make it work? I have also added more examples in the readme along with sample executions. Also, I added parameter descriptions for all methods in the readme. Let me know where else can I make improvements.

rougier commented 6 years ago

It still does not work for me. But maybe it's my installation. @jsgalan Did you manage to run the two tests?

palash1992 commented 6 years ago

@rougier : while we wait for @jsgalan 's reply, can you tell me what error you are getting? I asked some of my friends to install it on different machines and they were able to do it. So I am not sure exactly what the problem could be. Perhaps it's a specific Mac issue. Any feedback would be helpful in making the package more robust to platforms.

rougier commented 6 years ago

Here is what I get from test_karate.py:

 $ python test_karate.py
Using TensorFlow backend.
/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.6 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.7
  return f(*args, **kwds)
Num nodes: 34, num edges: 77
Could not import C++ module for Graph Factorization. Reverting to python implementation. Please recompile graphFac_ext from graphFac.cpp using bjam
        Iter id: 0, Objective: 77.0069, f1: 76.9998, f2: 0.00712866
        Iter id: 10000, Objective: 76.9998, f1: 76.9979, f2: 0.00183901
        Iter id: 20000, Objective: 76.9995, f1: 76.9978, f2: 0.00164567
        Iter id: 30000, Objective: 76.9991, f1: 76.9973, f2: 0.00175163
        Iter id: 40000, Objective: 76.9988, f1: 76.997, f2: 0.00183327
graph_factor_sgd:
    Training time: 33.110936
    MAP: 0.606258636714048   preccision curve: [1.0, 0.5, 0.3333333333333333, 0.25, 0.4]

----------------------------------------------------------------------------------------------------
Num nodes: 34, num edges: 77
SVD error (low rank): 0.053622
hope_gsvd:
    Training time: 0.020297
    MAP: 0.12345399411723433     preccision curve: [1.0, 1.0, 0.6666666666666666, 0.5, 0.4]

----------------------------------------------------------------------------------------------------
Embedding dimension greater than 2, use tSNE to reduce it to 2
Num nodes: 34, num edges: 77
Laplacian matrix recon. error (low rank): 6.293280
lap_eigmap_svd:
    Training time: 0.018129
    MAP: 0.42051116510581815     preccision curve: [0.0, 0.0, 0.0, 0.25, 0.2]

----------------------------------------------------------------------------------------------------
/usr/local/lib/python3.7/site-packages/numpy/core/numeric.py:544: ComplexWarning: Casting complex values to real discards the imaginary part
  return array(a, dtype, copy=False, order=order, subok=True)
/usr/local/lib/python3.7/site-packages/numpy/core/numeric.py:492: ComplexWarning: Casting complex values to real discards the imaginary part
  return array(a, dtype, copy=False, order=order)
/usr/local/lib/python3.7/site-packages/matplotlib/text.py:844: ComplexWarning: Casting complex values to real discards the imaginary part
  x = float(self.convert_xunits(self._x))
/usr/local/lib/python3.7/site-packages/matplotlib/text.py:845: ComplexWarning: Casting complex values to real discards the imaginary part
  y = float(self.convert_yunits(self._y))
/usr/local/lib/python3.7/site-packages/matplotlib/text.py:711: ComplexWarning: Casting complex values to real discards the imaginary part
  posx = float(textobj.convert_xunits(textobj._x))
/usr/local/lib/python3.7/site-packages/matplotlib/text.py:712: ComplexWarning: Casting complex values to real discards the imaginary part
  posy = float(textobj.convert_yunits(textobj._y))
Num nodes: 34, num edges: 77
/usr/local/lib/python3.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py:1849: RuntimeWarning: invalid value encountered in sqrt
  s = np.sqrt(eigvals)
lle_svd:
    Training time: 0.023433
    MAP: 0.4528783377992382      preccision curve: [0.0, 0.0, 0.0, 0.25, 0.2]

----------------------------------------------------------------------------------------------------
Num nodes: 34, num edges: 77
[Errno 2] No such file or directory: 'node2vec': 'node2vec'
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/gem-1.0.0-py3.7.egg/gem/embedding/node2vec.py", line 80, in learn_embedding
    call(args)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/subprocess.py", line 304, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/subprocess.py", line 756, in __init__
    restore_signals, start_new_session)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/subprocess.py", line 1499, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'node2vec': 'node2vec'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test_karate.py", line 47, in <module>
    Y, t = embedding.learn_embedding(graph=G, edge_f=None, is_weighted=True, no_python=True)
  File "/usr/local/lib/python3.7/site-packages/gem-1.0.0-py3.7.egg/gem/embedding/node2vec.py", line 83, in learn_embedding
    raise Exception('./node2vec not found. Please compile snap, place node2vec in the system path and grant executable permission')
Exception: ./node2vec not found. Please compile snap, place node2vec in the system path and grant executable permission
palash1992 commented 6 years ago

@rougier: So the program runs other models and is unable to locate node2vec. As per the installation instructions on the repository page, to install node2vec as part of the package, recompile from https://github.com/snap-stanford/snap and add node2vec executable to system path. To grant executable permission, run: chmod +x node2vec. It should solve the issue. To test the above, alternatively you can try running node2vec from a terminal.

rougier commented 6 years ago

Ok. You'll need to add SNAP in the list of dependencies (and it might be good to add links for each of the dependency).

By the way, can't we use https://github.com/aditya-grover/node2vec as a fallback?

palash1992 commented 6 years ago

@rougier: My idea for not adding node2vec to dependencies was based on the premise that some users may not need to run node2vec and thus should not be restricted by having library dependency. Adding the fallback is a good alternative. At this point I can make one of the below three modifications:

  1. Add an argument in test.py specifying the method to be tested. Thus, the program would be run as python test.py -m all (or) python test.py -m hope,lap,gf. If the user is interested in node2vec, as specified in the installation instructions he/she can specify it in the argument.
  2. Add SNAP as the dependency.
  3. Add https://github.com/aditya-grover/node2vec as the fallback and dependency of gensim in the library. If the user installs SNAP, the library will automatically take the installation.

Please let me know which of these are you in favor of.

rougier commented 6 years ago

This is where I get lost and I think the culprit is the lack of documentation ;) Is node2vec necessary only to run the tests or is it necessary to run the software? If this is only for the test, then it's fine for me. I can run the karate test and probably @jsgalan can run the other one. However, it would be good in the documentation to explain (for example) the karate example (what is the expected output, what does it mean, etc) and to explain that second example requires the SNAP library. In the end, it would be only a matter of changing the README.

Also, maybe the tests directory could be renamed examples since content is actually illustrating usage.

While checking the paper, I think you're missing a DOI for Belkin et al. 2002.

palash1992 commented 6 years ago

@whedon generate pdf

whedon commented 6 years ago
Attempting PDF compilation. Reticulating splines etc...
whedon commented 6 years ago

--> Check article proof :page_facing_up: <--

palash1992 commented 6 years ago

@rougier: Yes, node2vec is only required for tests. I have now made the changes you specified and also added an additional argument named node2vec to use if you need to run node2vec. I have clarified this in readme.

Hope it is fine now.

rougier commented 6 years ago

Thanks. The README looks better. I'm still unable to run the test_sbm.py due to the _node problem. @jsgalan Did you manage to run it?

Apart from that, I think it is ready for publication (@danielskatz) even though you could improve a little bit the README (for example, by re-using the summary of the paper.md as an introduction for your README).

Don't forget to make a release on GitHub, I think it is mandatory for publication.

palash1992 commented 6 years ago

@rougier: I have now added a release and modified readme.

To solve the _node issue, did you try uninstalling networkx and installing using pip install networkx==1.11?

rougier commented 6 years ago

No, I did not try actually.

jsgalan commented 6 years ago

Hi i've cloned the git a few times now. I cloned and compiled SNAP and added node2vec to my system's path.

This is what i got after trying the tests:

attached both runs, @palash1992 could you look at the warnings?.

Best

jsgalan commented 6 years ago

@rougier after the pip install networkx==1.11 some parts of the tests where smooth. you might want to try.

palash1992 commented 6 years ago

@rougier : Let me know if the issue still persists

rougier commented 6 years ago

@palash1992 Just did and it works (with some warnings though)

palash1992 commented 6 years ago

@rougier : Perfect! I also made a release. Let me know if there is anything else needed from my side.

danielskatz commented 6 years ago

@rougier - I see there are still quite a few checkboxes left empty. Please go through them when you have a chance.

rougier commented 6 years ago

@danielskatz Just did. The Functionality documentation and Community guidelines could be improved a bit but it does not prevent publication. Thanks @palash1992 for taking all my comments into account.

danielskatz commented 6 years ago

great, thanks @rougier !

@palash1992 - please make an archive of the current state of the repo in Zenodo or similar, and let me know the DOI of the archive, so that can move forward in publishing this submission.

palash1992 commented 6 years ago

@danielskatz : I used Zenodo to create the archive. Here is the DOI: 10.5281/zenodo.1407178

danielskatz commented 6 years ago

@whedon set 10.5281/zenodo.1407178 as archive

whedon commented 6 years ago

OK. 10.5281/zenodo.1407178 is the archive.

danielskatz commented 6 years ago

@whedon generate pdf