Python Implementation - Githubissues

jayqi commented 4 years ago

I've begun working on a Python implementation of pkgnet. 🎉

Branch is here: https://github.com/jayqi/pkgnet/tree/py-pkg/py-pkg

This issue will track progress and discussion on design decisions.

Some code standards decisions to make:

[ ] Linter: Currently using flake8, link to config
[ ] Formatter: Currently using black, link to config
[ ] Line length: Currently using black's default of 88 characters
[ ] Minimum Python version: Currently 3.6, because I'm using f-strings.
[ ] Docstring style: Currently using Google

jayqi commented 4 years ago

What I have implemented as of now:

Working node and edge extraction for DependencyReporter
Working node and edge extraction for InheritanceReporter
Working node and edge extraction for a new reporter, ModuleReporter, since Python has the concept of modules (R does not)
- This currently gets the file tree network. Is this useful? Is imports a better network to extract?
Working implementation of DirectedGraph that uses networkx backend
- Question: do we want to stick with networkx (which is more popular) or switch to igraph (which is what R package uses)?

Some open areas that either seem hard or I don't know much about:

How to parse Python code for symbolic references for FunctionReporter
- <obj>.__code__.co_names gives strings for symbolic references, but split on dots. Looking up the object will be hard.
- Use ast library?
How to implement test packages for unit tests
How do we want to generate a report? Some potential options:
- Simple HTML template, use jinja2
- Use Jupyter notebooks and nbconvert
Which network viz library? Static image or interactive javascript? Some options I was looking at:
- pyvis uses vis.js which is what visNetwork in R is using
- plotly example doesn't let you drag nodes, may need to dig into options further
- py2cytoscape uses cytoscape.js

jameslamb commented 4 years ago

How do we want to generate a report? Some potential options: Simple HTML template, use jinja2 Use Jupyter notebooks and nbconvert

I personally prefer HTML + jinja2. jupyter and nbconvert are ubiquitous, but also kind of heavy for what we want to do.

jameslamb commented 4 years ago

I'm ok with all of the "standards to discuss" things. I personally prefer line length 100 but not something I'm going to fight hard on.

I think I could do "How to implement test packages for unit tests". I had to do that for doppel-cli (https://github.com/jameslamb/doppel-cli/tree/master/integration_tests/test-packages/python)

jameslamb commented 4 years ago

I like pyvis for consistency with the R package. Then we can live or die with vis.js, and maybe even discover things in it that we can fix or open issues for.

bburns632 commented 4 years ago

On networkx vs igraph, does igraph have a very similar API in python and R? We don't do all too much with graph metrics at the moment, but if we ever do, maybe a similar API would make parity easier.

On Thu, Feb 20, 2020, 10:19 PM James Lamb notifications@github.com wrote:

I like pyvis for consistency with the R package. Then we can live or die with vis.js, and maybe even discover things in it that we can fix or open issues for.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/uptake/pkgnet/issues/267?email_source=notifications&email_token=AF3FDS4HO35WFRWS4OZ4A7DRD5I5DA5CNFSM4KYI5JXKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEMROUPA#issuecomment-589490748, or unsubscribe https://github.com/notifications/unsubscribe-auth/AF3FDS7MPU3LY2VZBNGV7T3RD5I5DANCNFSM4KYI5JXA .

jayqi commented 4 years ago

I'm ok with all of the "standards to discuss" things. I personally prefer line length 100 but not something I'm going to fight hard on.

100 length sounds good to me.

I personally prefer HTML + jinja2. jupyter and nbconvert are ubiquitous, but also kind of heavy for what we want to do.

I agree. I think this is right level to make that tradeoff for what we need.

I like pyvis for consistency with the R package. Then we can live or die with vis.js, and maybe even discover things in it that we can fix or open issues for.

I'm thinking about putting the viz backend in a separate class to make it easier to replace it later. (This was also something I was thinking about doing for R.) I'll start with pyvis.

On networkx vs igraph, does igraph have a very similar API in python and R? We don't do all too much with graph metrics at the moment, but if we ever do, maybe a similar API would make parity easier.

Sort of. It looks like the Python version has these enormous graph classes with a bajillion methods. The R implementation was completely functional. In that way the APIs are pretty different. The parameters are similar though, probably because everything uses the same C++ underneath. Here is an example:

Something to consider:

https://github.com/networkx/networkx 6.9k stars
https://github.com/igraph/python-igraph 591 stars

jayqi commented 4 years ago

So the really annoying thing, and maybe deal-breaker, with pyvis is that they don't actually output code you can easily embed in an HTML doc. They have their own HTML document template where they use jinja2 to inject data into the raw javascript code to draw the graph.

https://github.com/WestHealth/pyvis/blob/master/pyvis/templates/template.html

In order to use this, we'd have to copy-paste their javascript code block into our templates.

jayqi commented 4 years ago

I'm thinking now that it may be most straightforward to just write the visualization code directly in javascript, and just inject the network data as json. This might actually be the most flexible, because we don't have to worry about existence or support of a Python interface. Which don't seen to exist in general anyways. Tt doesn't seem like there are as many good options to have Python generate the javascript code -- there isn't the same htmlwidget ecosystem that R has, as far as I can tell.

jameslamb commented 4 years ago

I'm thinking now that it may be most straightforward to just write the visualization code directly in javascript, and just inject the network data as json. This might actually be the most flexible, because we don't have to worry about existence or support of a Python interface. Which don't seen to exist in general anyways. Tt doesn't seem like there are as many good options to have Python generate the javascript code -- there isn't the same htmlwidget ecosystem that R has, as far as I can tell.

ok I like this! I think it makes a lot of sense, gives us a clearer path for "language 3", and (selfishly) is an opportunity for us to practice a little JS

jayqi commented 4 years ago

Making some progress!

screencapture-file-Users-Jay-Desktop-pkgnet-tmp-html-2020-02-23-22_21_52

jayqi commented 4 years ago

Not too much work to implement a different library. This is sigma.js. screencapture-file-Users-Jay-Desktop-pkgnet-tmp-html-2020-02-24-07_12_57

jayqi commented 4 years ago

So I've got a working implementation in place for the most part. Wondering if I should open a pull request. At some point it might be easier to start making issues for smaller pieces of stuff and tackling them individually.

jameslamb commented 4 years ago

So I've got a working implementation in place for the most part. Wondering if I should open a pull request. At some point it might be easier to start making issues for smaller pieces of stuff and tackling them individually.

@jayqi I definitely think you should open a PR! I propose that you open the PR and we only focus on resolving comments related to the API, then worry about which viz library, how to handle the weird "package name you install is different from the one you import" thing, and other issues later.

nicolasfguillaume commented 4 years ago

Hi @jayqi, thanks for your great work!

After successfully installing the package, I am trying to replicate your example but this does not work for me (Python 3.7.5 on Windows 10):

Python 3.7.5 (tags/v3.7.5:5c02a39a0b, Oct 15 2019, 00:11:34) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from pkgnet import create_package_report
>>> create_package_report("jinja2")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Nicolas\Documents\home\workspace\pkgnet\py-pkg\pkgnet\package_report.py", line 172, in create_package_report
    pkg_reporters = default_reporters()
  File "C:\Users\Nicolas\Documents\home\workspace\pkgnet\py-pkg\pkgnet\package_report.py", line 33, in default_reporters
    DependencyReporter(),
TypeError: Can't instantiate abstract class DependencyReporter with abstract methods network_measures

Any help would be greatly appreciated.

Thank you so much,

Nicolas

jayqi commented 4 years ago

@nicolasfguillaume Thanks for trying it out! Sorry that you had an error—it looks like I had a bug sneak in there. #281 should fix it.

bburns632 commented 6 months ago

Now that CI is lower cost (in terms of time), I'd like to test what appetite there is to update the docs and publish to PyPy. I feel guilty we're let this sit for years.

LOE:

Name it!
New docs
New CI
Handful of new issues after release

How about it @jayqi @jameslamb?

jayqi commented 6 months ago

I haven't touched my implementation nearly 4 years. I don't know if it's really ready for users (I vaguely recall some stuff was still kind of janky), and I'm not sure I want to commit to developing/maintaining it.

jameslamb commented 6 months ago

+1 I can't to that, I can't commit to supporting the Python version.

In my opinion, you or @jayqi should do one of the following:

just delete that branch
publish the code to a separate public repo with permissive license then immediately archive that repo

uptake / pkgnet

Python Implementation #267