pivotal / LicenseFinder

Find licenses for your project's dependencies.
MIT License

Support for private pypi #230

Open wayne-luminal opened 8 years ago

wayne-luminal commented 8 years ago

Looking at pip.rb, I see that PyPI is contacted directly for package license information. If you host your libraries on a private PyPI (or anything other than pypi.python.org), your private projects will never be found. Is there an option to provide a PyPI-like URL?

Alternatively, what do you think about using pip show to grab license information instead of querying PyPI? The good thing about PyPI is that its response is JSON and predictable, whereas the output of pip show is not :/

flavorjones commented 8 years ago

I'm open to making a change like this, particularly if we can detect resolved package dependencies without making external network connections.

There's a broader topic here that spans more than just PyPI -- currently the bundler class and some other package manager classes also go to the network. I'd love it if we had the option of examining files on disk whenever possible (see how the GoDep class directly parses the Godeps.json file for an example of what I mean).

Would you be willing to look into how hard it would be to accomplish this in the pip world?
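For the pip world, one possible way to examine files on disk is sketched below. It is only an illustration, not LicenseFinder code: it assumes Python 3.8 or later, where the standard library's `importlib.metadata` reads the metadata of installed distributions without any network access. The function name `installed_licenses` and the `"UNKNOWN"` fallback are hypothetical.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_licenses():
    """Map each installed distribution name to its declared License field.

    Only metadata files already on disk are read; nothing is fetched.
    """
    licenses = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name is None:  # skip broken or partial installs
            continue
        # "UNKNOWN" is a placeholder chosen for this sketch.
        licenses[name] = dist.metadata.get("License") or "UNKNOWN"
    return licenses


if __name__ == "__main__":
    for name, license_name in sorted(installed_licenses().items()):
        print(f"{name}: {license_name}")
```

This only covers packages installed in the interpreter that runs the script, so it would need to run inside the project's own environment.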

wayne-luminal commented 8 years ago

@flavorjones I forked the project and made an attempt to solve this. You can see the diff at: https://github.com/pivotal/LicenseFinder/compare/master...wayne-luminal:pip-instead-of-pypi

I followed the pattern of shelling out to a Python script, partly because I don't know much about Ruby but, more importantly, because I can use the pip classes that way (I don't know if this is possible in Ruby). The script uses a regex to find all the "key: value" pairs and creates a JSON document out of them. For example, License: Apache License 2.0 becomes "license": "Apache License 2.0" in the document. We already know that using the pip classes is a little fragile (#224), so I'm not crazy about this approach, but it gets the tool away from reaching out to the network. It might be possible to use pip show <package-name> instead.
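To make the regex step concrete, here is a minimal sketch of the idea. It is not the script from the fork: the names `PAIR` and `metadata_to_json` are made up for illustration, and the input is assumed to be plain "Key: value" text with one pair per line.

```python
import json
import re

# One "Key: value" pair per line, e.g. "License: Apache License 2.0".
# Lines that start with whitespace (continuation lines) do not match.
PAIR = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):[ \t]*(.*)$")


def metadata_to_json(text):
    """Convert "Key: value" lines into a JSON object with lowercased keys."""
    fields = {}
    for line in text.splitlines():
        match = PAIR.match(line)
        if match:
            key, value = match.groups()
            # Keep only the first occurrence of each key.
            fields.setdefault(key.lower(), value.strip())
    return json.dumps(fields, sort_keys=True)


sample = "Name: example\nVersion: 1.0\nLicense: Apache License 2.0\n"
print(metadata_to_json(sample))
```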

One note: if a value (in the key/value output of package metadata) contains a multiline string, only the first line will be retrieved. I've seen some projects where the entire contents of a license file are dumped in rather than the title of the license itself. For an example, see v0.9.5 of github3.py. This will be fixed in the 1.0 release AFAIK, but it is something to consider.
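One way the multiline case could be handled is sketched below, as an alternative to a line-by-line regex. It relies on package metadata files using an email-header-like layout, and assumes the continuation lines of a long value are indented, so the standard library's `email` parser can return the whole value. The helper `read_field` and the sample text are hypothetical.

```python
from email import message_from_string


def read_field(metadata_text, field):
    """Return (first_line, full_value) for a header-style metadata field."""
    message = message_from_string(metadata_text)
    value = message.get(field)
    if value is None:
        return None, None
    lines = value.splitlines()
    first_line = lines[0].strip() if lines else ""
    return first_line, value


# Made-up metadata where a whole license text was placed in the License field.
sample = (
    "Name: example\n"
    "License: Copyright (c) Example Authors\n"
    "        Permission is hereby granted, free of charge...\n"
    "Version: 0.0.1\n"
)
first_line, full_value = read_field(sample, "License")
print(first_line)
```

With both the first line and the full value available, the caller can decide whether the field holds a short license title or a pasted license file.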

There are still tests to add which I hope to do soon. I wanted to get your initial feedback on this so far.

flavorjones commented 8 years ago

I'll try to take a look today. Thanks for the effort!

bikebilly commented 5 years ago

Hi @flavorjones, are you still interested in implementing the fix proposed by @wayne-luminal? This seems like a very interesting feature for supporting air-gapped environments with limited internet access.

Thanks!