Open mboehme opened 3 years ago
You can find the direct dependencies of a project via the GitHub GraphQL DependencyGraphManifestConnection.
Ed Summers wrote a small utility called xkcd2347 that walks the dependencies of a project up to a given depth.
pip install xkcd2347
xkcd2347 --depth 2 kubernetes/kubernetes # It will ask for your GitHub token.
import xkcd2347
gh = xkcd2347.GitHub(key="yourkeyhere")
for dep in gh.get_dependencies('kubernetes', 'kubernetes'):
    print(dep['packageName'])
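For reference, here is a minimal sketch of querying the dependency graph directly, without xkcd2347. It assumes the `dependencyGraphManifests` field on `Repository` and the `hawkgirl-preview` Accept header that the API asked for while the dependency graph was in preview. The helper names `fetch_manifests` and `direct_dependencies` are made up for this example, and the token is a placeholder.

```python
import json
import urllib.request

# GraphQL query: manifests of a repository and the packages each one lists.
QUERY = """
query($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    dependencyGraphManifests(first: 10) {
      nodes {
        filename
        dependencies(first: 100) {
          nodes { packageName requirements }
        }
      }
    }
  }
}
"""


def fetch_manifests(owner, name, token):
    """POST the query to the GitHub GraphQL endpoint and return the parsed JSON."""
    body = json.dumps({"query": QUERY, "variables": {"owner": owner, "name": name}})
    req = urllib.request.Request(
        "https://api.github.com/graphql",
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"bearer {token}",
            # Assumption: preview header needed for the dependency graph fields.
            "Accept": "application/vnd.github.hawkgirl-preview+json",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def direct_dependencies(response):
    """Collect the unique package names across all manifests in a response."""
    manifests = response["data"]["repository"]["dependencyGraphManifests"]["nodes"]
    names = set()
    for manifest in manifests:
        for dep in manifest["dependencies"]["nodes"]:
            names.add(dep["packageName"])
    return sorted(names)
```

Something like `direct_dependencies(fetch_manifests("kubernetes", "kubernetes", token))` would then give the outdegree as the length of the returned list. Pagination beyond the first page is not handled here.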
This is what GitHub parses to construct the Dependency Graph:
Can GitHub folks help analyze the Dependency Graph and get the number of projects that directly or indirectly depend on a given project?
Counting the number of packages that directly or indirectly depend on curl.
$ apt-cache rdepends --no-recommends --no-suggests --no-enhances --recurse curl | grep -v "Reverse Depends:" | wc -l
329966
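One caveat: as far as I can tell, `wc -l` counts every line of the recursive output, so a package reached along several paths is counted more than once. Below is a rough sketch of counting unique names instead. It assumes the usual `apt-cache rdepends` layout (a package name, a `Reverse Depends:` header, then indented names that may carry a `|` prefix). The function name and the sample text are made up for illustration.

```python
def unique_reverse_dependents(apt_output, package):
    """Return the set of distinct package names in `apt-cache rdepends` output."""
    names = set()
    for line in apt_output.splitlines():
        # Drop indentation and the "|" marker that apt-cache puts on some entries.
        entry = line.strip().lstrip("|").strip()
        # Skip blank lines and section headers such as "Reverse Depends:".
        if not entry or entry.endswith(":"):
            continue
        names.add(entry)
    # The queried package itself is not one of its own dependents.
    names.discard(package)
    return names


# Made-up sample in the assumed format, not real apt-cache output.
SAMPLE = """curl
Reverse Depends:
  zsync
 |git
  zsync
zsync
Reverse Depends:
  foo
"""

print(len(unique_reverse_dependents(SAMPLE, "curl")))
```

The real output could be fed in via `subprocess.run([...], capture_output=True, text=True).stdout` with the same arguments as the command above.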
Counting the number of packages upon which curl directly or indirectly depends.
$ apt-cache depends --no-recommends --no-suggests --no-enhances --recurse curl | wc -l
54366
Counting the number of packages upon which react depends.
$ npm install -g npm-remote-ls
$ echo $(( $(npm-remote-ls react | wc -l) - 1))
4
@mboehme this is awesome! I really like the graph idea. I think "centrality" could be an awesome signal into the final criticality score (maybe even the dominant one), but I don't see how we could use it to compare across package-manager ecosystems. I'm not aware of any way today to get the full, global graph of dependencies, which is what we would really need here.
For example, the PyPI graph could be amazing for comparing within PyPI, but it would never show you that CPython itself is a dependency of (almost) every package on PyPI.
Also thanks for the pointer to the GraphQL API! I missed this when I was playing around at first, because it's not available for Go yet which is where I started looking.
@andrew’s https://libraries.io tracks many package manager ecosystems and has APIs for many things, including dependents (https://libraries.io/api).
It is extensible; you can add support for new package managers: https://github.com/librariesio/libraries.io/blob/master/docs/add-a-package-manager.md
Still on my list to extend it to see spack packages.
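A minimal sketch of pulling dependents from the Libraries.io API, assuming the `/api/:platform/:name/dependents` endpoint with `api_key`, `page` and `per_page` query parameters, and assuming each returned project object carries a `name` field. The helper names are made up for this example and the key is a placeholder.

```python
import json
import urllib.parse
import urllib.request


def dependents_url(platform, name, api_key, page=1, per_page=100):
    """Build the URL for one page of a project's dependents."""
    path = "/".join(urllib.parse.quote(part, safe="") for part in (platform, name))
    query = urllib.parse.urlencode(
        {"api_key": api_key, "page": page, "per_page": per_page}
    )
    return f"https://libraries.io/api/{path}/dependents?{query}"


def fetch_dependents(platform, name, api_key):
    """Fetch the first page of dependents and return their names."""
    with urllib.request.urlopen(dependents_url(platform, name, api_key)) as resp:
        # Assumption: the response is a JSON list of project objects.
        return [project["name"] for project in json.load(resp)]
```

Only the first page is fetched here; a real count would need to loop over pages and respect the API's rate limit.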
> I think "centrality" could be an awesome signal into the final criticality score (maybe even the dominant one), but I don't see how we could use it to compare across package-manager ecosystems. [..]
> For example, the PyPI graph could be amazing for comparing within PyPI, but it would never show you that CPython itself is a dependency of (almost) every package on PyPI.
I agree. There are certain dependencies that cannot be tracked. For instance, dependence on the kernel or the compiler / interpreter won't be that explicit. The importance of those projects is more visible in the other signals of the criticality score.
> I agree. There are certain dependencies that cannot be tracked. For instance, dependence on the kernel or the compiler / interpreter won't be that explicit. The importance of those projects is more visible in the other signals of the criticality score.
If that theoretical graph did exist somehow, all of this would be much simpler!
> @andrew’s https://libraries.io tracks many package manager ecosystems and has APIs for many things, including dependents (https://libraries.io/api).
So many cool things to look at! I had no idea libraries.io had an API. Adding this to the list of things to play around with.
Adding another pointer from Georgios at Facebook.
I started a quick doc here with notes from playing around with libraries.io: https://docs.google.com/document/d/1Du2rDDd_nueH6BVZmVrrVSSGECnhjde_F3inNT9QzL8/edit#heading=h.yg897byn3xrw
Feel free to add others and join in the fun!
I was going to point to Libraries.io, glad you've already come across it 👍
Just a note: I've played with the Libraries.io data a bit and noticed some staleness issues in some cases. I also found some circular dependencies, for example: https://libraries.io/pypi/aniso8601/dependents https://libraries.io/pypi/relativetimebuilder/dependents (I looked into this one and found that an older version of aniso8601 used to depend on relativetimebuilder; perhaps it's a staleness issue?)
Just a heads up!
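Cycles like this matter for any recursive walk of the graph. Here is a small sketch of detecting one with a depth-first search over a `{package: [dependencies]}` mapping. The function name is made up, and the two edges in the example are illustrative only, mirroring the aniso8601 / relativetimebuilder case rather than real data.

```python
def find_cycle(graph):
    """Return one dependency cycle as a list of nodes, or None if there is none."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    for root in graph:
        if color.get(root, WHITE) != WHITE:
            continue
        color[root] = GRAY
        path = [root]
        stack = [iter(graph.get(root, ()))]
        while stack:
            for nxt in stack[-1]:
                state = color.get(nxt, WHITE)
                if state == GRAY:
                    # nxt is already on the current path: we closed a loop.
                    return path[path.index(nxt):] + [nxt]
                if state == WHITE:
                    color[nxt] = GRAY
                    path.append(nxt)
                    stack.append(iter(graph.get(nxt, ())))
                    break
            else:
                # All neighbours explored: this node is finished.
                color[path.pop()] = BLACK
                stack.pop()
    return None


# Illustrative edges only.
example = {
    "aniso8601": ["relativetimebuilder"],
    "relativetimebuilder": ["aniso8601"],
}
print(find_cycle(example))
```

The search is iterative, so deep dependency chains will not hit Python's recursion limit.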
Thanks @jli. I think that from the libraries.io SourceRank, we just need to take out the dependent_projects and dependent_repositories calculation: https://github.com/librariesio/libraries.io/blob/ad830db5f08c11a82c569c847c04451c57f0a624/app/models/concerns/source_rank.rb#L34
> For example, the PyPI graph could be amazing for comparing within PyPI, but it would never show you that CPython itself is a dependency of (almost) every package on PyPI.
This illustrates that the dependency graph isn't just binary "edge or no edge" - for many Python packages you need CPython or PyPy (or Jython, or...). How do we model a dependency on one-of-N packages?
Many packages also have optional dependencies: for example my own Hypothesis project has minimal mandatory dependencies, but a variety of optional extensions for numeric code, Django, automated refactoring of downstream code, etc. Do those relationships show up?
Runtime vs dev-time dependencies have a similar character, but the latter might be security-critical: you might not worry about a linter, but a compromised compiler could cause a lot of trouble.
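One possible way to represent these cases, purely as a sketch: treat each dependency as a set of alternatives plus a kind label. The `Dependency` class, the `kind` values and the package names below are all hypothetical and not taken from any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    alternatives: frozenset  # satisfied by any ONE of these packages
    kind: str = "runtime"    # hypothetical labels: "runtime", "optional", "dev"


def candidate_packages(deps, kinds=("runtime",)):
    """Packages that could satisfy at least one dependency of the given kinds."""
    names = set()
    for dep in deps:
        if dep.kind in kinds:
            names |= dep.alternatives
    return names


# Hypothetical project: needs one of two interpreters, has one optional
# extension and one dev-time tool.
deps = [
    Dependency(frozenset({"CPython", "PyPy"})),
    Dependency(frozenset({"django"}), kind="optional"),
    Dependency(frozenset({"some-linter"}), kind="dev"),
]
print(sorted(candidate_packages(deps)))
print(sorted(candidate_packages(deps, kinds=("runtime", "dev"))))
```

How much weight a one-of-N or dev-time edge should get in a criticality score is left open here; the sketch only shows that the distinction can be kept in the data.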
To understand how critical a project P is, it would be worthwhile to track which projects directly or indirectly depend on P. The larger this set of dependent projects, the more critical P is.
This issue looks at the first step: tracking ways to programmatically establish the direct dependencies of a project. Let's find the outdegree first.
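To make the two quantities concrete, here is a minimal sketch over a `{project: [direct dependencies]}` mapping. The outdegree is just the length of a project's list, and the set of direct or indirect dependents comes from a breadth-first search over the reversed edges. The function names and the tiny graph are made up for illustration.

```python
from collections import defaultdict, deque


def reverse_graph(depends_on):
    """Turn {project: [dependencies]} into {dependency: {dependents}}."""
    dependents = defaultdict(set)
    for project, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(project)
    return dependents


def transitive_dependents(depends_on, target):
    """Return every project that directly or indirectly depends on target."""
    dependents = reverse_graph(depends_on)
    seen = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            # The seen set also keeps the walk finite if the data has cycles.
            if parent not in seen and parent != target:
                seen.add(parent)
                queue.append(parent)
    return seen


# Hypothetical graph: app -> lib -> core, tool -> core.
graph = {"app": ["lib"], "lib": ["core"], "tool": ["core"], "core": []}
print(len(graph["app"]))                             # outdegree of app
print(sorted(transitive_dependents(graph, "core")))  # dependents of core
```

The size of the returned set is the count this issue is ultimately after.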