This PR adds a Repository#projects_dependencies method and uses it in the endpoints/codepaths where we want to fetch a repo's deps.
Flowchart: how fetching deps for a repository is changed by this PR
flowchart TB
subgraph "AFTER"
direction TB
Repository1["Repository"] --> Project
Project --> Version
Version --> Dependency1["Dependency 1\n e.g. 'rack 1.0.0'"]
Version --> Dependency2["Dependency 2\n e.g. 'dalli ~> 2.0.0'"]
Version --> Dependency3["Dependency 3\n e.g. 'bibliothecary ~ 1.7'"]
end
subgraph "BEFORE"
direction TB
Repository2["Repository"] --> RepositoryDependency4["RepositoryDependency 1\n e.g. 'rack 1.0.0'"]
Repository2["Repository"] --> RepositoryDependency5["RepositoryDependency 2\n e.g. 'dalli ~> 2.0.0'"]
Repository2["Repository"] --> RepositoryDependency6["RepositoryDependency 3\n e.g. 'bibliothecary ~ 1.7'"]
end
RepositoryDependency vs Dependency: where is the data from?
RepositoryDependency is populated by scanning all the manifest files for deps in the repository, e.g. https://github.com/librariesio/libraries.io.
Dependency is populated by scanning the actual package's dependencies from its package repository API.
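The "AFTER" path in the flowchart can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the code in this PR: the Structs stand in for the real ActiveRecord models, the field names and `latest_version` helper are hypothetical, and picking the latest version of each project is an assumption about which Version is used.

```ruby
# Plain-Ruby stand-ins for the real models; field names are assumptions.
Dependency = Struct.new(:name, :requirements, keyword_init: true)
Version    = Struct.new(:number, :dependencies, keyword_init: true)

Project = Struct.new(:name, :versions, keyword_init: true) do
  # Assumption: the last entry in `versions` is the latest release.
  def latest_version
    versions.last
  end
end

Repository = Struct.new(:full_name, :projects, keyword_init: true) do
  # Hypothetical sketch: walk Repository -> Project -> Version -> Dependency.
  def projects_dependencies
    projects
      .map(&:latest_version)
      .compact                      # skip projects that have no versions yet
      .flat_map(&:dependencies)
  end
end

repo = Repository.new(
  full_name: "librariesio/libraries.io",
  projects: [
    Project.new(
      name: "example-gem", # hypothetical project name
      versions: [
        Version.new(number: "1.0.0", dependencies: [
          Dependency.new(name: "rack", requirements: "1.0.0"),
          Dependency.new(name: "dalli", requirements: "~> 2.0.0")
        ])
      ]
    ),
    Project.new(name: "unreleased-gem", versions: []) # contributes no deps
  ]
)

deps = repo.projects_dependencies
puts deps.map { |d| "#{d.name} #{d.requirements}" }
```

The point of the sketch is that the repo's deps are now derived from its published packages rather than stored per manifest file.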
Why?
Scanning all the manifests in a repository for RepositoryDependency is inaccurate for many repos
e.g. for an NPM project, Libraries currently records the deps from its package-lock.json manifest plus every dep from all the manifests nested inside node_modules/.
Additionally, Libraries is also pulling deps from test folders, template folders, build output folders, educational folders, etc.
Removed repositories take up space
Of the top 100 repositories with the most manifests, 12 are Removed.
Package deps > repository deps
The deps of a specific package are more useful than every dep found anywhere in its repository.
Alternatives exist
GitHub (which accounts for 98% of the repos on Libraries) now has an Insights > Dependency graph page that can list the deps found in the entire repo, if people still need that data.
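To make the manifest-noise argument concrete, here is a small sketch of what a repo-wide manifest scan can pick up. It is purely illustrative: the paths and the list of "noisy" directories are made up, and this PR does not filter manifests this way; it sidesteps the problem by reading deps from the repo's projects.

```ruby
# Hypothetical manifest paths found while scanning one repository.
manifest_paths = [
  "package.json",
  "package-lock.json",
  "node_modules/left-pad/package.json",
  "node_modules/express/package.json",
  "test/fixtures/sample-app/package.json",
  "templates/starter/Gemfile"
]

# Assumed directory names whose manifests do not describe the published package.
noisy_dirs = %w[node_modules test templates build examples]

# A manifest is "noisy" if any directory in its path is in the list above.
noisy, relevant = manifest_paths.partition do |path|
  (path.split("/")[0...-1] & noisy_dirs).any?
end

puts "relevant: #{relevant.inspect}"
puts "noisy:    #{noisy.inspect}"
```

In this made-up example only two of the six manifests describe the package itself; the rest would still have been recorded as RepositoryDependency rows.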
Which Libraries endpoints will this affect?
Web: showing a repo's dependencies will now pull the deps from the repo's projects
Web: showing a project's usage will now pull from project deps instead of repo deps
API: pulling repository dependencies will now pull from project deps instead of repo deps
API: the docs page will now use project deps in the example for a repository's dependencies
Breaking changes?
Other than the source of the dependency data changing, we will start returning filepath: nil for all repository dependencies from the API, since we don't have a relative filepath to the repo in most cases.
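As a rough sketch of what this means for API consumers: the `filepath` key is still present but its value is always null, so clients that grouped deps by manifest path need a fallback. Only `filepath` comes from this description; the serializer name and the other keys below are hypothetical.

```ruby
require "json"

# Hypothetical serializer; only `filepath` is taken from the PR description,
# the other keys are illustrative assumptions.
def serialize_dependency(name:, requirements:)
  {
    name: name,
    requirements: requirements,
    filepath: nil # no repo-relative manifest path is known for project deps
  }
end

payload = JSON.generate(serialize_dependency(name: "rack", requirements: "1.0.0"))
puts payload

# A client that previously grouped deps by manifest path needs a fallback.
parsed = JSON.parse(payload)
manifest = parsed["filepath"] || "(unknown manifest)"
puts manifest
```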
Next step
After this is deployed, and if there are no issues with it, we can follow up with a PR to stop ingesting RepositoryDependencies and also get rid of the table.