API to list package dependents

sholladay commented 7 years ago

I am the author of squatter, a library that aims to determine package quality and help package authors know when a name on npm is really being used.

One metric I use in squatter to determine if a package is useful is to count the number of dependents a package has. See the implementation here: https://github.com/sholladay/squatter/blob/d1352745d28c5ba76d965cbff9ebe2769f4388b6/lib/has-binary-or-dependent.js#L16-L27

I currently use this undocumented CouchDB view:

https://registry.npmjs.org/-/_view/dependedUpon

I found the dependedUpon view here: https://github.com/chrisdickinson/npm-get-dependents/blob/3e5a82e6039ddb3a638fa0301f356b39bab898d7/index.js#L40-L47

My understanding is that the registry team officially supports these CouchDB views but would prefer for people to move away from using them directly in this manner. So this is a feature request to provide an API replacement.

Specifically, an API that, given a package name, returns a list of its dependents (the packages that depend on it), or at least the number of dependents if not their names. The names would let me filter out packages I consider bogus, but a simple count is better than nothing.
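To make the request concrete, here is a minimal sketch of the client-side step it implies: turning a CouchDB-style view response into a filtered list of dependent names, with the count falling out as the list length. It assumes a response shaped like `{ rows: [{ key: [dependency, dependent, ...], value }] }`; that key layout is an assumption, not documented registry behavior. `dependentsFromRows`, `isBogus`, and the sample rows are hypothetical and for illustration only.

```javascript
'use strict';

// Hypothetical helper: turn a CouchDB view response into a de-duplicated,
// sorted, filtered list of dependent package names.
// ASSUMPTION: each row key looks like [dependencyName, dependentName, ...].
function dependentsFromRows(body, isBogus = () => false) {
  const names = new Set();
  for (const row of body.rows || []) {
    const dependent = Array.isArray(row.key) ? row.key[1] : undefined;
    if (typeof dependent === 'string' && !isBogus(dependent)) {
      names.add(dependent);
    }
  }
  return [...names].sort();
}

// Illustrative input shaped like a grouped view result (made-up names).
const sample = {
  rows: [
    { key: ['chalk', 'my-cli'], value: 1 },
    { key: ['chalk', 'test-pkg-123'], value: 1 },
    { key: ['chalk', 'another-tool'], value: 1 },
  ],
};

// Hypothetical filter: treat names starting with "test-" as bogus.
const isBogus = (name) => name.startsWith('test-');

const dependents = dependentsFromRows(sample, isBogus);
console.log(dependents);        // [ 'another-tool', 'my-cli' ]
console.log(dependents.length); // 2
```

With a count-only API, the filtering step above would not be possible, which is why names are preferred over a bare number.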

Haroenv commented 6 years ago

Just leaving another comment here: the dependedUpon view is returning the same number for every package (currently 1703502).

We use:

      got(`https://replicate.npmjs.com/registry/_design/app/_view/dependedUpon`, {
        json: true,
        query: {
          startKey: JSON.stringify([name]),
          endKey: JSON.stringify([name, {}]),
          stale: 'update_after',
        },
      })

sholladay commented 6 years ago

@Haroenv, from what I remember, group_level, skip, and limit are needed to get correct data. See the links in my original post. I got weird results like yours when I didn't include exactly the right properties.

Haroenv commented 6 years ago

The issue I was having is that startkey and endkey are now all lowercase, whereas they used to be camelCased. However, the number is still not in sync with the website.
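Combining the two fixes discussed above, a hedged sketch of building the view URL with lowercase `startkey`/`endkey`, `group_level`, and optional `skip`/`limit` paging. The `group_level` default of 2 and any page size are assumptions drawn from this thread, not documented values, and `buildDependedUponUrl` is a hypothetical helper. It only builds the URL, so the result can be handed to `got` or `fetch`.

```javascript
'use strict';

const VIEW_URL =
  'https://replicate.npmjs.com/registry/_design/app/_view/dependedUpon';

// Hypothetical helper that builds a query URL for the dependedUpon view.
// CouchDB expects lowercase `startkey`/`endkey`, each JSON-encoded.
// ASSUMPTION: group_level 2 groups rows per (dependency, dependent) pair.
function buildDependedUponUrl(name, { groupLevel = 2, skip = 0, limit } = {}) {
  const url = new URL(VIEW_URL);
  url.searchParams.set('startkey', JSON.stringify([name]));
  // `{}` sorts after strings and arrays in CouchDB collation, so this range
  // covers every key whose first element is `name`.
  url.searchParams.set('endkey', JSON.stringify([name, {}]));
  url.searchParams.set('group_level', String(groupLevel));
  url.searchParams.set('stale', 'update_after');
  // Paging parameters are only added when requested.
  if (skip > 0) url.searchParams.set('skip', String(skip));
  if (limit !== undefined) url.searchParams.set('limit', String(limit));
  return url.toString();
}

console.log(buildDependedUponUrl('chalk', { limit: 100 }));
```

Keeping URL construction separate from the HTTP call makes it easy to check exactly which parameter names are sent, which is where the camelCase problem came from.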

zeke commented 6 years ago

Here are two efforts to collect this data into an easily consumable offline package format:

https://github.com/nice-registry/dependent-counts
https://github.com/nice-registry/dependent-packages

They're out of date, but if the need is there I can dust them off and apply some automation to keep them fresh.

Haroenv commented 6 years ago

That might be an option too. I wonder whether it's faster to download the dependents for the whole registry or to make an API call for each package (my guess leans towards the first). I'll try measuring the time (and install time) differences, @zeke. What process do you have for keeping it updated?

zeke commented 6 years ago

> What process do you have for keeping it updated?

@Haroenv, I use a Heroku Scheduler process (think cron) that runs every day (or every hour, depending on the project). The process has GitHub and npm credentials. It git clones the repo, runs a build script, runs the tests, and, if everything passes, checks in the changes and publishes to npm.
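The job described above can be sketched as an ordered list of shell steps run by a small Node script that the scheduler invokes. This is a sketch under assumptions, not zeke's actual script: the repository URL is a placeholder, the repo is assumed to define `build` and `test` npm scripts, and git/npm credentials are assumed to be configured in the environment. `buildPlan` and `runPlan` are hypothetical names.

```javascript
'use strict';

const { execSync } = require('child_process');

// Hypothetical plan mirroring the steps: clone, build, test, commit, publish.
// ASSUMPTIONS: `build` and `test` npm scripts exist, and credentials for
// git push and npm publish are already available in the environment.
function buildPlan(repoUrl, dir = 'data-repo') {
  const inRepo = (cmd) => `cd ${dir} && ${cmd}`;
  return [
    `git clone ${repoUrl} ${dir}`,
    inRepo('npm install'),
    inRepo('npm run build'),
    inRepo('npm test'),
    inRepo('git add -A'),
    // Exits non-zero when nothing changed, which also ends the run.
    inRepo('git commit -m "Update data"'),
    // Bumps the version, creating its own commit and tag.
    inRepo('npm version patch'),
    inRepo('git push --follow-tags'),
    inRepo('npm publish'),
  ];
}

// Run steps in order. execSync throws on a non-zero exit code, so a failing
// build or test stops the job before anything is committed or published.
function runPlan(plan) {
  for (const cmd of plan) {
    console.log(`$ ${cmd}`);
    execSync(cmd, { stdio: 'inherit' });
  }
}

// Usage (placeholder URL, not executed here):
// runPlan(buildPlan('https://github.com/OWNER/REPO.git'));
```

Stopping at the first failing step is what gives the "only publish if everything passes" behavior without any extra bookkeeping.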

The process is outlined here: http://zeke.sikelianos.com/npm-and-github-automation-with-heroku/

As I recall, collecting all the package dependents was quite a time-consuming process, and there are undoubtedly better ways to do the actual collection. But having a reasonably up-to-date dataset that people can simply npm install is really nice as a consumer of the data.
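One common way to collect this data in bulk, sketched here under assumptions: walk every package document (for example from a registry replica) and invert each package's `dependencies` map into a dependency-to-dependents index, producing both names and counts. Counting only the `latest` version's runtime `dependencies` is an assumption of this sketch, and `invertDependencies` and the sample documents are hypothetical. The document shape follows npm's package metadata (`dist-tags`, `versions`), simplified.

```javascript
'use strict';

// Hypothetical helper: build dependency -> dependents from package documents.
// ASSUMPTION: only the `latest` version's `dependencies` count;
// devDependencies and older versions are ignored.
function invertDependencies(packuments) {
  const index = new Map();
  for (const doc of packuments) {
    const latest = doc['dist-tags'] && doc['dist-tags'].latest;
    const manifest = latest && doc.versions && doc.versions[latest];
    if (!manifest) continue; // skip malformed or unpublished documents
    for (const dep of Object.keys(manifest.dependencies || {})) {
      if (!index.has(dep)) index.set(dep, new Set());
      index.get(dep).add(doc.name);
    }
  }
  // Emit two plain objects: sorted dependent names and dependent counts.
  const names = {};
  const counts = {};
  for (const [dep, dependents] of index) {
    names[dep] = [...dependents].sort();
    counts[dep] = dependents.size;
  }
  return { names, counts };
}

// Illustrative documents (made-up packages).
const docs = [
  { name: 'app-a', 'dist-tags': { latest: '1.0.0' },
    versions: { '1.0.0': { dependencies: { chalk: '^2.0.0', got: '^9.0.0' } } } },
  { name: 'app-b', 'dist-tags': { latest: '2.1.0' },
    versions: { '2.1.0': { dependencies: { chalk: '^2.4.0' } } } },
];

const { names, counts } = invertDependencies(docs);
console.log(names.chalk); // [ 'app-a', 'app-b' ]
console.log(counts.got);  // 1
```

A single pass over all documents answers the question for every package at once, which is what makes a downloadable dataset attractive compared with one query per package.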

amio commented 6 years ago

While waiting for the npm registry to support this officially, you can use Badgen's API: https://api.badgen.net/npm/dependents/chalk

Or, if you are looking for a badge, that's what Badgen does 🤗

WARNING

As @Haroenv points out 👇 (thanks!), there is currently a scraper behind this API (why), so it might break occasionally if npmjs.com updates its HTML structure. Don't rely on it too heavily.

This is only necessary because there is currently no fast API for the dependents count; if there were one, this issue wouldn't exist. Until then, someone has to do the dirty work. If you were going to do it the same way I did, this API could be a handy option.
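A hedged sketch of consuming the Badgen endpoint from Node 18+ (global `fetch`). It assumes the JSON response carries a human-readable `status` string such as `"12.3k"`; that field name and format are assumptions to verify against a real response. `parseAbbreviatedCount` and `fetchDependentsCount` are hypothetical helpers.

```javascript
'use strict';

// Hypothetical parser for counts like "950", "12.3k", or "1.2M".
// ASSUMPTION: Badgen's `status` field uses this kind of abbreviation.
// Returns NaN for anything it does not recognize.
function parseAbbreviatedCount(status) {
  const match = /^\s*([\d.]+)\s*([kKmMbB]?)\s*$/.exec(String(status));
  if (!match) return NaN;
  const multipliers = { '': 1, k: 1e3, m: 1e6, b: 1e9 };
  return Math.round(parseFloat(match[1]) * multipliers[match[2].toLowerCase()]);
}

// Fetch the dependents count for an unscoped package name (Node 18+).
// ASSUMPTION: the response body is JSON with a `status` field.
async function fetchDependentsCount(name) {
  const res = await fetch(`https://api.badgen.net/npm/dependents/${encodeURIComponent(name)}`);
  if (!res.ok) throw new Error(`Badgen request failed: ${res.status}`);
  const body = await res.json();
  return parseAbbreviatedCount(body.status);
}

// Usage (network call, not executed here):
// fetchDependentsCount('chalk').then(console.log);
```

Since the value is abbreviated and ultimately scraped, treat the result as approximate rather than an exact dependent count.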

Haroenv commented 6 years ago

For future reference, it seems Badgen requests the package page on npmjs.com and then parses it with cheerio (code). This is useful, but I'd still love to have a real solution with an API.

npm / registry-issue-archive

API to list package dependents #231