Open sholladay opened 7 years ago
Just leaving another comment here: the dependedUpon view is giving the same number for each package (1703502 currently).
We use:
got(`https://replicate.npmjs.com/registry/_design/app/_view/dependedUpon`, {
    json: true,
    query: {
        startKey: JSON.stringify([name]),
        endKey: JSON.stringify([name, {}]),
        stale: 'update_after',
    },
})
@Haroenv from what I remember, group_level, skip, and limit are important to get the data correctly. See my links in the original post. I got weird results like yours when I didn't include exactly the right properties.
The issue I was having is that startkey and endkey are all lowercase now, while they used to be camel-cased. However, the number is still not in sync with the website.
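For anyone landing here later, a minimal sketch of the query with the lowercase parameter names might look like the following. The function names, the choice of group_level 1, and the assumption that the view reduces to a single count per package are mine and not confirmed by the registry team; treat it as a starting point and not as the supported way to call the view.

```javascript
// Base URL of the view, taken from the snippet earlier in this thread.
const VIEW_URL = 'https://replicate.npmjs.com/registry/_design/app/_view/dependedUpon';

// Build the request URL for one package (hypothetical helper).
// Assumption: view keys are arrays of the form [name, dependent], so
// group_level 1 would collapse all rows for a package into one count.
function dependedUponUrl(name, groupLevel = 1) {
    const params = new URLSearchParams({
        startkey: JSON.stringify([name]),     // lowercase, not startKey
        endkey: JSON.stringify([name, {}]),   // lowercase, not endKey
        group_level: String(groupLevel),
        stale: 'update_after',
    });
    return `${VIEW_URL}?${params}`;
}

// Read the count out of a CouchDB-style reduced view response
// ({ rows: [{ key, value }] }). Returns 0 when there are no rows.
function countFromResponse(body) {
    const row = body.rows && body.rows[0];
    return row ? row.value : 0;
}
```

The URL can then be fetched with any HTTP client and the parsed JSON body passed to countFromResponse. As noted above, the resulting number may still differ from what the website shows.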
Here are two efforts to collect this data into an easily consumable offline package format:
https://github.com/nice-registry/dependent-counts https://github.com/nice-registry/dependent-packages
They're out of date, but if the need is there I can dust them off and apply some automation to keep them fresh.
That might be an option too. I wonder whether it's faster to download all dependents for the whole registry or to make an API call for each package (my guess is the former). I'll try out what the time (and install time) differences are, @zeke. What process do you have for keeping it updated?
What process do you have for keeping it updated?
@Haroenv I use a Heroku Scheduler process (think cron) that runs every day (or hour, depending on the project). The process has GitHub and npm credentials. It git clones the repo, runs a build script, runs tests, and if everything passes it checks in the changes and publishes to npm.
The process is outlined here: http://zeke.sikelianos.com/npm-and-github-automation-with-heroku/
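Roughly, the steps above could be sketched as a small Node script like this one. It is only an illustration: the function names, the commit message, the explicit git push, and the `npm run build` script name are assumptions, and the real setup is the one described in the linked post.

```javascript
const { execFileSync } = require('node:child_process');

// Ordered list of [command, args, working directory] (hypothetical helper).
function releaseSteps(repoUrl, dir) {
    return [
        ['git', ['clone', repoUrl, dir], '.'],
        ['npm', ['run', 'build'], dir],   // assumes a "build" script exists
        ['npm', ['test'], dir],
        // git commit exits non-zero when there is nothing to commit,
        // so an unchanged dataset is not republished.
        ['git', ['commit', '--all', '--message', 'Update dataset'], dir],
        ['git', ['push'], dir],
        ['npm', ['publish'], dir],
    ];
}

// Run each step in order. execFileSync throws on a non-zero exit code,
// which stops the loop before any later step (including publish) runs.
function runRelease(repoUrl, dir, run = execFileSync) {
    for (const [cmd, args, cwd] of releaseSteps(repoUrl, dir)) {
        run(cmd, args, { cwd, stdio: 'inherit' });
    }
}
```

The scheduler would then invoke something like runRelease(repoUrl, 'work') on its daily or hourly schedule, with the GitHub and npm credentials supplied through the environment.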
As I recall, collecting all the package dependents was quite a time-consuming process, and there are undoubtedly better ways to do the actual collection. But having a reasonably up-to-date dataset that people can simply npm install is really nice as a consumer of the data.
While waiting for the npm registry to support this officially, you may use Badgen's API: https://api.badgen.net/npm/dependents/chalk
Or if you are looking for a badge, that's what Badgen does 🤗
WARNING
As @Haroenv 👇 points out (thanks!), there is currently a scraper behind this API (why), so it might break occasionally if npmjs.com updates its HTML structure. Don't rely on it too heavily.
We currently don't have a fast API for dependent counts; if we did, this issue wouldn't exist. Someone has to do some dirty work for it. If you were going to do it the same way I did, this API could be a handy option.
I am the author of squatter, a library that aims to determine package quality and help package authors know when a name on npm is really being used.
One metric I use in squatter to determine if a package is useful is the number of dependents a package has. See the implementation here: https://github.com/sholladay/squatter/blob/d1352745d28c5ba76d965cbff9ebe2769f4388b6/lib/has-binary-or-dependent.js#L16-L27
I currently use this undocumented CouchDB view:
I found the dependedUpon view here: https://github.com/chrisdickinson/npm-get-dependents/blob/3e5a82e6039ddb3a638fa0301f356b39bab898d7/index.js#L40-L47
My understanding is that the registry team officially supports these CouchDB views but would prefer for people to move away from using them directly in this manner. So this is a feature request to provide an API replacement.
Specifically, an API that, given a package name, will return a list of its dependents (packages that depend upon it). Or at least the number of dependents, if not their names. The names would be useful so that I can filter out packages I consider bogus, but a simple count is better than nothing.