Open yamadapc opened 8 years ago
Yes, I'd very much like to expose download stats in a convenient form. We collect quite a bit of detail in the download stats feature, but not a lot is exposed yet.
@yamadapc you're most welcome to have a go at this. Ask in #hackage if you want advice.
Updating this.
Since there now seems to be a Fastly CDN in front of Hackage, exposing the server's own stats through the API isn't worth the trouble (since they'll be incorrect).
Instead, the approach would be closer to what NPM seems to do: generating the download stats data from the CDN's logs.
It seems there are ways to enable log aggregation from Fastly, as outlined in:
Another system, like NPM's download-counts, would then run on a schedule, parse the logs, and generate statistics for the download counts. Depending on how the logs are structured, I'd guess there could even be some reuse of NPM's existing tooling.
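To make the log-parsing step concrete, here is a minimal sketch of the aggregation. The log line layout (`<date> <method> <path> <status>`) is purely an assumption for illustration, since the real format depends on how Fastly logging is configured. The function name `count_downloads` is hypothetical too. The only thing taken from Hackage itself is the tarball URL shape, `/package/<name>-<version>/<name>-<version>.tar.gz`.

```python
import re
from collections import Counter

# Tarball URLs look like /package/<name>-<version>/<name>-<version>.tar.gz
TARBALL = re.compile(r"/package/(?P<pkgid>[^/\s]+)/(?P=pkgid)\.tar\.gz")
# Split "<name>-<version>", where the version is dot-separated digits.
PKGID = re.compile(r"^(?P<name>.+)-(?P<version>\d+(?:\.\d+)*)$")


def count_downloads(lines):
    """Count successful tarball downloads per (name, version).

    Assumes a hypothetical space-separated log line:
    <ISO date> <HTTP method> <URL path> <status code>
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip malformed lines
        _date, method, path, status = fields
        if method != "GET" or status != "200":
            continue  # only count successful downloads
        tarball = TARBALL.fullmatch(path)
        if tarball is None:
            continue  # not a package tarball request
        pkg = PKGID.match(tarball.group("pkgid"))
        if pkg is None:
            continue
        counts[(pkg.group("name"), pkg.group("version"))] += 1
    return counts
```

A scheduled job could feed each day's log file through `count_downloads` and add the resulting counters together, which keeps the per-version detail while still allowing per-package totals.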
I would be quite interested in this. It would be great to compare packages with this data. I've even got a web frontend up for a similar project (open source):
Example: https://trycatchchris.co.uk/archpackagecompare/comparePackage/gnome-terminal/lxterminal/rxvt/rxvt-unicode/st/terminator/termite/xterm
Code: https://github.com/chrissound/ArchPackageCompareStats
I'm not too familiar with all these logging services though - are these all paid services?
fwiw, the plan is to inject the CDN download count data into Hackage so it can provide more reliable counts again; I just need to finish the S3 logic to reliably fetch and aggregate the daily data.
We have a plan to tackle the CDN issue so that shouldn't be a blocker on this.
:+1: /packages/top orders packages by number of downloads in the past 30 days, but it'd be nice even if it supported a query parameter to toggle all-time downloads.
I'm writing a script to run on the X most popular packages (by number of downloads), and /packages/top/ provides a decent proxy, but it'd be better to work off the all-time stats.
(Also, /packages/top doesn't have a JSON option, so I'm having to scrape HTML right now. Can this endpoint also allow returning JSON format?)
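For anyone else stuck scraping in the meantime, here is a rough sketch using only the Python standard library. It assumes the page is an HTML table whose rows carry the package name in the first cell and the download count in the second. That layout is an assumption, not something confirmed here, and `TableRows` and `top_packages` are hypothetical names.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows made of <th> cells stay empty
                self.rows.append(self._row)
            self._row = None


def top_packages(html, limit):
    """Return up to `limit` (name, downloads) pairs, most downloaded first."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    ranked = []
    for row in parser.rows:
        if len(row) < 2:
            continue
        digits = row[1].replace(",", "")  # tolerate "1,200"-style counts
        if digits.isdigit():
            ranked.append((row[0], int(digits)))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:limit]
```

A JSON option on the endpoint would make all of this unnecessary, which is really the point of the request.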
Building on "Proposed Statistics features", I'd like to have an issue about exposing an API for download stats over time.
Depending on how that task goes, someone (or I) could take on more work.
I'd like to be assigned to expose that, along with #332: not all the statistics proposed, but an API for downloads over time, per package, per version.
To be honest, I don't understand why /packages/downloads requires admin access. I briefly discussed this on #hackage, but not as deeply as I'd like. So if this comes across as nonsensical, feel free to close and ignore.
In my mind, I'd like to have JSON resources for:
- /packages/:package_name (with a downloads and a downloads_this_month count)
- /packages/:package_name/downloads (with a "query-able" count over time)
I'm not sure I follow what goes into the /packages/top resource. What is the criterion for a package to be considered a "top" package? Nº of downloads, I guess, but I mean, how many?
I've started scraping data from that resource on hackage-downloads. The next step would be to add a web service for serving counts over time and just hit the resource every day or so.
But it'd be nice to have this on the Hackage API. I think NPM has an interesting implementation of this; it's very ad-hoc, like this repository linked above.
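To illustrate the two JSON resources proposed above, here is a small sketch of how their bodies could be derived from per-day counts. Only the `downloads` and `downloads_this_month` fields come from the proposal. The other field names, the function names, and the reading of "this month" as the current calendar month are my assumptions.

```python
from datetime import date


def package_summary(name, daily, today):
    """Possible body for /packages/:package_name.

    `daily` maps a datetime.date to that day's download count.
    "This month" is taken to mean the calendar month of `today` (an assumption).
    """
    this_month = sum(
        n for day, n in daily.items()
        if (day.year, day.month) == (today.year, today.month)
    )
    return {
        "name": name,
        "downloads": sum(daily.values()),
        "downloads_this_month": this_month,
    }


def downloads_over_time(daily, start, end):
    """Possible body for /packages/:package_name/downloads.

    Returns one entry per day with data, for start <= day <= end, oldest first.
    """
    return [
        {"date": day.isoformat(), "downloads": n}
        for day, n in sorted(daily.items())
        if start <= day <= end
    ]
```

The `start` and `end` arguments stand in for whatever query parameters the "query-able" resource ends up taking.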