haskell / hackage-server

Hackage-Server: A Haskell Package Repository
http://hackage.haskell.org

Downloads and statistics API #458

Open yamadapc opened 8 years ago

yamadapc commented 8 years ago

Building on "Proposed Statistics features", I'd like to open an issue about exposing an API for download stats over time.

Depending on how that task goes, someone (or I) could take on more work.

I'd like to be assigned to expose that, along with #332: not all of the proposed statistics, but an API for downloads over time, per package and per version.
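To make "downloads over time, per package, per version" concrete, here is a minimal sketch of what such a JSON resource might return. It assumes daily counts are already available as hypothetical `(day, package, version, count)` rows; the function names and the response shape (`package`, `versions`, `day`, `downloads`) are illustrative assumptions, not an existing Hackage API.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import date


def version_key(version: str) -> tuple[int, ...]:
    # Hackage versions are dot-separated numbers, so compare them numerically
    # ("1.10" sorts after "1.9").
    return tuple(int(part) for part in version.split("."))


def downloads_over_time(
    records: list[tuple[date, str, str, int]], package: str
) -> dict:
    """Build a JSON-ready response with daily download counts per version.

    `records` holds hypothetical (day, package, version, count) rows; the
    response shape is illustrative, not an existing Hackage resource.
    """
    series: dict[str, dict[date, int]] = defaultdict(lambda: defaultdict(int))
    for day, pkg, version, count in records:
        if pkg == package:
            series[version][day] += count
    return {
        "package": package,
        "versions": {
            version: [
                {"day": day.isoformat(), "downloads": n}
                for day, n in sorted(days.items())
            ]
            for version, days in sorted(
                series.items(), key=lambda kv: version_key(kv[0])
            )
        },
    }
```

The result is plain dicts and lists, so it can be passed straight to `json.dumps`.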

To be honest, I don't understand why /packages/downloads requires admin access.

I briefly discussed this on #hackage, but not as deeply as I'd like, so if this comes across as nonsensical, feel free to close and ignore it. In my mind, I'd like to have JSON resources for:

I'm not sure I follow what goes into the /packages/top resource. What are the criteria for a package to be considered a "top" package? Number of downloads, I guess, but how many?

I've started scraping data from that resource on hackage-downloads. The next step would be to add a web service that serves counts over time, populated by hitting the resource every day or so.

But it'd be nice to have this in the Hackage API. I think NPM has an interesting implementation of this; it's very ad hoc, much like the repository linked above.

dcoutts commented 8 years ago

Yes, I'd very much like to expose download stats in a convenient form. We collect quite a lot of detail in the download stats feature, but not much of it is exposed yet.

@yamadapc you're most welcome to have a go at this. Ask in #hackage if you want advice.

yamadapc commented 8 years ago

Updating this.

Since there now seems to be a Fastly CDN in front of Hackage, exposing the stats through the API isn't worth the trouble: downloads served from the CDN's cache never reach Hackage, so its counts will be incorrect.

Instead, the better approach is closer to what NPM seems to do: generate the download stats from the CDN's logs.

It seems there are ways to enable log aggregation from Fastly, as outlined in:

A separate system, like NPM's download-counts, would then run on a schedule, parse the logs, and generate download-count statistics. Depending on how the logs are structured, there could even be some reuse of NPM's existing tooling.
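As a rough illustration of the "parse the logs and generate statistics" step, here is a minimal sketch that turns CDN log lines into daily per-package, per-version counts. Fastly's log formats are configurable, so the line format `<ISO-8601 timestamp> <HTTP status> <request path>` is purely an assumption, as is the function name `aggregate_downloads`. The path pattern assumes Hackage tarball URLs of the forms `/package/<name>-<version>/<name>-<version>.tar.gz` and `/package/<name>-<version>.tar.gz`; other download paths would need their own patterns.

```python
import re
from collections import Counter
from datetime import date
from typing import Iterable

# Assumed Hackage package tarball paths, in both "/package/<id>/<id>.tar.gz"
# and "/package/<id>.tar.gz" form, where <id> is "<name>-<version>".
TARBALL = re.compile(
    r"^/package/(?P<name>[A-Za-z0-9][A-Za-z0-9-]*?)-(?P<version>\d+(?:\.\d+)*)"
    r"(?:/(?P=name)-(?P=version))?\.tar\.gz$"
)


def aggregate_downloads(lines: Iterable[str]) -> Counter:
    """Count successful tarball downloads per (day, package, version).

    Assumes a hypothetical log format with one request per line:
    "<ISO-8601 timestamp> <HTTP status> <request path>".
    """
    counts: Counter = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # malformed line
        timestamp, status, path = parts
        if status != "200":
            continue  # only count successful responses
        match = TARBALL.match(path)
        if match is None:
            continue  # not a package tarball
        try:
            day = date.fromisoformat(timestamp[:10])
        except ValueError:
            continue  # unparseable timestamp
        counts[(day, match["name"], match["version"])] += 1
    return counts
```

Running this once per day over that day's logs would produce exactly the kind of daily aggregate a stats resource could be built on.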

chrissound commented 7 years ago

I would be quite interested in this. It would be great to compare packages using this data. I've even got a web frontend up for a similar (open-source) project:
Example: https://trycatchchris.co.uk/archpackagecompare/comparePackage/gnome-terminal/lxterminal/rxvt/rxvt-unicode/st/terminator/termite/xterm
Code: https://github.com/chrissound/ArchPackageCompareStats

I'm not too familiar with all these logging services, though; are they all paid services?

hvr commented 7 years ago

FWIW, the plan is to inject the CDN download-count data into Hackage so it can provide reliable counts again; I just need to finish the S3 logic to reliably fetch and aggregate the daily data.

gbaz commented 6 years ago

We have a plan to tackle the CDN issue so that shouldn't be a blocker on this.

brandonchinn178 commented 2 years ago

:+1: /packages/top orders packages by number of downloads in the past 30 days, but it'd be nice if it also supported a query parameter to switch to all-time downloads.
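The requested toggle amounts to changing the window over which daily counts are summed. A minimal sketch, assuming hypothetical `{(day, package): downloads}` daily totals and a hypothetical `top_packages` function (not Hackage's actual implementation), with `window_days=30` for the behaviour described above and `None` for all time:

```python
from __future__ import annotations

from collections import Counter
from datetime import date, timedelta
from typing import Mapping, Optional


def top_packages(
    daily: Mapping[tuple[date, str], int],
    today: date,
    n: int,
    window_days: Optional[int] = 30,
) -> list[tuple[str, int]]:
    """Rank packages by downloads.

    With window_days=30 only the 30 days ending at `today` are counted;
    window_days=None counts all recorded days (all-time downloads).
    """
    cutoff = None if window_days is None else today - timedelta(days=window_days)
    totals: Counter = Counter()
    for (day, package), count in daily.items():
        if cutoff is None or cutoff < day <= today:
            totals[package] += count
    return totals.most_common(n)
```

A query parameter (its name would be up to the maintainers, e.g. something like `?period=all`) could then simply map to `window_days=None`.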

I'm writing a script to run on the X most popular packages (by number of downloads), and /packages/top/ provides a decent proxy, but it'd be better to work off the all-time stats.

(Also, /packages/top doesn't have a JSON option, so I'm having to scrape HTML right now. Could this endpoint also return JSON?)
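One way to offer JSON alongside HTML is to pick the representation from the request's `Accept` header. The sketch below is a hypothetical, framework-free illustration (the function `render_top` and the field names are assumptions, not hackage-server code), and its content negotiation is deliberately naive: it ignores q-values and wildcards.

```python
import html
import json


def render_top(ranking: list, accept: str) -> tuple:
    """Return (content_type, body) for a list of (package, downloads) pairs.

    Naive content negotiation: any Accept header mentioning application/json
    gets JSON; q-values and wildcards are ignored.
    """
    if "application/json" in accept:
        body = json.dumps([{"package": p, "downloads": d} for p, d in ranking])
        return "application/json", body
    rows = "".join(
        f"<tr><td>{html.escape(p)}</td><td>{d}</td></tr>" for p, d in ranking
    )
    return "text/html", f"<table>{rows}</table>"
```

With a JSON representation, a script that works over the most popular packages could call `json.loads` on the response instead of scraping HTML.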