rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

Need metrics on downloads and pageviews #25978

Closed edunham closed 8 years ago

edunham commented 9 years ago

We currently can't count how many people visit the site, or how many downloads each release gets. These numbers would be very good to have, for publicity as well as identifying areas for improvement.

huonw commented 9 years ago

I believe Google Analytics is running on rust-lang.org (and blog.r-l.o)?

alexcrichton commented 9 years ago

We've got analytics set up for rust-lang.org, blog.rust-lang.org, and crates.io, but I believe that doesn't track download information, because downloads are all routed through S3. We could count clicks on the main page to the artifacts, but I suspect those are far fewer than direct downloads (e.g. through multirust).

edunham commented 9 years ago

@alexcrichton cool! I'd like to at least get access to those stats myself, and hopefully automate a way of more publicly sharing the less-creepy metrics like hits per day (and possibly breakdown of hits by platform).

I just did some quick research on options for S3 download counts, and it looks like there's no good solution built into the service. The answers on the AWS forum and Quora agree that we basically have 4 options:

As @alexcrichton mentioned, counting clicks would probably not provide us with much useful data on its own, since we'd like to know about downloads performed by scripts as well. Later on, it might be interesting to see how many downloads are script-enabled browsers vs hard-coded links in bots by gathering that data, but it seems relatively unimportant for now.

There exists a BSD-licensed PHP script, last updated about 8 months ago, that sits on Heroku and proxies requests to S3 while logging them to Google Analytics. There's also an MIT-licensed Python tool, last active 2 years ago, which parses S3 logs into a CSV of download counts.
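For a sense of what the log-parsing route involves, here is a minimal sketch in Python. It is not the MIT-licensed tool mentioned above. It assumes the standard space-delimited S3 server access log layout (owner, bucket, bracketed time, remote IP, requester, request ID, operation, key, quoted request URI, HTTP status, ...), and the names `count_downloads` and `to_csv` are made up for illustration.

```python
import csv
import io
import re
from collections import Counter

# Leading fields of an S3 server access log record (assumed layout):
# owner bucket [time] remote-ip requester request-id operation key "request-uri" status ...
LOG_LINE = re.compile(
    r'^\S+ \S+ \[[^\]]+\] \S+ \S+ \S+ (?P<op>\S+) (?P<key>\S+) '
    r'"[^"]*" (?P<status>\d{3}|-) '
)


def count_downloads(lines):
    """Count object GETs that returned HTTP 200, grouped by key.

    Partial (206) responses and non-GET operations are ignored here;
    whether they should count as downloads is a policy decision.
    """
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m["op"] == "REST.GET.OBJECT" and m["status"] == "200":
            counts[m["key"]] += 1
    return counts


def to_csv(counts):
    """Render the per-key counts as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["key", "downloads"])
    for key, n in sorted(counts.items()):
        writer.writerow([key, n])
    return out.getvalue()
```

Feeding it the lines of one or more downloaded log objects and writing the `to_csv` result to a file would give roughly the per-artifact count described above.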

A couple of services would trade us useful metrics on our AWS logs for money, namely s3stat, qcloudstat, and some CloudBerry Windows-only "freeware" thing.

s3stat typically costs a flat $10/month, and also offers a snarky but free "cheap bastard plan" wherein they trade free service for some publicity. Qcloudstat's plans are either 5 or 20 euros per month -- I wouldn't be surprised if we're near the upper limit of the smallest plan.

The final consideration in all this is that we'll have to pay for any logs we store on S3, and others have reported that the logs tend to grow surprisingly quickly. For now, I'm going to enable logging for a week so we have some real data for projecting how fast the logs will grow, roughly how many downloads we should expect a tool to handle, and so on.
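The projection from a week of logs could be as simple as the sketch below. It assumes growth is linear, which ignores release-day spikes, and `project_from_sample` and its parameters are hypothetical names; the inputs would come from the real sample once it exists.

```python
def project_from_sample(sample_bytes, sample_requests, sample_days=7, horizon_days=365):
    """Linearly extrapolate log volume and request rate from a short sample.

    Assumes traffic during the sample window is representative, which may
    not hold around a release.
    """
    bytes_per_day = sample_bytes / sample_days
    requests_per_day = sample_requests / sample_days
    return {
        "bytes_per_day": bytes_per_day,
        "requests_per_day": requests_per_day,
        "bytes_at_horizon": bytes_per_day * horizon_days,
    }
```

Multiplying `bytes_at_horizon` by whatever the current S3 storage price is would give a rough upper bound on the cost of keeping every log.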

Long-term, the fix with the fewest moving parts would probably be a script that parses relevant log entries, throws their data into Google Analytics as custom events, then deletes the old logs. A script like that would probably take less than a day to build as an MVP, plus intermittent half-hours over a couple of weeks to tweak and troubleshoot in production; after that it'd more or less run by itself until breaking changes are made to any of the APIs it interfaces with.
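A rough sketch of the last two steps of such a script, in Python, might look like this. It assumes the per-key download counts have already been extracted from the logs, and it assumes the Universal Analytics Measurement Protocol parameter names (`v`, `tid`, `cid`, `t`, `ec`, `ea`, `ev`), which should be checked against current Google documentation before use. The function names, the `"download"` category, and the default client ID are placeholders. Nothing here sends a request or deletes anything; it only builds the event bodies and picks which logs are old enough to remove.

```python
from datetime import timedelta
from urllib.parse import urlencode


def build_event_payloads(counts, tracking_id, client_id="s3-log-importer"):
    """Return one URL-encoded event body per key.

    Parameter names follow the Universal Analytics Measurement Protocol
    as an assumption; the category and client ID are placeholders.
    """
    payloads = []
    for key, n in sorted(counts.items()):
        payloads.append(urlencode({
            "v": "1",            # protocol version
            "tid": tracking_id,  # analytics property ID
            "cid": client_id,    # anonymous client identifier
            "t": "event",        # hit type
            "ec": "download",    # event category
            "ea": key,           # event action: the downloaded object
            "ev": n,             # event value: number of downloads
        }))
    return payloads


def logs_to_delete(log_times, now, max_age_days=7):
    """Return the names of log objects older than the retention window.

    `log_times` maps a log object name to its (timezone-aware) timestamp.
    """
    cutoff = now - timedelta(days=max_age_days)
    return sorted(name for name, ts in log_times.items() if ts < cutoff)
```

Keeping the payload building and the deletion decision as pure functions means the parts that actually touch Google Analytics and S3 stay small and can be swapped out when either API changes.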

@aturon, @brson thoughts on where and how soon it's important to start publishing download stats?

brson commented 9 years ago

@edunham We don't necessarily need to publish them, but having them available when people ask would be nice. It's hard to say that this is very high priority though compared to other things on the list.

Consider that we currently need to account for downloads from both CloudFront and S3, because update.sh goes straight to S3 to avoid the sync bugs (if gpg is available).
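To illustrate what accounting for both sources could involve, here is a hedged Python sketch. It assumes CloudFront standard logs are tab-separated with a `#Fields:` header naming columns such as `cs-method`, `cs-uri-stem`, and `sc-status`, and that S3 counts have already been computed separately; `count_cloudfront_downloads` and `combined_downloads` are illustrative names.

```python
from collections import Counter


def count_cloudfront_downloads(lines):
    """Count GETs with status 200 per path in a CloudFront standard log.

    Column positions are taken from the '#Fields:' header line rather
    than hard-coded, since the field list can change.
    """
    fields = []
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split("\t")))
        stem = row.get("cs-uri-stem")
        if stem and row.get("cs-method") == "GET" and row.get("sc-status") == "200":
            # Strip the leading slash so paths line up with S3 keys.
            counts[stem.lstrip("/")] += 1
    return counts


def combined_downloads(cloudfront_counts, s3_counts):
    """Sum per-key counts from both sources."""
    return cloudfront_counts + s3_counts
```

One caveat: requests that CloudFront forwards to S3 on a cache miss may also show up in the S3 logs, so a plain sum could overcount unless those origin fetches are filtered out.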

aturon commented 9 years ago

I agree with @brson that this isn't a huge priority, but I am pretty interested in getting more metrics across the board, so we can get a better sense for how the community and ecosystem are progressing. If there are relatively easy metrics to collect, even in an unpolished way, it seems worthwhile to pick a few of them off.

edunham commented 8 years ago

Current status: CloudFront and S3 logs exist; Alex can hand out analytics access for pageviews.