niemeyer / gopkg

Source code for the gopkg.in service.

git clone stats #34

Closed mcuadros closed 11 months ago

mcuadros commented 9 years ago

Since gopkg sits as a man-in-the-middle between the real repository and the clients, we can track the number of clones and some other cool metrics. I would love to know a bit more about how often my packages are used.

Caveat: to keep the stats, the service will require storage.

I can work on this feature if it fits the project scope.

GeertJohan commented 9 years ago

I like this idea :+1:

One of the first things I thought about was stats as (de)motivation. When someone publishes their fancy new project and it gets only 10 clones in the first week, that can be extremely demotivating and lead to that person quitting the project, even though the project could become the next gorilla/mux or codegangsta/cli. Maybe take the same approach the Google Play Store does and (at first?) only display ranges: "1-100", "100-200", "200-500", etc.
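
The range display could be sketched roughly like this. The `Bucket` function and the bound values past 500 are assumptions for illustration; the labels reuse the previous upper bound as the next lower bound, mirroring the "1-100", "100-200" style quoted above.

```go
package main

import "fmt"

// bounds are the inclusive upper limits of each display range.
// Values beyond 500 are hypothetical.
var bounds = []int{100, 200, 500, 1000}

// Bucket maps an exact count to a coarse label such as "1-100".
func Bucket(n int) string {
	if n <= 0 {
		return "0"
	}
	lower := 1
	for _, upper := range bounds {
		if n <= upper {
			return fmt.Sprintf("%d-%d", lower, upper)
		}
		// The next label starts at the previous upper bound.
		lower = upper
	}
	// Counts above the last bound get an open-ended label.
	return fmt.Sprintf("%d+", lower)
}

func main() {
	fmt.Println(Bucket(10), Bucket(150), Bucket(5000)) // prints 1-100 100-200 1000+
}
```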

If the stats are only a simple counter without a timeline, the storage can probably be kept at just a few MB. In that case it could be as simple as a leveldb with in-memory counters and an idle timeout that stores unused counters and removes them from memory. Another interesting part will probably be sharing the counter stats with the fail-over server.

When keeping/displaying historical stats, I can recommend InfluxDB. It provides cool options to automatically keep stats for, let's say, 24h@5m, 7d@1h, 1month@1d, ever@1week. The InfluxDB should be running on an external server, and Influx downtime should not lead to interrupted availability or degraded performance at gopkg.in; just drop the stats. Maybe we could ask InfluxDB to sponsor a hosted node. Rendering of graphs can be done client-side.

Let's hear from @niemeyer what he thinks. If he thinks this can be added and if you don't mind @mcuadros, then I would like to help with this.
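
The "just drop the stats" behavior can be sketched with a buffered channel and a non-blocking send. `Recorder` and `Event` are hypothetical names, and no InfluxDB client is involved: the sketch assumes a separate goroutine would drain `events` and write to the stats backend, and it simulates a backend outage by never draining.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Event is a single stats sample (hypothetical shape).
type Event struct {
	Repo string
}

// Recorder queues events for a background writer and never blocks callers.
type Recorder struct {
	events  chan Event
	dropped int64
}

func NewRecorder(buffer int) *Recorder {
	return &Recorder{events: make(chan Event, buffer)}
}

// Record enqueues e if there is room and drops it otherwise,
// so a slow or unavailable backend cannot stall request handling.
func (r *Recorder) Record(e Event) bool {
	select {
	case r.events <- e:
		return true
	default:
		atomic.AddInt64(&r.dropped, 1)
		return false
	}
}

// Dropped reports how many events were discarded.
func (r *Recorder) Dropped() int64 {
	return atomic.LoadInt64(&r.dropped)
}

func main() {
	// No consumer is running, which simulates the backend being down.
	r := NewRecorder(2)
	for i := 0; i < 3; i++ {
		r.Record(Event{Repo: "example.v1"})
	}
	fmt.Println(r.Dropped()) // prints 1
}
```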

mcuadros commented 9 years ago

Actually, it is pretty similar to the GitHub stats: [screenshot]

About leveldb, personally I prefer boltdb because it is pure Go. Since InfluxDB needs to be an external service, it makes deployment more complex.

niemeyer commented 9 years ago

The GitHub stats should still work with gopkg.in:

https://github.com/go-mgo/mgo/graphs/traffic

mcuadros commented 9 years ago

This URL is private.

GeertJohan commented 9 years ago

Yes, GitHub stats only work for projects you own, afaik.

GeertJohan commented 9 years ago

@mcuadros InfluxDB doesn't need to be an external service; it would just be preferable to locate it on a different machine so it can't cause any service degradation for gopkg.in. goleveldb is also pure Go.

niemeyer commented 9 years ago

The initial problem statement was "I would love to know a bit more about how often my packages are used", which, as I understand it, is covered by the GitHub stats?

GeertJohan commented 9 years ago

@niemeyer Right, but what about public stats?

mcuadros commented 9 years ago

@niemeyer maybe we can change it to: I would love to know a bit more about how often my packages are used, and to let people see it.

We can build a badge with the downloads or something similar. But maybe this is out of the scope of this project.

niemeyer commented 9 years ago

I'm not sure it's worth it.. the person most interested in usage stats is the author, which is how we got here, and that's covered. Offering public usage stats then gets us into the trouble of "what if I don't want usage stats for my package to be known"? Doesn't seem worthwhile, but I can be proven wrong of course.

mcuadros commented 9 years ago

All the major package indexes include a stats module, and in many of these cases you can put a badge with the stats in your markdown. So it looks like something interesting. Why would someone want to hide these stats?

npmjs: [screenshot] PyPI: [screenshot] rubygems: [screenshot]

niemeyer commented 9 years ago

Okay, I'll reopen so we can think further about this.

GeertJohan commented 9 years ago

I can see the argument behind "what if I don't want usage stats for my package to be known".

What about adding support for a Gopkg.yaml that we fetch and that contains an opt-in field to enable stats? I thought about a Gopkg config file earlier but never opened an issue for it: allowing people to add a custom docs URI. There are probably more use cases for a Gopkg.yaml file. There could be a different config for each major version (e.g. display stats for v1, but not for v2); gopkg.in would grab the Gopkg.yaml from the latest release for every major version.

niemeyer commented 9 years ago

I personally think it's not worth the trouble. It's a lot of extra logic where none exists today, just to satisfy a use case which has its most relevant aspect already covered.

mcuadros commented 9 years ago

It can be a flag to disable the stats if you want (in case someone wants to disable them anyway).

BTW, the idea of Gopkg.yaml could be worth it for the version numbers, or for other extra functions such as the documentation URL (to change the default godoc URL), but as @niemeyer says, it is a new extra feature that requires a clone.

mcuadros commented 9 years ago

ping ...

niemeyer commented 11 months ago

I'm coming from the future to say it was indeed not worth the trouble. :)