Open rossjones opened 9 years ago
Tracking of API usage would be important, but I think the monitoring of other aspects is already well handled by other tools built specifically for the purpose.
Perhaps extending CKAN in a way that some stats can be more easily captured by one of these systems would be a better general approach?
Maybe the project can make https://github.com/ckan/ckanext-googleanalytics more robust. That would again be consistent with the Unix toolchain approach of leveraging best-of-breed tools like Google Analytics.
Though GA is "free as in beer" rather than "free as in speech" open source software, it has become the de facto standard.
With that said, the team should consider exposing the webserver logs of a CKAN instance as a dataset. In an SBIR study we did earlier this year about open data, we found that there is no simple way to measure the downstream usage of a dataset, which is a big signal that both data publishers and advocates need in order to prioritize data.
The existing reports are simply too coarse (only total aggregate views and downloads, with no way to filter by place or time). Of course, there should be some mechanism to control who has access to the webserver logs dataset. Better aggregated reports than the existing ones could then be created and exposed to the general public.
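As a rough sketch of what a finer-grained report could look like, the snippet below counts successful downloads per dataset per day from access log lines. It assumes the logs are in Apache Common/Combined Log Format and that download URLs look like `/dataset/<name>/resource/<id>/download/...`; both are assumptions about a particular deployment, and `downloads_per_day` is a hypothetical helper, not part of CKAN. Filtering by place would additionally need an IP-to-location lookup, which is not shown.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed: Apache Common/Combined Log Format.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)
# Assumed URL layout for resource downloads on this instance.
DOWNLOAD = re.compile(r"^/dataset/(?P<dataset>[^/]+)/resource/[^/]+/download")


def downloads_per_day(lines):
    """Count successful GET downloads keyed by (dataset, ISO date)."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m["method"] != "GET" or m["status"] != "200":
            continue
        d = DOWNLOAD.match(m["path"])
        if not d:
            continue
        # The date is taken in the timezone the log line was written in.
        day = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z").date()
        counts[(d["dataset"], day.isoformat())] += 1
    return counts
```

Restricting the report to a date range is then a matter of filtering the keys of the returned counter.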
From the webserver log dataset, you can even track whether businesses, citizens, apps, or other agencies are using the data. It can even be used to find downstream data users and automagically catalog them in CKAN's related items tab (e.g. visualizations/PDFs/sites using the https://github.com/BetaNYC/getDataButton, etc.)
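One way this could work, sketched under the assumption that the logs are in Apache Combined Log Format (which records the `Referer` header): collect the external sites that link to each dataset page. The function name `downstream_sites` and the `own_host` parameter are hypothetical, and writing the results into CKAN's related items is left out.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Assumed: Apache Combined Log Format (referer and user agent at the end).
COMBINED = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"(?P<referer>[^"]*)" "[^"]*"'
)
DATASET = re.compile(r"^/dataset/(?P<dataset>[^/?]+)")


def downstream_sites(lines, own_host):
    """Map each dataset name to the set of external referring hosts."""
    sites = defaultdict(set)
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue
        d = DATASET.match(m["path"])
        # hostname is None for an empty or "-" referer.
        host = urlsplit(m["referer"]).hostname
        if d and host and host != own_host:
            sites[d["dataset"]].add(host)
    return dict(sites)
```

Referers are easy to spoof and often missing, so the output is better treated as a list of candidates to review than as a definitive catalog.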
> the team should consider exposing the webserver logs of a CKAN instance as a dataset
Huh. That's both clever and simple, my favorite combination of traits in an idea. :) Adding a new log config line to Apache could output a properly anonymized access log directly in the webroot. I like it!
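To make "anonymized" a bit more concrete, here is a minimal sketch of one common approach: truncating the client address in each log line before it is published. It is written as a post-processing step in Python rather than as an Apache directive, and the prefix lengths (/24 for IPv4, /48 for IPv6) and the function names are assumptions for illustration. Whether truncation alone counts as properly anonymized is a policy question, since referers, user agents, and query strings can also identify people.

```python
import ipaddress
import re

# Leading client-address field of a Common/Combined Log Format line.
_CLIENT = re.compile(r"^(\S+)")


def anonymize_ip(addr: str) -> str:
    """Zero the host part of an address (assumed: /24 for IPv4, /48 for IPv6)."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return "-"  # hostname or unparseable value: drop it entirely
    prefix = 24 if ip.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def anonymize_line(line: str) -> str:
    """Replace only the first field (the client address) of a log line."""
    return _CLIENT.sub(lambda m: anonymize_ip(m.group(1)), line, count=1)
```

For example, `anonymize_ip("203.0.113.77")` returns `"203.0.113.0"`, and the rest of the log line is left unchanged.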
ckan-multisite has all requests going through a single HTTP router, so the access logs for all the sites can be aggregated or reported on really easily. I've opened a ticket to revisit this when we have some code to show: https://github.com/boxkite/ckan-multisite/issues/4
Great! We may also want to look at 18F's http://analytics.usa.gov for inspiration. Since we have full access logs, it doesn't directly apply, but once some instances "graduate" to their own dedicated open data installations, it may still be a way to aggregate high-level analytics.