Open jrgriffiniii opened 4 years ago
Documentation for this can be found in the PR connected to #21.
The stats I'd like to pull from OAR (Open Access Repository, another DSpace instance) are the number of uploads and downloads/page views within a certain date range (e.g. 7/1/2018-6/30/2020). That's part of what the library needs to submit to ARL annually. Last time, those were provided by James, who gathered the data manually.
We would also like to have some other usage stats, like the geolocation of the users, the most visited collections or items (articles), etc. We used to have a Google Analytics account for OAR, which provided some useful stats data, such as maps of users. I used those data in previous years.
The first request is scripted in the statistics directory. After editing the statistics/community.yml file with the appropriate IPs to exclude, the time slot, and the handles of the communities we'd like to calculate statistics for, we can run the following commands:
To get the download count of bitstreams, run the command below. This defaults to the top 10 most popular bitstream downloads, but if we edit top_bitstreams: { number: ## ... in statistics/community.yml, we can get the download count of all bitstreams if she'd prefer.
ruby statistics/community.rb --bitstreams top_bitstreams.txt --yaml statistics/community.yml
To get the upload count for each community, run the following command:
ruby statistics/community.rb --collection_counts submitter_count.txt --yaml statistics/community.yml
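For anyone checking the numbers by hand, here is a minimal sketch of the kind of filtering the configuration describes: keep events inside the time slot, drop excluded IPs, and report the top N bitstreams. It does not reproduce statistics/community.rb; the Event struct, the top_downloads helper, and the sample records are all hypothetical.

```ruby
require 'date'
require 'ipaddr'

# Hypothetical event record, for illustration only; the real scripts read
# usage events from DSpace rather than from an in-memory array.
Event = Struct.new(:bitstream, :ip, :date)

# Count downloads per bitstream within [from, to], skipping excluded IPs
# (single addresses or CIDR ranges), and return the `limit` most downloaded
# bitstreams as [name, count] pairs, highest count first.
def top_downloads(events, from:, to:, exclude_ips: [], limit: 10)
  excluded = exclude_ips.map { |ip| IPAddr.new(ip) }
  counts = Hash.new(0)
  events.each do |event|
    next unless (from..to).cover?(event.date)
    next if excluded.any? { |range| range.include?(IPAddr.new(event.ip)) }
    counts[event.bitstream] += 1
  end
  # Sort by descending count, then by name so ties come out in a stable order.
  counts.sort_by { |name, count| [-count, name] }.first(limit)
end

# Made-up sample data.
EVENTS = [
  Event.new('thesis.pdf', '203.0.113.5',  Date.new(2019, 1, 15)),
  Event.new('thesis.pdf', '198.51.100.7', Date.new(2019, 3, 2)),
  Event.new('data.csv',   '203.0.113.5',  Date.new(2019, 2, 10)),
  Event.new('data.csv',   '10.0.0.12',    Date.new(2019, 2, 11)), # excluded IP
  Event.new('old.pdf',    '203.0.113.5',  Date.new(2017, 5, 1))   # outside range
].freeze

report = top_downloads(EVENTS,
                       from: Date.new(2018, 7, 1),
                       to: Date.new(2020, 6, 30),
                       exclude_ips: ['10.0.0.0/8'])
report.each { |name, count| puts "#{name}\t#{count}" }
```

Raising the limit to the total number of bitstreams corresponds to asking for the download count of every bitstream rather than only the top 10.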
NOTES
I think this functionality is offered through the statistics/all_events.rb script, but this seems to give details on every page view. Figuring out how to access the OAR's Google Analytics page is likely more straightforward than parsing through hundreds of thousands of site visits. This is being tracked in #57.
I hadn't switched to the dspace user from root. The statistics directory does exist on the OAR site.
Attempting to test this on the OAR production environment, the following was found:
dspace@oar-prod-repo: master cli> gem install http
ERROR: Could not find a valid gem 'http' (>= 0), here is why:
Unable to download data from https://rubygems.org/ - Received fatal alert: protocol_version (https://rubygems.org/latest_specs.4.8.gz)
I was able to request resources from the RubyGems server:
dspace@oar-prod-repo: master cli> curl -I https://rubygems.org/
HTTP/1.1 200 OK
https://github.com/rubygems/rubygems/issues/2328#issuecomment-400335522 may be the cause of this, but I'm still not able to remedy it on the production server.
As for generating reports locally, I introduced the following: https://github.com/akinom/dspace-cli/compare/master...jrgriffiniii:disable-dspace-api
Apologies, I never committed these changes, but this should work with port forwarding used for the DSpace Solr endpoint.
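As a rough illustration of what a local report against a forwarded Solr port might look like, the sketch below only builds the query URL for a date-range download count; it does not send a request and is not taken from the branch linked above. The forwarded port (8983), the core name (statistics), and the field names (type, time, ip) are assumptions based on my understanding of the DSpace statistics schema, and downloads_query_url is a hypothetical helper.

```ruby
require 'uri'

# Assumption: a local port is forwarded to the Solr server and the usage
# events live in a core named "statistics". Adjust both to the real setup.
SOLR_SELECT = 'http://localhost:8983/solr/statistics/select'

# Build a Solr select URL that counts bitstream events in a date range.
# Assumed fields: type (0 taken to mean bitstream), time, and ip.
# rows=0 asks Solr for the match count only, not the documents themselves.
def downloads_query_url(from:, to:, exclude_ips: [], base: SOLR_SELECT)
  filters = ['type:0', "time:[#{from}T00:00:00Z TO #{to}T23:59:59Z]"]
  filters += exclude_ips.map { |ip| "-ip:#{ip}" }
  params = [['q', '*:*'], ['rows', '0'], ['wt', 'json']]
  params += filters.map { |fq| ['fq', fq] }
  "#{base}?#{URI.encode_www_form(params)}"
end

puts downloads_query_url(from: '2018-07-01',
                         to: '2020-06-30',
                         exclude_ips: ['10.0.0.12'])
```

The printed URL can be pasted into curl once the port forward is up; if the response follows the usual Solr JSON layout, the count would be in response.numFound.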
Advances https://github.com/pulibrary/dspace-cli/issues/21