pulibrary / dspace-cli

command line scripts accessing and modifying DSpace content

Evaluate (and, if possible and feasible, improve the documentation for) the statistics reporting procedures #68

Open jrgriffiniii opened 4 years ago

jrgriffiniii commented 4 years ago

Advances https://github.com/pulibrary/dspace-cli/issues/21

kmcelwee commented 4 years ago

Documentation

Documentation for this can be found in the PR connected to #21.

Request from Yuan

The stats I'd like to pull from OAR (Open Access Repository, another DSpace instance) are the number of uploads and downloads/page views within a certain date range (e.g. 7/1/2018-6/30/2020). That's part of what the library needs to submit to ARL annually. Last time, those were provided by James, who gathered that data manually.

We would also like to have some other usage stats, like the geolocation of the users and the most visited collections or items (articles), etc. We used to have a Google Analytics account for OAR, which provided some useful stats, such as maps of users. I used that data in previous years.

"uploads and downloads/page views"

The first request is scripted in the statistics directory. After editing statistics/community.yml with the IPs to exclude, the time slot, and the handles of the communities we'd like to calculate statistics for, we can run the following commands.

To get the download count of bitstreams, run the command below. This defaults to the top 10 most popular bitstream downloads, but if we edit top_bitstreams: { number: ## ... in statistics/community.yml we can get the download count of all bitstreams, if Yuan would prefer that.

ruby statistics/community.rb --bitstreams top_bitstreams.txt --yaml statistics/community.yml

To get the upload count for each community, run the following command:

ruby statistics/community.rb --collection_counts submitter_count.txt --yaml statistics/community.yml
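
The two commands above could be kept together in a small wrapper so that both reports are always generated against the same YAML file. This is only a sketch: the script path, flags, and output file names are copied from the commands above, while the `report_commands` helper is a hypothetical name introduced here.

```ruby
require "shellwords"

# Hypothetical helper: returns the two report commands quoted above as
# argument arrays, both pointing at the same YAML configuration file.
def report_commands(yaml: "statistics/community.yml")
  script = "statistics/community.rb"
  [
    ["ruby", script, "--bitstreams", "top_bitstreams.txt", "--yaml", yaml],
    ["ruby", script, "--collection_counts", "submitter_count.txt", "--yaml", yaml]
  ]
end

# Print the commands so they can be reviewed before anything is executed.
report_commands.each { |cmd| puts Shellwords.join(cmd) }
```

To actually run them from the root of the repository, each array can be passed to `system(*cmd)`, stopping at the first command that returns a falsy result.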

NOTES

"geolocation of users"

I think this functionality is offered through the statistics/all_events.rb script, but it seems to give details on every page view. Figuring out how to access OAR's Google Analytics page is likely more straightforward than parsing hundreds of thousands of site visits. This is being tracked in #57.

kmcelwee commented 4 years ago

I hadn't switched from root to the dspace user. The statistics directory does exist on the OAR site.

jrgriffiniii commented 4 years ago

While attempting to test this on the OAR production environment, I found the following:

dspace@oar-prod-repo: master cli> gem install http
ERROR:  Could not find a valid gem 'http' (>= 0), here is why:
Unable to download data from https://rubygems.org/ - Received fatal alert: protocol_version (https://rubygems.org/latest_specs.4.8.gz)

I was able to request resources from the RubyGems server:

dspace@oar-prod-repo: master cli> curl -I https://rubygems.org/
HTTP/1.1 200 OK

jrgriffiniii commented 4 years ago

https://github.com/rubygems/rubygems/issues/2328#issuecomment-400335522 may be the cause of this, but I'm still not able to remedy it on the production server.
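
A `protocol_version` alert is what a TLS server sends when it rejects the protocol version the client offered, so one thing worth checking is whether the Ruby that `gem` runs under can offer TLS 1.2 at all. The following is a minimal diagnostic sketch, not a fix. It only inspects the Ruby OpenSSL bindings and does not prove that this is the cause; `tls12_available?` is a hypothetical helper name.

```ruby
require "openssl"

# Hypothetical helper: true when this Ruby's OpenSSL bindings expose TLS 1.2.
def tls12_available?
  ssl = OpenSSL::SSL
  # Newer bindings define a protocol version constant.
  return true if ssl.const_defined?(:TLS1_2_VERSION)

  # Older bindings list the supported protocol methods by symbol instead.
  ctx = ssl::SSLContext
  ctx.const_defined?(:METHODS) && ctx.const_get(:METHODS).include?(:TLSv1_2)
end

puts "Ruby:    #{RUBY_DESCRIPTION}"
puts "OpenSSL: #{OpenSSL::OPENSSL_VERSION}"
puts "TLS 1.2 available: #{tls12_available?}"
```

Running this with the same interpreter that `gem` uses would show whether the difference from the successful `curl` request lies in the Ruby side of the connection.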

As for generating reports locally, I introduced the following: https://github.com/akinom/dspace-cli/compare/master...jrgriffiniii:disable-dspace-api

Apologies, I never committed these changes, but this should work with port forwarding used for the DSpace Solr endpoint.
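
As a rough illustration of what querying the forwarded Solr endpoint directly might look like, the sketch below builds a count-only request for usage events in a date range. Everything specific here is an assumption rather than something taken from the branch above: the local port `8983`, the `/solr/statistics/select` path, the `solr_count_uri` helper, and the use of the `type` and `time` fields (with `type` 0 for bitstreams and 2 for items, as in the usual DSpace statistics core).

```ruby
require "date"
require "uri"

# Assumed base URL: a forwarded local port pointing at the Solr statistics core.
SOLR_SELECT = "http://localhost:8983/solr/statistics/select"

# Hypothetical helper: builds a count-only query (rows=0) for usage events of
# one DSpace object type between two dates, inclusive.
# Assumed type codes: 0 = bitstream (download), 2 = item (page view).
def solr_count_uri(type:, from:, to:, base: SOLR_SELECT)
  range = "time:[#{from.strftime('%Y-%m-%d')}T00:00:00Z TO " \
          "#{to.strftime('%Y-%m-%d')}T23:59:59Z]"
  uri = URI(base)
  uri.query = URI.encode_www_form(
    "q" => "type:#{type}", "fq" => range, "rows" => 0, "wt" => "json"
  )
  uri
end

# Date range taken from the request earlier in this thread.
puts solr_count_uri(type: 0, from: Date.new(2018, 7, 1), to: Date.new(2020, 6, 30))
```

With the tunnel open, fetching this URI (for example with `Net::HTTP.get`) should return JSON whose `response.numFound` value is the number of matching events.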