kulibraries / more-dspace-statistics

This web application provides simple aggregated usage statistics for a DSpace repository.
BSD 3-Clause "New" or "Revised" License

Support for Solr Sharding #1

Closed: misilot closed this issue 5 years ago

misilot commented 5 years ago

Hello,

From what I can tell, there is no support for statistics that have been sharded by year via the DSpace tool. Would it be possible to add this?

Thanks!

shorock commented 5 years ago

Ooh, good question. We have (so far) decided against the DSpace stats native year-sharding strategy. At some point, we decided for memory and performance reasons to put our Solr on a separate VM from the Tomcat that runs the DSpace code. (We're using Solr 7.x btw... I gather we're one of the few DSpace installs doing that.) That pretty much breaks how DSpace's internal stats code figures out which year cores to query and aggregate. (See the function at https://github.com/DSpace/DSpace/blob/dspace-6_x/dspace-api/src/main/java/org/dspace/statistics/SolrLoggerServiceImpl.java#L1606, which simply presumes to know where the Solr guts are and starts looking for statistics-* directories.) If we were to shard statistics, we were planning to use normal SolrCloud sharding.
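To make that failure mode concrete, here is a rough Python sketch of a directory-scanning strategy like the one described above. It only approximates the idea (it is not DSpace's Java code); the `solr_home` argument and the `statistics-YYYY` naming pattern are assumptions for illustration.

```python
import re
from pathlib import Path

# Assumed naming convention for year-sharded cores, e.g. "statistics-2016".
YEAR_CORE = re.compile(r"^statistics-\d{4}$")


def year_cores_from_disk(solr_home: str) -> list[str]:
    """Approximate the directory-scanning approach: list statistics-YYYY
    directories under a *local* Solr home directory.

    If Solr runs on another VM, this directory does not exist on the
    DSpace/Tomcat host, so the result is silently empty and no year
    cores get aggregated.
    """
    root = Path(solr_home)
    if not root.is_dir():
        return []
    return sorted(
        p.name for p in root.iterdir() if p.is_dir() and YEAR_CORE.match(p.name)
    )
```

Because discovery depends on the local filesystem, moving Solr off the box breaks it without any error, which is why asking Solr itself (rather than looking at directories) is the more robust option.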

This code (it has been a while since I've looked at it... and I'm not the original author, but I've inherited it I guess) does not typically run on the same server as the rest of DSpace. DSpace is heavy enough... why would you add PHP to the box too? The previous author wrote this in PHP to run on our shared hosting platform.

If you really trust your firewall or Solr read-only proxy game, I guess you could allow read access to the Solr Cores API (which did exist in Solr 4, as best I remember?) at http://dspace-install.my.edu/readonlysolr/admin/cores?action=STATUS&wt=json, and this code could be extended to figure out the stats cores to aggregate from there? I'd rather do that than look at directories anyway.
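A minimal Python sketch of that extension follows, purely as an illustration (the app itself is PHP, so it would need equivalent logic). It assumes that the CoreAdmin `STATUS` response with `wt=json` has a `"status"` object keyed by core name, and that the stats cores are named `statistics` and `statistics-YYYY`. The host/path value in the usage note is hypothetical. The discovered cores are joined into a value for Solr's standard `shards` request parameter (non-SolrCloud distributed search).

```python
import json
import re
import urllib.request

# Assumed naming: the base core "statistics" plus year shards "statistics-YYYY".
STATS_CORE = re.compile(r"^statistics(-\d{4})?$")


def fetch_core_status(cores_url: str) -> dict:
    """GET a CoreAdmin STATUS URL, e.g. .../admin/cores?action=STATUS&wt=json,
    ideally through a read-only proxy."""
    with urllib.request.urlopen(cores_url, timeout=10) as resp:
        return json.load(resp)


def stats_cores(status_json: dict) -> list[str]:
    """Pick statistics cores from a STATUS response whose "status" object
    is assumed to be keyed by core name."""
    names = status_json.get("status", {}).keys()
    return sorted(n for n in names if STATS_CORE.match(n))


def shards_param(solr_host_path: str, cores: list[str]) -> str:
    """Build a comma-separated `shards` value, one host:port/path/core entry per core."""
    base = solr_host_path.rstrip("/")
    return ",".join(f"{base}/{core}" for core in cores)
```

Usage would look something like `shards_param("solr.example.edu:8983/solr", stats_cores(fetch_core_status(url)))`, with the result passed as `shards=...` on a query sent to any one of those cores, so that Solr fans the query out and merges the results.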

Anyway, it seems possible. Can you get to your /solr/admin/cores API? I also wanted to wave a hello from your in-state counterparts over in Lawrence. If you all ever want to get together for an informal Code4Lib Kansas, or just an evening of appetizers and commiseration, we'd be interested.

misilot commented 5 years ago

@shorock Thank you for the information. We were testing sharding to see whether it would help resolve an issue with Solr, but we will look at increasing the memory available to it instead of sharding.

Thanks!