openequella / openEQUELLA

Core openEQUELLA sources
https://openequella.github.io/
Apache License 2.0

Create a reporting tool in core oEQ to provide usage analysis #724

Open cathfitz opened 5 years ago

cathfitz commented 5 years ago

Provide the ability to view basic temporal metrics around usage (access) and edit activity at the item and collection level. Additionally, at the collection level, also include growth, i.e. the number of items over time.

ChristianMurphy commented 5 years ago

This sounds great. :+1: It would be great if the data could be collected as xAPI (https://xapi.com/overview) and/or Caliper (https://www.imsglobal.org/activity/caliper) statements so that an outside system can provide the temporal views (e.g. https://www.influxdata.com, https://www.elastic.co/products/elasticsearch, etc.).
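For a sense of what that could look like, here is a minimal sketch of an xAPI-style statement for an item view, written in Scala. The account home page, item URL, and user/item identifiers are illustrative assumptions rather than openEQUELLA values; only the verb IRI is a standard ADL verb.

```scala
// A minimal sketch of an xAPI-style statement for an item view event.
// The identifiers below are illustrative placeholders, not openEQUELLA data.
object XapiStatementSketch {
  def viewStatement(userId: String, itemUuid: String, timestamp: java.time.Instant): String =
    s"""{
       |  "actor":  { "account": { "homePage": "https://oeq.example.edu", "name": "$userId" } },
       |  "verb":   { "id": "http://adlnet.gov/expapi/verbs/experienced",
       |              "display": { "en-US": "viewed" } },
       |  "object": { "id": "https://oeq.example.edu/items/$itemUuid", "objectType": "Activity" },
       |  "timestamp": "$timestamp"
       |}""".stripMargin

  def main(args: Array[String]): Unit =
    println(viewStatement("jsmith", "0a1b2c3d-0000-0000-0000-000000000000", java.time.Instant.now()))
}
```

A statement in this shape could be shipped to a Learning Record Store, or indexed into Elasticsearch/InfluxDB, so that the temporal views live outside the application.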

I'd caution against collecting all that information directly in openEQUELLA; time-series analysis can become pretty resource intensive over time. A dedicated system like Elasticsearch provides some nice time-series optimizations and can be externalized so that the resource pressure does not impact the openEQUELLA server.

edalex-ian commented 5 years ago

Heya @ChristianMurphy

Thanks for the input. It was definitely interesting to see the education-sector-specific standards around this. Unfortunately, at this time we won't be able to extend our efforts that far, i.e. there is no additional data collection planned here.

For now this will be rather simplistic and purely focused on the input provided by clients. It will work through a simple ETL from the existing oEQ Audit Log table to produce a simple data mart (either in the same schema or alongside), utilising star schemas to provide a somewhat aggregated (maybe at day level) representation of views and edits of attachments and items.
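To make the ETL step concrete, here is a rough Scala/JDBC sketch of that day-level rollup. The table and column names (fact_item_activity, audit_log_entry, event_timestamp, item_id, event_type) are placeholders, not the actual oEQ Audit Log schema:

```scala
import java.sql.{DriverManager, Timestamp}

// A rough sketch of a day-level rollup from an audit log into a star-schema fact table.
// Table and column names are placeholders, not the real openEQUELLA audit log schema.
object DailyUsageRollup {
  // Aggregate raw audit rows into one fact row per item, per day, per event type.
  val rollupSql: String =
    """INSERT INTO fact_item_activity (activity_date, item_id, event_type, event_count)
      |SELECT CAST(event_timestamp AS DATE) AS activity_date,
      |       item_id,
      |       event_type,              -- e.g. VIEW_ITEM, EDIT_ITEM, VIEW_ATTACHMENT
      |       COUNT(*) AS event_count
      |FROM   audit_log_entry
      |WHERE  event_timestamp >= ? AND event_timestamp < ?
      |GROUP BY CAST(event_timestamp AS DATE), item_id, event_type""".stripMargin

  // Run the rollup for one time window (e.g. the previous day) and return the rows written.
  def rollUp(jdbcUrl: String, from: Timestamp, to: Timestamp): Int = {
    val conn = DriverManager.getConnection(jdbcUrl)
    try {
      val stmt = conn.prepareStatement(rollupSql)
      stmt.setTimestamp(1, from)
      stmt.setTimestamp(2, to)
      stmt.executeUpdate()
    } finally conn.close()
  }
}
```

Run once per day over the previous day's window, the fact table stays small and pre-aggregated while the raw audit rows can be kept on a shorter retention.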

There'll then be a couple of simple UI elements at the attachment/item/collection level to show counts by time block, and also a page for viewing trends etc.

Currently users/clients are achieving similar functionality by increasing the retention of their Audit Log, but find that after ~5 years they're exceeding about 70 million rows, and their DBAs are typically not happy with this.

IS

P.S. Mind you, it would be fun to bring in some ES or InfluxDB, but again I fear we're not in a position for that just yet. It could be interesting to do a Logstash feed (or ES Beats) into ES, but I'm not sure a dependency on ES or InfluxDB would solve the request.

ChristianMurphy commented 5 years ago

I'm not sure a dependency on ES or InfluxDB would solve the request.

Different DBs give it different names: hypertables, search aggregations, continuous queries, etc. Conceptually they all work the same way: take a time-based dataset and memoize timeslices so queries don't need to read the entire dataset. Which sounds like exactly what is being described in

utilising star schemas to provide a somewhat aggregated (maybe at day level) representation of views and edits of attachments and items.
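As a toy illustration of that memoization idea (not openEQUELLA code; the types and names are made up), a per-day aggregate plus a watermark lets each refresh fold in only the events that arrived since the last run, instead of rescanning the full history:

```scala
import java.time.{Instant, LocalDate, ZoneOffset}

// Toy stand-in for an audit row; not an openEQUELLA type.
final case class Event(itemId: String, at: Instant)

// Memoized daily view counts: `counts` holds the already-computed timeslices,
// `watermark` marks how far the aggregation has progressed.
final case class DailyViewCounts(counts: Map[LocalDate, Long], watermark: Instant) {
  // Fold in only events newer than the watermark; older slices are reused as-is.
  def refresh(newEvents: Seq[Event]): DailyViewCounts = {
    val fresh = newEvents.filter(_.at.isAfter(watermark))
    val updated = fresh.foldLeft(counts) { (acc, e) =>
      val day = e.at.atZone(ZoneOffset.UTC).toLocalDate
      acc.updated(day, acc.getOrElse(day, 0L) + 1L)
    }
    val newWatermark = fresh.map(_.at).foldLeft(watermark)((a, b) => if (b.isAfter(a)) b else a)
    DailyViewCounts(updated, newWatermark)
  }
}
```

Hypertables and continuous queries do essentially this inside the database, which is why they scale better than re-aggregating the raw audit table on every report.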

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.