zooniverse / zoo-stats-api-graphql

Add RESTful endpoint to access continuous aggregate group queries #143

Open camallen opened 3 years ago

camallen commented 3 years ago

This PR adds continuous aggregates to sumate a group's daily classification contributions (CA) using the timescale's continuous aggregates https://legacy-docs.timescale.com/v1.7/api#continuous-aggregates and exposes a RESTful API to access these counts.

Specifically, this PR:

  1. upgrades the TimescaleDB extension to 1.7.4 (the latest version Azure currently allows)
  2. adds rake tasks to enable the CA
  3. ensures the CA is set up in the test env
  4. creates a CA on the group_id attribute of the events table (sketched above)
  5. adds a read-only AR model to query the materialized CA view (see the sketch after this list)
  6. adds a RESTful group count route and controller action (see the sketch after this list)
  7. adds a simple JSON events serializer that applies simple AR scope limits and ordering
  8. adds API usage docs, including the returned JSON schema format
  9. removes unused Rails components from loading (aligning with an API-style Rails service)
  10. updates / removes dev & test gems (maintenance to get the setup working well)
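
To illustrate item 5: a read-only ActiveRecord model over the materialized CA view could look roughly like this (the class, table, and scope names are assumptions carried over from the sketch above, not the PR's exact code):

```ruby
# app/models/group_classification_daily_count.rb — illustrative sketch
class GroupClassificationDailyCount < ApplicationRecord
  # point the model at the continuous aggregate view rather than a real table
  self.table_name = 'group_classification_daily_counts'

  # the view is maintained by TimescaleDB, so never allow writes through AR
  def readonly?
    true
  end

  scope :for_group, ->(group_id) { where(group_id: group_id) }
  scope :recent_first, -> { order(day: :desc) }
end
```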
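Item 6's route / controller action, under the same naming assumptions, might be along these lines; the route path, limit parameter, and response shape are illustrative rather than the documented API (see the API docs added in item 8 for the real JSON schema):

```ruby
# app/controllers/group_counts_controller.rb — illustrative sketch
# paired with a route like: get '/counts/group/:group_id', to: 'group_counts#index'
class GroupCountsController < ApplicationController
  DEFAULT_LIMIT = 100

  def index
    counts = GroupClassificationDailyCount
               .for_group(params[:group_id])
               .recent_first
               .limit(params.fetch(:limit, DEFAULT_LIMIT).to_i)

    # a simple serializer boils down to mapping each row to { day:, classification_count: }
    render json: {
      group_id: params[:group_id],
      counts: counts.as_json(only: %i[day classification_count])
    }
  end
end
```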

This PR is intended for use by the FACTSet team to build the group query dashboard functionality.

Longer term, the Zooniverse team will build on this PR to expand the current TimescaleDB Stats API, using continuous aggregates to expose improved API query types.

Some items of additional work could be to:

  1. add per-(user|project|workflow) continuous aggregates in (hour|day|month|year) time buckets
  2. expose the above metrics via RESTful API endpoints
  3. add a serializer decorator object over the AR scopes to build backwards-compatible API endpoints for clients that consume the https://github.com/zooniverse/zoo-event-stats/ RESTful API
  4. look at adding compression (https://legacy-docs.timescale.com/v1.7/api#compression) and data management drop-chunks policies (https://legacy-docs.timescale.com/v1.7/api#add_drop_chunks_policy) to ensure we only keep the relevant data in the DB (see the sketch after this list)
  5. launch this independently of the current TimescaleDB to avoid upgrades on old data and start with a blank slate.
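
For item 4, a hedged sketch of what enabling compression plus a drop-chunks policy could look like with the TimescaleDB 1.7 policy functions linked above; the table name, segmentby column, and retention intervals are placeholder choices, not decided values:

```ruby
# lib/tasks/data_management.rake — illustrative sketch only
namespace :data_management do
  task add_policies: :environment do
    ActiveRecord::Base.connection.execute(<<~SQL)
      -- enable native compression on the raw events hypertable
      ALTER TABLE events SET (
        timescaledb.compress,
        timescaledb.compress_segmentby = 'group_id'
      );
      -- compress chunks once they are older than 30 days
      SELECT add_compress_chunks_policy('events', INTERVAL '30 days');
      -- drop raw chunks older than 1 year, keeping only the aggregated counts
      SELECT add_drop_chunks_policy('events', INTERVAL '1 year');
    SQL
  end
end
```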