write up design principles for hoad

It just occurred to me during the call with @kjgarza that it might be a good idea to write down the draft design principles for hoad that we've been talking about.

There are three levels of user/target segmentation, which correspond to three levels of our code.

1. Distributed in-memory database. This database should be as generic as possible, in the extreme case just duplicating the Crossref coverage, but with much better performance and support for arbitrary SQL/dplyr queries.
   - Target: Analysts (us).
   - Code:
     - setup of the database (currently Google BigQuery, maybe Azure Synapse)
     - batch jobs to seed the db with dumps and incremental updates
     - example queries
2. Domain-specific APIs. Opinionated queries against 1 that yield domain-specific data objects small enough to fit into laptop memory: a set of (multiple!) tidy data frames that make sense for hybrid open access uptake analysis, i.e. that make it possible to run the plots/analyses in 3.
   - Target: R users interested in hybrid OA.
   - Code:
     - dplyr/SQL queries against 1
     - additional on-client data wrangling
     - assertions and tests
3. Dashboard. Views on the data in 2 that answer our business questions.
   - Target: HOAD project stakeholders
   - Code:
     - plots (those are also part of the package proper)
     - dashboard (maybe modules are also part of the package)
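
To make the split between levels 1 and 2 a bit more concrete, here is a rough sketch of what a level-2 function could look like. Everything in it is an assumption for illustration: the table `works`, its columns (`doi`, `issn`, `year`, `has_cc_license`) and the function name `oa_uptake_by_journal_year()` are made up, and a local tibble stands in for the remote database table so the example runs without credentials.

```r
library(dplyr)

# Hypothetical stand-in for a level-1 table. In the real setup this would be
# a remote table in the database, not a local tibble.
works <- tibble(
  doi = c("10.1/a", "10.1/b", "10.1/c", "10.2/d"),
  issn = c("1111-1111", "1111-1111", "1111-1111", "2222-2222"),
  year = c(2019L, 2019L, 2020L, 2020L),
  has_cc_license = c(TRUE, FALSE, TRUE, FALSE)
)

# Level 2: an opinionated query that returns one small, tidy data frame.
oa_uptake_by_journal_year <- function(works_tbl) {
  res <- works_tbl %>%
    # aggregation: meant to run where the data lives
    group_by(issn, year) %>%
    summarise(
      n_articles = n(),
      n_oa = sum(as.integer(has_cc_license), na.rm = TRUE),
      .groups = "drop"
    ) %>%
    # pull the (small) result to the client; a no-op for a local tibble
    collect() %>%
    # additional on-client data wrangling
    mutate(oa_share = n_oa / n_articles)

  # assertions: one row per journal-year, shares within [0, 1]
  stopifnot(
    !anyDuplicated(res[c("issn", "year")]),
    all(res$n_oa <= res$n_articles),
    all(res$oa_share >= 0 & res$oa_share <= 1)
  )
  res
}

uptake <- oa_uptake_by_journal_year(works)
```

The idea would be that plots and dashboard modules in level 3 only ever consume data frames like `uptake`, never the database directly.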
