It just occurred to me during the call with @kjgarza that it might be a good idea to write down the draft design principles for hoad that we've been talking about.
There are three levels of user/target segmentation, which correspond to three levels of our code.
1. Distributed in-memory database.
This database should be as generic as possible, in the extreme case just duplicating the Crossref coverage, but with much better performance and support for arbitrary SQL/dplyr queries.
Target: Analysts (us).
Code:
setup of the database (currently Google BigQuery, maybe Azure Synapse)
batch jobs to seed the db with dumps and incremental updates
example queries
2. Domain-specific APIs
Opinionated queries against 1 that yield domain-specific data objects (small enough to fit into laptop memory).
A set of (multiple!) tidy data frames that make sense for hybrid open access uptake analysis, i.e. that make it possible to run the plots/analyses in 3.
Target: R users interested in hybrid OA.
Code:
dplyr/sql queries against 1
additional on-client data wrangling
assertions and tests
3. Dashboard
Views on the data in 2 to answer our business questions.
Target: HOAD project stakeholders
Code:
plots (those are also part of the package proper)
dashboard (maybe modules are also part of the package)
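To make the split between levels 1 and 2 a bit more concrete, here is a rough sketch in R. It is only an illustration under several assumptions: an in-memory SQLite database stands in for BigQuery, the `works` table and its columns (`doi`, `issn`, `year`, `license_url`) are made up and not the real Crossref schema, "has a license URL" is used as a naive stand-in for "is open access", and `oa_share_by_journal_year()` is a hypothetical function name, not something that exists in hoad. It needs the DBI, RSQLite, dplyr and dbplyr packages.

```r
library(DBI)
library(dplyr)

# Level 1 stand-in: in-memory SQLite instead of the real database.
# Table and column names are hypothetical.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "works", data.frame(
  doi = c("10.1/a", "10.1/b", "10.1/c", "10.1/d"),
  issn = c("1111-1111", "1111-1111", "2222-2222", "2222-2222"),
  year = c(2018L, 2018L, 2018L, 2019L),
  license_url = c(
    "https://creativecommons.org/licenses/by/4.0/", NA,
    NA, "https://creativecommons.org/licenses/by/4.0/"
  ),
  stringsAsFactors = FALSE
))

# Level 2: an opinionated query that yields one small, tidy data frame.
oa_share_by_journal_year <- function(con) {
  res <- tbl(con, "works") %>%
    # this part is translated to SQL and runs inside the database
    group_by(issn, year) %>%
    summarise(
      n_articles = n(),
      n_oa = sum(ifelse(is.na(license_url), 0L, 1L), na.rm = TRUE),
      .groups = "drop"
    ) %>%
    # from here on: on-client data wrangling
    collect() %>%
    mutate(oa_share = as.numeric(n_oa) / as.numeric(n_articles)) %>%
    arrange(issn, year)

  # assertions on the returned data object
  stopifnot(
    all(res$oa_share >= 0 & res$oa_share <= 1),
    anyDuplicated(res[c("issn", "year")]) == 0
  )
  res
}

oa <- oa_share_by_journal_year(con)
dbDisconnect(con)
```

The idea would be that level 3 (plots, dashboard) only ever sees objects like `oa` and never talks to the database directly, so swapping the backend should only touch the connection setup in level 1.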