subugoe / hoad

Deprecated: Please check https://github.com/subugoe/hoaddash
GNU Affero General Public License v3.0

write up design principles for hoad #249

Open maxheld83 opened 4 years ago

maxheld83 commented 4 years ago

It just occurred to me during the call with @kjgarza that it might be a good idea to write down the draft design principles for hoad that we've been talking about.

There are three levels of user/target segmentation, which correspond to three levels of our code.

  1. Distributed in-memory database. This database should be as generic as possible, in the extreme case just duplicating the Crossref coverage, but with much better performance and support for arbitrary SQL/dplyr queries.
    • Target: Analysts (us).
    • Code:
      • setup of the database (currently Google BigQuery, maybe Azure Synapse)
      • batch jobs to seed the db with dumps and incremental updates
      • example queries
  2. Domain-specific APIs. Opinionated queries against 1 that yield domain-specific data objects small enough to fit into laptop memory: a set of (multiple!) tidy data frames that make sense for hybrid open access uptake analysis, i.e. that make it possible to run the plots/analyses in 3.
    • Target: R users interested in hybrid OA.
    • Code:
      • dplyr/sql queries against 1
      • additional on-client data wrangling
      • assertions and tests
  3. Dashboard. Views on the data in 2 that answer our business questions.
    • Target: HOAD project stakeholders
    • Code:
      • plots (those are also part of the package proper)
      • dashboard (maybe modules are also part of the package)
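
To make the three levels a bit more concrete, here is a rough sketch of how they could stack on top of each other. It is not the hoad implementation: it uses Python's standard-library `sqlite3` as a stand-in for the real database (BigQuery or similar), and the `works` table, its columns, the sample rows, and all function names are made up for illustration. Treating "has a license" as "is open access" is also just a simplifying assumption.

```python
import sqlite3


# Level 1: generic database. An in-memory SQLite database stands in for
# the real backend; the schema and rows below are hypothetical.
def seed_database():
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE works ("
        "doi TEXT PRIMARY KEY, journal TEXT, year INTEGER, license TEXT)"
    )
    rows = [
        ("10.0000/a1", "Journal A", 2018, "cc-by"),
        ("10.0000/a2", "Journal A", 2018, None),
        ("10.0000/a3", "Journal A", 2019, "cc-by"),
        ("10.0000/b1", "Journal B", 2019, None),
    ]
    con.executemany("INSERT INTO works VALUES (?, ?, ?, ?)", rows)
    return con


# Level 2: domain-specific API. One opinionated SQL query against level 1
# that returns a small, tidy result (one record per journal and year).
def oa_uptake_by_journal_year(con):
    cur = con.execute(
        """
        SELECT journal,
               year,
               COUNT(*) AS n_articles,
               SUM(CASE WHEN license IS NOT NULL THEN 1 ELSE 0 END) AS n_oa
        FROM works
        GROUP BY journal, year
        ORDER BY journal, year
        """
    )
    columns = ("journal", "year", "n_articles", "n_oa")
    records = [dict(zip(columns, row)) for row in cur]
    # Assertions live at this level: open access counts can never
    # exceed the total number of articles.
    for record in records:
        assert 0 <= record["n_oa"] <= record["n_articles"]
    return records


# Level 3: a view on the level 2 data. A plain-text table stands in for
# the plots and dashboard.
def render_uptake_table(records):
    lines = []
    for record in records:
        share = record["n_oa"] / record["n_articles"]
        lines.append(
            f"{record['journal']} {record['year']}: {share:.0%} open access"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    connection = seed_database()
    print(render_uptake_table(oa_uptake_by_journal_year(connection)))
```

The point of the split is that level 3 only ever sees the small records from level 2, and level 2 is the only place that knows about the database schema.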

maxheld83 commented 4 years ago

This is just quickly jotted down; it should live in the repo somewhere.