nih-cfde / icc-eval-core

(WIP) Tools for collecting and reporting CFDE metrics
https://cfde-eval.netlify.app/
u54od036472

Requirements

Pipeline

The automated steps in this repo are roughly as follows:

  1. Ingest
    1. Get raw data from an external resource, either by scraping an HTML page, downloading and parsing a PDF or CSV, or making a request to an API.
    2. Save raw data exactly as-is for provenance and caching.
    3. Collate the most important information from the raw data into a common, high-level output format suited to generating the desired reports.
    4. Repeat previous steps in order of dependency (e.g. opportunity number -> grant numbers) until all needed info is gathered.
  2. Print
    1. Run webapp (interactive dashboard that provides access to all reports).
    2. Import output data from ingest, and do some minimal final processing (e.g. combine journal info with each publication listing).
    3. Render select webapp pages (e.g. /core-project/abc123) to PDF reports.
  3. Deploy webapp and PDFs to live, public web addresses.
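
The ingest steps above can be sketched roughly as follows. This is a minimal illustration, not the repo's actual code: the function names (`ingest_raw`, `collate_grants`), the file layout, the field names, and the fake fetchers standing in for real scrapes or API calls are all assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def ingest_raw(name: str, fetch: Callable[[], dict], raw_dir: Path) -> dict:
    """Return raw data for `name`, fetching only if no saved copy exists."""
    raw_path = raw_dir / f"{name}.json"
    if raw_path.exists():
        # Reuse the saved copy instead of hitting the external resource again.
        return json.loads(raw_path.read_text())
    raw = fetch()
    raw_dir.mkdir(parents=True, exist_ok=True)
    # Save the response unmodified, for provenance and caching.
    raw_path.write_text(json.dumps(raw))
    return raw


def collate_grants(raw: dict) -> list[dict]:
    """Keep only the fields a report needs, in one common shape."""
    return [
        {"opportunity": raw["opportunity"], "grant": grant["number"]}
        for grant in raw["grants"]
    ]


# Hypothetical stand-ins for a real scrape, download, or API request.
fetch_calls: list[str] = []


def fake_fetch_opportunity() -> dict:
    fetch_calls.append("opportunity")
    return {
        "opportunity": "OPP-1",
        "grants": [{"number": "G-1", "extra": "ignored"}, {"number": "G-2"}],
    }


def fake_fetch_grant(number: str) -> dict:
    fetch_calls.append(number)
    return {"number": number, "title": f"Title of {number}"}


with tempfile.TemporaryDirectory() as tmp:
    raw_dir = Path(tmp) / "raw"
    # First level: opportunity number -> grant numbers.
    grants = collate_grants(ingest_raw("opportunity", fake_fetch_opportunity, raw_dir))
    # Next level depends on the previous one: one request per grant number.
    details = [
        ingest_raw(f"grant-{g['grant']}", lambda n=g["grant"]: fake_fetch_grant(n), raw_dir)
        for g in grants
    ]
    # A later run reads the saved raw file; the fetcher is not called again.
    grants_again = collate_grants(ingest_raw("opportunity", fake_fetch_opportunity, raw_dir))
```

Keeping the raw save separate from the collate step means the output format can change later without re-fetching anything.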

Repo Content

Technology

The ingest pipeline is optimized wherever possible and appropriate. Work such as network requests and rendering is parallelized (e.g. PDF reports are printed simultaneously in separate tabs of the same Playwright browser instance). External resources are cached in their raw format to speed up subsequent runs and to avoid being rate-limited or blocked by their providers.
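
As a rough illustration of the parallel rendering described above, the sketch below starts several simulated renders at once with `asyncio.gather`. It does not use Playwright: `render_pdf` is a hypothetical stand-in for opening a tab and printing a page, and the extra routes and output file names are invented for the example.

```python
import asyncio

# "/core-project/abc123" comes from the pipeline description; the others are made up.
routes = ["/core-project/abc123", "/core-project/def456", "/core-project/ghi789"]

in_flight = 0       # renders currently running
peak_in_flight = 0  # highest number of renders running at the same time


async def render_pdf(route: str) -> str:
    """Stand-in for opening a tab, loading `route`, and printing it to PDF."""
    global in_flight, peak_in_flight
    in_flight += 1
    peak_in_flight = max(peak_in_flight, in_flight)
    await asyncio.sleep(0.01)  # simulated page load and render
    in_flight -= 1
    return route.strip("/").replace("/", "-") + ".pdf"


async def render_all(all_routes: list[str]) -> list[str]:
    # gather() schedules every render before waiting on any of them, so they
    # overlap; results come back in the same order as `all_routes`.
    return await asyncio.gather(*(render_pdf(route) for route in all_routes))


pdfs = asyncio.run(render_all(routes))
```

With many reports, a real pipeline might also cap the number of simultaneous tabs (for example with `asyncio.Semaphore`) to limit memory use. Whether this repo does so is not stated here.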

Commands

Use ./run.sh with flags to run specific steps or tasks in this repo:

| Flag | Description |
| --- | --- |
| --install | Install packages and dependencies |
| --ingest | Run "ingest" pipeline step |
| --print | Run "print" pipeline step |
| (no flag) | Run pipeline steps in order |
| --app | Run webapp in dev mode |
| --test | Run all tests (type-checking, linting/formatting checks, etc.) |
| --lint | Auto-fix linting/formatting |