prio-data / views_pipeline

VIEWS forecasting pipeline for monthly prediction runs. Includes MLops and QA for all models/ensembles.

Develop Feature Catalog for Model Features in Pipeline Querysets #106

Closed Polichinel closed 2 weeks ago

Polichinel commented 2 weeks ago


Description
Create a streamlined, human-readable catalog listing all model features present in the pipeline repository querysets, containing only essential details. This catalog will serve as a quick-reference document and should allow for semi-automated updates using information from common_querysets.

Objectives and Requirements

  1. Catalog Structure:

    • Include only essential information for each feature, such as:
      • Name in viewser
      • Human-readable name
      • Data source (with a link where available)
      • Last updated, in the format minutes:hours:day:month:year
      • Associated querysets/models
    • Reference Angelica's Excel ARC for data sources where necessary, though this catalog should be less detailed.
    • Ensure consistent formatting, particularly for Last updated, to prevent errors and support easy readability.
  2. Automate Where Possible:

    • Aim to populate fields using information from common_querysets to automate catalog creation.
    • Note that fields such as Human-readable name and Data source will require manual updates due to the nature of their information.
    • Any fields that cannot be automatically inferred should contain placeholders such as "TODO" or "needs manual updating" for later manual review.
  3. Catalog Format:

    • Store the catalog as a Markdown (.md) file under documentation/catalogs/ in the pipeline repo, so it is easy to read and find.
    • Organize the fields as columns of a Markdown table, with one row per feature, so the catalog can be scanned quickly.
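
The requirements above could be sketched roughly as follows. This is a minimal illustration, not the pipeline's implementation: it assumes the queryset information can be reduced to a plain mapping from queryset name to a list of viewser feature names, which may not match how common_querysets actually exposes it. The function names and the example feature and queryset names are hypothetical.

```python
from datetime import datetime

PLACEHOLDER = "TODO"  # marks fields that need manual updating
COLUMNS = [
    "Name in viewser",
    "Human-readable name",
    "Data source",
    "Last updated",
    "Associated querysets/models",
]


def format_last_updated(ts: datetime) -> str:
    """Format a timestamp as minutes:hours:day:month:year."""
    return ts.strftime("%M:%H:%d:%m:%Y")


def build_rows(querysets: dict[str, list[str]], updated: datetime) -> list[dict[str, str]]:
    """Build one catalog row per feature from a {queryset: [features]} mapping.

    The mapping shape is an assumption made for this sketch.
    """
    feature_to_querysets: dict[str, list[str]] = {}
    for queryset_name, features in querysets.items():
        for feature in features:
            feature_to_querysets.setdefault(feature, []).append(queryset_name)

    stamp = format_last_updated(updated)
    rows = []
    for feature in sorted(feature_to_querysets):
        rows.append({
            "Name in viewser": feature,
            "Human-readable name": PLACEHOLDER,  # cannot be inferred automatically
            "Data source": PLACEHOLDER,          # cannot be inferred automatically
            "Last updated": stamp,
            "Associated querysets/models": ", ".join(sorted(feature_to_querysets[feature])),
        })
    return rows


def render_markdown_table(rows: list[dict[str, str]]) -> str:
    """Render the rows as a Markdown table with a fixed column order."""
    header = "| " + " | ".join(COLUMNS) + " |"
    divider = "| " + " | ".join("---" for _ in COLUMNS) + " |"
    body = ["| " + " | ".join(row[col] for col in COLUMNS) + " |" for row in rows]
    return "\n".join([header, divider, *body])
```

Producing the timestamp in a single helper keeps the Last updated column consistent across rows. Cell values containing a `|` character would need escaping before rendering, which this sketch does not handle.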

Tasks

Next Steps
After completing the catalog, define an update procedure to ensure consistency, particularly for fields that cannot be automatically populated. Plan regular updates (e.g., quarterly or upon major feature changes) to keep information accurate.
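
One possible shape for such an update procedure is sketched below: when the catalog is regenerated, manually entered values are carried over from the previous version, and any rows still holding placeholders are listed for review. It assumes catalog rows are held as dictionaries keyed by column name; the function names are hypothetical and not part of the pipeline.

```python
PLACEHOLDER = "TODO"  # marks fields that need manual updating
MANUAL_FIELDS = ("Human-readable name", "Data source")


def merge_manual_fields(old_rows, new_rows, key="Name in viewser"):
    """Carry manually entered values from the old catalog into regenerated rows.

    Features present only in the old catalog are dropped; features new to the
    catalog keep their placeholders.
    """
    old_by_key = {row[key]: row for row in old_rows}
    merged = []
    for new in new_rows:
        row = dict(new)  # copy so the regenerated rows are not mutated
        old = old_by_key.get(new[key], {})
        for field in MANUAL_FIELDS:
            old_value = old.get(field, PLACEHOLDER)
            if old_value != PLACEHOLDER:
                row[field] = old_value
        merged.append(row)
    return merged


def rows_needing_review(rows, key="Name in viewser"):
    """Return the feature names whose manual fields still hold a placeholder."""
    return [
        row[key]
        for row in rows
        if any(row.get(field, PLACEHOLDER) == PLACEHOLDER for field in MANUAL_FIELDS)
    ]
```

Running `rows_needing_review` as part of each scheduled update would give a short checklist of features that still need a human-readable name or data source.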

Labels
feature catalog, documentation, pipeline, querysets