ncihtan / data-models

Schema.org Data Models for HTAN
MIT License

Sync released data models to BQ tables #420

Open adamjtaylor opened 5 months ago

adamjtaylor commented 5 months ago

As a data manager for HTAN, I would like our internal BigQuery `htan-dcc:metadata` tables to include

This lets us use the data model in queries against our submitted manifests and other information held in BigQuery.

We can extend the bq-schema workflow as follows:

Run the workflow when a release is created:

on:
  push:
    branches: main
    paths: 'HTAN.model.csv'
  release:
    types: [created]
  workflow_dispatch: 

Add a job that creates a versioned table when the event name is `release`:

  add-versioned-table:
    name: Add versioned schema to BQ
    runs-on: ubuntu-latest
    needs: add-to-bq
    if: github.event_name == 'release'

Then duplicate the versioned table as latest:

      - name: Duplicate versioned table as latest
        shell: bash
        run: |
          VERSION="${{ github.event.release.tag_name }}"
          # -f overwrites an existing data_model_latest table without prompting
          bq cp -f "htan-dcc:metadata.data_model_${VERSION}" htan-dcc:metadata.data_model_latest
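
One thing to watch in the copy step: the release tag is used directly as a table suffix. If tags contain dots (for example a hypothetical `v24.8.1`; the tag format is not stated in this issue), the dot would clash with the `dataset.table` separator in the `bq` table reference. A minimal sketch of one way to handle this, assuming it is acceptable to replace every character other than letters, digits, and underscores with `_`. The function names `sanitize_tag` and `latest_copy_command` are made up for illustration, and the script only prints the command rather than running `bq`:

```shell
#!/usr/bin/env bash
# Sketch only: derive a conservative table suffix from a release tag.
# Assumption: tags may look like "v24.8.1"; the issue does not say so.

# Replace every character that is not a letter, digit, or underscore with "_".
sanitize_tag() {
  printf '%s' "$1" | tr -c 'A-Za-z0-9_' '_'
}

# Print (do not run) the copy command for a given tag.
# The dataset and the data_model_ prefix come from the issue text;
# -f makes bq cp overwrite an existing destination table without prompting.
latest_copy_command() {
  local suffix
  suffix="$(sanitize_tag "$1")"
  printf 'bq cp -f htan-dcc:metadata.data_model_%s htan-dcc:metadata.data_model_latest\n' "$suffix"
}

latest_copy_command "v24.8.1"
```

Run as written, this prints `bq cp -f htan-dcc:metadata.data_model_v24_8_1 htan-dcc:metadata.data_model_latest`. In the workflow, the same substitution would need to be applied wherever the versioned table is created, so that both steps agree on the table name.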

aclayton555 commented 4 months ago

Please add a "critical" label if this is expected within phase 1.0, or a "renewal" label if it can wait.

aclayton555 commented 2 months ago

Need to discuss with ISB during data flow discussions. @aclayton555 to tag this in the flow diagram. Need to understand how users are engaging with BQ.

aclayton555 commented 2 months ago

Currently, there is a workflow to sync the staging version of the data model with the BQ tables. It is used to help populate attribute descriptions. It is still in use, but @PozhidayevaDarya has not run it for a little while. TBD on updating it to include the attributes listed here AND so that it no longer points to staging.

aclayton555 commented 1 month ago

24-8 Close-out: take this into consideration in the data model design doc. Need to understand what is required here and the architecture needed to support it.