ncihtan / data-models

Schema.org Data Models for HTAN
MIT License

HTAN 1.0 Data Model: Archive Activities #448

Closed by aclayton555 1 day ago

aclayton555 commented 1 month ago

Split from https://github.com/ncihtan/data-models/issues/389

ARCHIVE - Proposed for August 2024. Understand and plan the steps to 'archive' the HTAN 1.0 Data Model. This will include understanding whether, and which, data are still expected to be submitted under this model beyond August 31, 2024, and how this will be supported.

Questions, considerations & assumptions:

aclayton555 commented 1 month ago

Discussed during 24-8 kick-off:

Proposal: HTAN 1.0 and HTAN 2.0 will be treated separately. Any remaining HTAN 1.0 data will be submitted in accordance with the HTAN 1.0 data model. The HTAN 1.0 data model serves as a basis for the HTAN 2.0 data model, but considerable updates and refinement are planned to establish the HTAN 2.0 data model early in the renewal, prior to any HTAN 2.0 data submissions.

Action: Perform a final review of the backlog and triage each item as NOW, renewal, or won't do.

Assumption: Data are still expected to be submitted from HTAN 1.0 centers after August 31, 2024

Question: Do we want to set up a new repo for the HTAN 2.0 data model? If so, what, if any, remaining updates should be made within the HTAN 1.0 data model repo?

Question: What maintenance and infrastructure need to remain in place after August 31, 2024 (e.g. the DCA currently points to the HTAN 1.0 model)?

aclayton555 commented 2 days ago

Need to understand expectations based on the renewal proposal, and whether we need to establish a mapping between data from different phases or support a major curation effort. @aclayton555 to surface this during NYC visit in August.

24-8 Closeout: The portal can already display columns across different data types in the File explorer, so there is flexibility here. This is an opportunity for us to think about a minimal set of attributes that map across phases 1 and 2, and/or about how we flag 1.0 vs 2.0 data. This is something to revisit in Y2 of the renewal, when we have more 2.0 data. Nothing will break the portal immediately, but this will get complicated if we implement a more granular hierarchical structure. Based on the data types we are starting to expect in phase 2, a hierarchical model may be needed to adequately capture the complexities and contextual information (while balancing minimal elements and a low barrier to entry). Take this into consideration in design doc planning.
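As a rough illustration of the minimal cross-phase attribute set and the 1.0 vs 2.0 flag discussed above, here is a small Python sketch. Every column name, the `COLUMN_MAP` table, the `data_model_version` field, and the `to_minimal_record` helper are hypothetical and chosen only for illustration. None of them are taken from the HTAN 1.0 model, the planned HTAN 2.0 model, or the portal code.

```python
from typing import Any, Dict

# Hypothetical source-column -> shared-attribute mappings, one per data model
# version. These names are assumptions for illustration, not real HTAN columns.
COLUMN_MAP: Dict[str, Dict[str, str]] = {
    "1.0": {
        "Participant ID": "participant_id",
        "Filename": "file_name",
        "Component": "data_type",
    },
    "2.0": {
        "participant": "participant_id",
        "file": "file_name",
        "assay": "data_type",
    },
}


def to_minimal_record(record: Dict[str, Any], model_version: str) -> Dict[str, Any]:
    """Project a record onto the shared attributes and flag its model version."""
    if model_version not in COLUMN_MAP:
        raise ValueError(f"Unknown data model version: {model_version!r}")
    mapping = COLUMN_MAP[model_version]
    # Keep only the shared attributes; a missing source column becomes None.
    minimal = {target: record.get(source) for source, target in mapping.items()}
    # Hypothetical flag distinguishing 1.0 data from 2.0 data.
    minimal["data_model_version"] = model_version
    return minimal


if __name__ == "__main__":
    phase1 = {"Participant ID": "P1", "Filename": "a.bam", "Component": "BulkRNA"}
    phase2 = {"participant": "P2", "file": "b.tiff", "assay": "Imaging"}
    for rec, version in ((phase1, "1.0"), (phase2, "2.0")):
        print(to_minimal_record(rec, version))
```

A flat projection like this only covers the "minimal attributes" option. A more granular hierarchical model would need a richer mapping than a one-to-one column rename.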

Portal ticket filed here: https://github.com/ncihtan/htan-portal/issues/672

aclayton555 commented 1 day ago

Additional tidy up captured in https://github.com/Sage-Bionetworks/data_curator_config/issues/211

aclayton555 commented 1 day ago

Ticket to set up new repo in https://github.com/ncihtan/data-models/issues/463