icgc-argo / roadmap

Place to review/request new features and new tools on ICGC-ARGO's roadmap

Test Data Set for the QA Environment #711

Open rosibaj opened 3 years ago

rosibaj commented 3 years ago

Description of Issue

Over the past 6 months we have struggled to create and maintain test data that is reliable enough for development work. This was especially apparent during development of the donor-submission-dashboard feature.

Test data is important for several reasons:

Types of data that we need to be testing against during development in the QA Environment

Starting from a clean slate each time is an unrealistic way to test these scenarios, because in production we will never start from a clean slate. That is why we need test data that works and that we can manipulate. This approach has been working well for the clinical program DASH-CA.

Factors that have come up

Discussion with Dusan/Alex/Jon Feb 16, 20201

RDPC-Platform Test Data

  1. Set the retention policy for RDPC QA to mimic production: indefinite Kafka retention, plus backups to restore the data in case of disaster
    -- https://github.com/icgc-argo/workflow-roadmap/issues/99 -- https://github.com/icgc-argo/workflow-roadmap/issues/100
  2. Test datasets that are sandboxed for QA use will be created in two test projects.
    • The first TEST-QA will be for the whole team to use.
    • The second, ROSI-RU, will be for Rosi to use as a curated dataset, potentially also used for demos. This dataset should remain clean.
  3. Provide analysis TSVs for Alex to run in Model T for both programs.
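
For item 1, "Kafka retention forever" usually comes down to a few topic-level settings. The sketch below is only an illustration, not the actual RDPC configuration: the helper `retains_forever` and the sample `qa_topic` values are hypothetical. The setting names and the meaning of `-1` (no limit) are standard Kafka topic configuration.

```python
# Hypothetical helper, not taken from any ICGC-ARGO repository.
# Kafka topic-level settings that disable time- and size-based deletion.
FOREVER_RETENTION = {
    "cleanup.policy": "delete",  # no compaction, so records are not dropped by key
    "retention.ms": "-1",        # -1 means no time limit
    "retention.bytes": "-1",     # -1 means no size limit
}


def retains_forever(topic_config: dict) -> bool:
    """Return True only if every retention setting is explicitly set to 'keep forever'.

    Missing keys count as a failure, because the broker defaults are unknown here.
    """
    return all(
        str(topic_config.get(key)) == value
        for key, value in FOREVER_RETENTION.items()
    )


# Assumed example: a QA topic still using a 7-day time limit.
qa_topic = {
    "cleanup.policy": "delete",
    "retention.ms": "604800000",
    "retention.bytes": "-1",
}

print(retains_forever(qa_topic))           # False
print(retains_forever(FOREVER_RETENTION))  # True
```

A check like this could be run against the settings reported for each QA topic before test data is loaded, so that a shorter retention window is noticed before any data expires.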

Intermediate-song

rosibaj commented 3 years ago

List of programs with data in SONG QA RDPC:

[
    "TEST-CA",
    "TEST-PR",
    "OCCAMS-GB",
    "EUCANCAN-BE",
    "PACA-CA",
    "PTC-SA",
    "JAS-CA",
    "DASH-CA",
    "ROSI-RU"
]
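
As a quick cross-check, the list above can be compared with the two sandbox programs named in the discussion (TEST-QA and ROSI-RU). This is a minimal sketch that uses only the values pasted in this comment, so it reflects SONG QA as of that snapshot and not its current state.

```python
import json

# Program list copied verbatim from the comment above.
song_programs = json.loads("""
[
    "TEST-CA",
    "TEST-PR",
    "OCCAMS-GB",
    "EUCANCAN-BE",
    "PACA-CA",
    "PTC-SA",
    "JAS-CA",
    "DASH-CA",
    "ROSI-RU"
]
""")

# Sandbox programs proposed in the discussion notes.
planned_programs = ["TEST-QA", "ROSI-RU"]

# Planned programs that do not yet appear in the SONG QA list.
missing = [p for p in planned_programs if p not in song_programs]
print(missing)  # ['TEST-QA']
```

In this snapshot ROSI-RU already has data in SONG QA, while TEST-QA does not appear in the list.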