Description of Issue

We have struggled over the past six months with creating and maintaining reliable test data for development purposes. This was especially highlighted during donor-submission-dashboard feature development.
Test data is important for several reasons:
Verifying bugs during in-progress feature development
Regression testing during releases
Covering the types of data we need to test against during development in the QA environment
Starting from a clean slate each time is an unrealistic testing method for these scenarios, since we will not be starting with a clean slate when we move into production. That's why we need test data that works and that we can manipulate. This has been working well for the clinical program in DASH-CA.
Factors that have come up
The RDPC API QA data is deleted on a rolling 2-week window, so the data in RDPC API QA can differ from day to day.
We have been using the RDPC API PROD in the QA environment to facilitate development and testing. There are environment details to discuss here.
When a change is made in RDPC API QA that affects the platform as a downstream service (namely the donor-dashboard-aggregator), we had to wait for the change to reach production to accurately test it. This is because RDPC API QA does not have reliable data to test with.
Developers port-forward to databases through Kubernetes (e.g. connecting to clinical). This is not a read-only operation and could result in destructive actions if a developer is not careful. There should be a safer mechanism for doing this.
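One concrete shape for the safer mechanism above is to make the forwarded connection read-only at the database level, so accidental destructive actions fail fast. A minimal sketch of the principle using sqlite (the real clinical database, driver, and names here are stand-ins, not the actual setup):

```python
import os
import sqlite3
import tempfile

# Stand-in for the clinical database (hypothetical: in practice you would
# port-forward through Kubernetes and use the real driver's read-only option).
path = os.path.join(tempfile.mkdtemp(), "clinical.db")
rw = sqlite3.connect(path)
rw.execute("CREATE TABLE donors (id TEXT, program TEXT)")
rw.execute("INSERT INTO donors VALUES ('DO1', 'TEST-QA')")
rw.commit()
rw.close()

# Open the same database read-only via sqlite's URI syntax:
# reads succeed, any write raises immediately.
ro = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
rows = ro.execute("SELECT id FROM donors").fetchall()

blocked = False
try:
    ro.execute("DELETE FROM donors")  # a destructive action
except sqlite3.OperationalError:
    blocked = True  # the read-only connection refused the write
```

Most production databases support the same idea through a read-only role or session setting, which would let developers keep port-forwarding without risking writes.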
A set of test datasets, sandboxed for QA use, will be created under two test projects.
The first, TEST-QA, will be for the whole team to use.
The second, ROSI-RU, will be a curated dataset for Rosi to use, potentially also for demos. This dataset should remain clean.
Provide analysis TSVs for Alex to run in Model T for both programs.
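Generating the per-program analysis TSVs with a small script, rather than by hand, would keep them consistent between TEST-QA and ROSI-RU. A sketch with hypothetical column names (the actual Model T schema should come from Alex):

```python
import csv
import io

# Hypothetical columns -- the real Model T analysis TSV schema
# will come from the pipeline's documentation, not this sketch.
rows = [
    {"program_id": "TEST-QA", "donor_id": "DO1", "analysis_type": "sequencing"},
    {"program_id": "ROSI-RU", "donor_id": "DO2", "analysis_type": "sequencing"},
]

buf = io.StringIO()
writer = csv.DictWriter(
    buf,
    fieldnames=["program_id", "donor_id", "analysis_type"],
    delimiter="\t",
)
writer.writeheader()
writer.writerows(rows)
tsv = buf.getvalue()  # in practice, write one file per program
```
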
Intermediate-song
We should not be importing data into QA, as is happening right now with qa-intermediate-song. Don't use prod data in QA. QA should be clean.
We need a less manual process for importing legacy data. The way it works right now is not a long-term solution.
----- check with Christina on scale of the imports
----- spend time to develop an automated method that Hardeep can use?
Proposed Solutions:
Discussion with Dusan/Alex/Jon, Feb 16, 2021
RDPC-Platform Test Data
-- https://github.com/icgc-argo/workflow-roadmap/issues/99 -- https://github.com/icgc-argo/workflow-roadmap/issues/100