Custom Cohorts are designed to be an immutable set of sequencing group (SG) IDs, selected using a set of criteria.
When samples and their associated sequence data need to be deleted, we currently remove the corresponding CPG IDs from any cohorts that contain them.
The Cohorts are not marked as altered/deprecated, so future analyses using the cohorts will use the new, reduced sample set. The fact that the underlying cohort has changed is not stored anywhere.
My understanding of the original CustomCohort design was that immutability was intended for this exact purpose: the pipeline wouldn't tolerate a CustomCohort (CC) where any of the component samples were withdrawn/inactive. By removing the samples from the cohorts completely, we lose the ability to make this check.
Questions:
Can we do the sample removal without removing the CPG ID from the CustomCohort? I'm not sure if this is valid within the database structure we have
When the pipeline pulls each custom cohort, it could then check whether each of the CPG IDs is still a current/valid sequencing group.
If we do need to remove from a CustomCohort, can we mark the whole Cohort as inactive?
We have an active boolean on a few other entities in the database, but not CCs (AFAIK).
If we need to pull samples/CPG IDs out of a Cohort, we should be able to mark the whole cohort as inactive, possibly with a reason to go with it.
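
To make the two options above concrete, here is a rough Python sketch. Everything in it is hypothetical: the CustomCohort dataclass, the active and inactive_reason fields, validate_cohort, deactivate, and the example IDs are illustrative stand-ins, not the real schema or API. It only shows the shape of the check the pipeline could run and what an inactive flag with a reason might look like.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class CustomCohort:
    """Hypothetical stand-in for a cohort record, not the real schema."""

    name: str
    sequencing_group_ids: frozenset[str]
    active: bool = True                    # proposed flag (assumption)
    inactive_reason: Optional[str] = None  # proposed free-text reason (assumption)


class CohortInvalidError(Exception):
    """Raised when a cohort should not be used for analysis."""


def validate_cohort(cohort: CustomCohort, active_sg_ids: set[str]) -> None:
    """Fail if the cohort is inactive or contains withdrawn sequencing groups.

    active_sg_ids is assumed to be the set of IDs that are still
    current/valid, however the pipeline obtains it.
    """
    if not cohort.active:
        raise CohortInvalidError(
            f"Cohort {cohort.name} is inactive: {cohort.inactive_reason}"
        )
    withdrawn = sorted(cohort.sequencing_group_ids - active_sg_ids)
    if withdrawn:
        raise CohortInvalidError(
            f"Cohort {cohort.name} contains withdrawn IDs: {', '.join(withdrawn)}"
        )


def deactivate(cohort: CustomCohort, reason: str) -> CustomCohort:
    """Return a copy marked inactive; the member IDs are left untouched."""
    return replace(cohort, active=False, inactive_reason=reason)


if __name__ == "__main__":
    cohort = CustomCohort("COH1", frozenset({"CPG1", "CPG2"}))
    validate_cohort(cohort, {"CPG1", "CPG2"})  # passes silently
    retired = deactivate(cohort, "sample withdrawn")
    try:
        validate_cohort(retired, {"CPG1", "CPG2"})
    except CohortInvalidError as err:
        print(err)
```

In this sketch the cohort's ID set is never modified, so the original membership stays on record under either option; whether that is compatible with how sample deletion works in the database is the open question above.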
Deletion for reference: https://github.com/populationgenomics/seqr-private/issues/215#issuecomment-2261973125