These studies import without issues into cgds_triage. The error is thrown only during import into cgds_gdac.
Completed Items:
Confirmed that no duplicates are present based on the key fields (entrez id, chrom, start, end, ref allele, tum allele, variant type, sample id, hgvsp_short)
Ran the duplicate check manually on both cgds_triage and cgds_gdac. Both databases pass. Open question: does the duplicate check done in Java by the importer behave differently from the MySQL check below?
SELECT
(SELECT COUNT(*) FROM mutation_event) =
(SELECT COUNT(DISTINCT ENTREZ_GENE_ID, CHR, START_POSITION, END_POSITION, TUMOR_SEQ_ALLELE, PROTEIN_CHANGE, MUTATION_TYPE) FROM mutation_event);
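The query above only returns a single true/false value. As a rough sketch of a follow-up check, the same key columns can be grouped so that any offending key combinations are listed explicitly. The example below runs against an in-memory SQLite table as a stand-in for mutation_event; the function name, the table layout and the sample rows are assumptions made up for illustration, not taken from the real schema or data.

```python
import sqlite3

# Key columns copied from the COUNT(DISTINCT ...) query above.
KEY_COLUMNS = [
    "ENTREZ_GENE_ID", "CHR", "START_POSITION", "END_POSITION",
    "TUMOR_SEQ_ALLELE", "PROTEIN_CHANGE", "MUTATION_TYPE",
]


def find_duplicate_keys(conn, table="mutation_event", columns=KEY_COLUMNS):
    """Return one row per key that occurs more than once: key values plus row count."""
    cols = ", ".join(columns)
    query = (
        f"SELECT {cols}, COUNT(*) AS n FROM {table} "
        f"GROUP BY {cols} HAVING COUNT(*) > 1"
    )
    return conn.execute(query).fetchall()


def demo():
    # Hypothetical, simplified table layout and made-up rows for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mutation_event ("
        "MUTATION_EVENT_ID INTEGER PRIMARY KEY, ENTREZ_GENE_ID INTEGER, CHR TEXT, "
        "START_POSITION INTEGER, END_POSITION INTEGER, TUMOR_SEQ_ALLELE TEXT, "
        "PROTEIN_CHANGE TEXT, MUTATION_TYPE TEXT)"
    )
    rows = [
        (1, 672, "17", 100, 100, "A", "p.X1Y", "Missense_Mutation"),
        (2, 672, "17", 100, 100, "A", "p.X1Y", "Missense_Mutation"),  # same key as row 1
        (3, 7157, "17", 200, 200, "T", "p.Z2W", "Nonsense_Mutation"),
    ]
    conn.executemany(
        "INSERT INTO mutation_event VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows
    )
    return find_duplicate_keys(conn)
```

The GROUP BY ... HAVING query itself should also be usable directly in MySQL. One difference to keep in mind: GROUP BY puts NULL values into the same group, whereas MySQL's COUNT(DISTINCT ...) skips rows that have NULL in any of the listed columns, so the two checks can disagree if key columns are nullable.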
Built a new importer with changes to cbioportal core that add extra logging for debugging
To-do:
Import with the updated importer and attach a debugger to it to inspect the state of the database while one of the affected studies is being imported
A script is running to generate the set of affected mutation_event_ids and affected cancer studies
Once the script has finished, the affected cancer studies should be removed and reimported
The affected mutation_event_ids should also be purged (note: this happens automatically as a cleanup step that runs after a cancer study has been deleted)
Notes:
Studies known to be affected: