Closed vladimirsouza closed 5 months ago
I don't know whether this is really a problem. The duplicated sample_ids come from different cohorts.
> my_meta_genome_capture = get_gambl_metadata(seq_type_filter = c("capture", "genome"))
Using the bundled metadata in GAMBLR.data...
>
> duplicate_sample_ids <- duplicated(my_meta_genome_capture$sample_id) %>%
+   my_meta_genome_capture$sample_id[.] %>%
+   unique
> duplicate_sample_ids
[1] "05-32150T" "08-15460T" "09-33003T" "15-13383T" "17-36275T"
>
> filter(my_meta_genome_capture, sample_id %in% duplicate_sample_ids) %>%
+   split(.$sample_id)
$`05-32150T`
  patient_id sample_id Tumor_Sample_Barcode seq_type sex COO_consensus lymphgen genetic_subgroup EBV_status_inf       cohort pathology
1   05-32150 05-32150T            05-32150T   genome   F           ABC      MCD              dFL           <NA>    FL_Dreval     DLBCL
6   05-32150 05-32150T            05-32150T   genome   F           ABC      MCD             <NA>           <NA> DLBCL_Hilton     DLBCL
  reference_PMID genome_build pairing_status age_group compression bam_available pathology_rank DHITsig_consensus ffpe_or_frozen fl_grade
1       37084389         <NA>           <NA>      <NA>        <NA>            NA             NA              <NA>           <NA>     <NA>
6       37319384       grch37        matched     Other         bam          TRUE             19        DHITsigNeg         frozen     <NA>
  hiv_status lymphgen_cnv_noA53 lymphgen_no_cnv lymphgen_with_cnv lymphgen_wright molecular_BL normal_sample_id time_point
1       <NA>               <NA>            <NA>              <NA>            <NA>         <NA>             <NA>       <NA>
6       <NA>                MCD             MCD               MCD           Other         <NA>        05-32150N          A

$`08-15460T`
  patient_id sample_id Tumor_Sample_Barcode seq_type sex COO_consensus lymphgen genetic_subgroup EBV_status_inf       cohort pathology
2   08-15460 08-15460T            08-15460T   genome   F       UNCLASS      BN2              dFL           <NA>    FL_Dreval     DLBCL
7   08-15460 08-15460T            08-15460T   genome   F       UNCLASS      BN2             <NA>           <NA> DLBCL_Hilton     DLBCL
  reference_PMID genome_build pairing_status age_group compression bam_available pathology_rank DHITsig_consensus ffpe_or_frozen fl_grade
2       37084389         <NA>           <NA>      <NA>        <NA>            NA             NA              <NA>           <NA>     <NA>
7       37319384       grch37        matched     Other         bam          TRUE             19        DHITsigNeg         frozen     <NA>
  hiv_status lymphgen_cnv_noA53 lymphgen_no_cnv lymphgen_with_cnv lymphgen_wright molecular_BL normal_sample_id time_point
2       <NA>               <NA>            <NA>              <NA>            <NA>         <NA>             <NA>       <NA>
7        NEG                BN2             BN2               BN2           Other         <NA>        08-15460N          A

$`09-33003T`
  patient_id sample_id Tumor_Sample_Barcode seq_type sex COO_consensus lymphgen genetic_subgroup EBV_status_inf       cohort pathology
3   09-33003 09-33003T            09-33003T   genome   M           GCB      BN2              dFL           <NA>    FL_Dreval     DLBCL
8   09-33003 09-33003T            09-33003T   genome   M           GCB      BN2             <NA>           <NA> DLBCL_Hilton     DLBCL
  reference_PMID genome_build pairing_status age_group compression bam_available pathology_rank DHITsig_consensus ffpe_or_frozen fl_grade
3       37084389         <NA>           <NA>      <NA>        <NA>            NA             NA              <NA>           <NA>     <NA>
8       37319384       grch37        matched     Other        cram          TRUE             19        DHITsigNeg         frozen     <NA>
  hiv_status lymphgen_cnv_noA53 lymphgen_no_cnv lymphgen_with_cnv lymphgen_wright molecular_BL normal_sample_id time_point
3       <NA>               <NA>            <NA>              <NA>            <NA>         <NA>             <NA>       <NA>
8       <NA>                BN2             BN2               BN2            <NA>         <NA>  09-33003_normal          A

$`15-13383T`
  patient_id sample_id Tumor_Sample_Barcode seq_type sex COO_consensus lymphgen genetic_subgroup EBV_status_inf       cohort pathology
4   15-13383 15-13383T            15-13383T   genome   F           ABC      BN2              dFL           <NA>    FL_Dreval     DLBCL
9   15-13383 15-13383T            15-13383T   genome   F           ABC      BN2             <NA>           <NA> DLBCL_Hilton     DLBCL
  reference_PMID genome_build pairing_status age_group compression bam_available pathology_rank DHITsig_consensus ffpe_or_frozen fl_grade
4       37084389         <NA>           <NA>      <NA>        <NA>            NA             NA              <NA>           <NA>     <NA>
9       37319384       grch37        matched     Other         bam          TRUE             19        DHITsigNeg         frozen     <NA>
  hiv_status lymphgen_cnv_noA53 lymphgen_no_cnv lymphgen_with_cnv lymphgen_wright molecular_BL normal_sample_id time_point
4       <NA>               <NA>            <NA>              <NA>            <NA>         <NA>             <NA>       <NA>
9        NEG                BN2           Other               BN2            <NA>         <NA>        15-13383N          A

$`17-36275T`
   patient_id sample_id Tumor_Sample_Barcode seq_type sex COO_consensus lymphgen genetic_subgroup EBV_status_inf       cohort pathology
5    17-36275 17-36275T            17-36275T   genome   M           GCB    Other              dFL           <NA>    FL_Dreval     DLBCL
10   17-36275 17-36275T            17-36275T   genome   M           GCB    Other             <NA>           <NA> DLBCL_Hilton     DLBCL
   reference_PMID genome_build pairing_status age_group compression bam_available pathology_rank DHITsig_consensus ffpe_or_frozen fl_grade
5        37084389         <NA>           <NA>      <NA>        <NA>            NA             NA              <NA>           <NA>     <NA>
10       37319384       grch37        matched     Other        cram          TRUE             19        DHITsigNeg         frozen     <NA>
   hiv_status lymphgen_cnv_noA53 lymphgen_no_cnv lymphgen_with_cnv lymphgen_wright molecular_BL normal_sample_id time_point
5        <NA>               <NA>            <NA>              <NA>            <NA>         <NA>             <NA>       <NA>
10        NEG              Other           Other             Other            <NA>         <NA>  17-36275_normal          A
This is expected because the same sample can be part of different studies.
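For anyone who needs one row per sample downstream, here is a minimal sketch of how the repeats could be inspected and collapsed. It runs on a toy data frame rather than the real metadata: the column names and the first four rows follow the output in this thread, while `EXAMPLE-01T` is a made-up non-duplicated sample. Preferring the row with the fewest missing fields is only an assumption for illustration, not something GAMBLR does; whether collapsing is appropriate at all depends on the analysis.

```r
library(dplyr)

# Toy stand-in for the bundled metadata (hypothetical subset of columns).
meta <- data.frame(
  sample_id    = c("05-32150T", "05-32150T", "08-15460T", "08-15460T", "EXAMPLE-01T"),
  seq_type     = "genome",
  cohort       = c("FL_Dreval", "DLBCL_Hilton", "FL_Dreval", "DLBCL_Hilton", "DLBCL_Hilton"),
  genome_build = c(NA, "grch37", NA, "grch37", "grch37"),
  stringsAsFactors = FALSE
)

# For every sample_id/seq_type pair that occurs more than once,
# list how many rows it has and which cohorts they come from.
dup_summary <- meta %>%
  group_by(sample_id, seq_type) %>%
  summarise(
    n_rows    = n(),
    n_cohorts = n_distinct(cohort),
    cohorts   = paste(sort(unique(cohort)), collapse = ", "),
    .groups   = "drop"
  ) %>%
  filter(n_rows > 1)

# Assumed tie-break: keep the row with the fewest NA fields per pair.
meta$n_missing <- rowSums(is.na(meta))
meta_unique <- meta %>%
  arrange(sample_id, seq_type, n_missing) %>%
  distinct(sample_id, seq_type, .keep_all = TRUE) %>%
  select(-n_missing)

print(dup_summary)
print(meta_unique)
```

If every row of `dup_summary` has `n_cohorts` equal to `n_rows`, the repeats are fully explained by cohort membership, which matches the explanation above.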