get_gambl_metadata replaces og_get_gambl_metadata while remaining backwards compatible. It also runs a series of consistency checks on the metadata and biopsy tables and reports any issues it finds.
check_gene_expression ensures RNA-seq samples are prioritized the way we want and are always deduplicated.
get_gene_expression now relies on both of these new functions and their new features.
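For reviewers who want a feel for what "prioritized and deduplicated" means here, below is a minimal base R sketch of the general idea. It is not the code in this PR: the toy table, the `dedup_by_priority` helper and the choice of one row per `biopsy_id` are all assumptions made for illustration.

```r
# Hypothetical toy table: several RNA-seq libraries can exist per biopsy
rna <- data.frame(
  sample_id = c("S1_polyA", "S1_ribo", "S2_polyA", "S3_ribo"),
  biopsy_id = c("B1", "B1", "B2", "B3"),
  protocol  = c("PolyA", "Ribodepletion", "PolyA", "Ribodepletion"),
  stringsAsFactors = FALSE
)

# Assumed helper (not a GAMBLR function): keep one row per biopsy,
# preferring protocols that appear earlier in `priority`
dedup_by_priority <- function(df, priority) {
  rank <- match(df$protocol, priority)        # NA for unlisted protocols
  ordered <- df[order(df$biopsy_id, rank), ]  # NA ranks sort last
  ordered[!duplicated(ordered$biopsy_id), ]   # first row per biopsy wins
}

ribo_first  <- dedup_by_priority(rna, c("Ribodepletion", "PolyA"))
polya_first <- dedup_by_priority(rna, c("PolyA", "Ribodepletion"))
```

Only biopsy B1 has two libraries in this toy data, so it is the only row that changes when the priority order is flipped.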
> exome_priority_meta = get_gambl_metadata(capture_protocol_priority = "Exome")
2908 capture samples are missing a value for protocol. Assuming Exome.
55 biopsies are missing from the biopsy metadata. This should be fixed!
affected cohorts: DLBCL_Weng,DLBCL_Jain
27 biopsies with discrepancies in the pathology field. This should be fixed!
10 biopsies with discrepancies in the time_point field. This should be fixed!
> table(exome_priority_meta$protocol)
Capture       chromium      ctDNA_Genome  Exome         genome        Genome        polyA         PolyA         Ribodepletion
34            3             5             3213          68            1854          11            1108          1235
> not_exome_priority_meta = get_gambl_metadata(capture_protocol_priority = "Capture")
2908 capture samples are missing a value for protocol. Assuming Exome.
55 biopsies are missing from the biopsy metadata. This should be fixed!
affected cohorts: DLBCL_Weng,DLBCL_Jain
27 biopsies with discrepancies in the pathology field. This should be fixed!
10 biopsies with discrepancies in the time_point field. This should be fixed!
> table(not_exome_priority_meta$protocol)
Capture       chromium      ctDNA_Genome  Exome         genome        Genome        polyA         PolyA         Ribodepletion
52            3             5             3195          68            1854          11            1108          1235
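To make the effect of capture_protocol_priority easier to see, here is a small sketch that re-enters the counts from the two table() outputs above by hand and diffs them. The vector names are made up for this example. The last step flags protocol labels that differ only in letter case, which may be worth a look but is not necessarily a problem.

```r
# Counts copied by hand from the two table() outputs above
protocols <- c("Capture", "chromium", "ctDNA_Genome", "Exome", "genome",
               "Genome", "polyA", "PolyA", "Ribodepletion")
exome_priority   <- setNames(c(34, 3, 5, 3213, 68, 1854, 11, 1108, 1235), protocols)
capture_priority <- setNames(c(52, 3, 5, 3195, 68, 1854, 11, 1108, 1235), protocols)

# Per-protocol change when switching priority from "Exome" to "Capture"
delta   <- capture_priority - exome_priority
changed <- delta[delta != 0]
print(changed)

# Labels that collide once case is ignored (e.g. genome vs Genome)
lower <- tolower(protocols)
case_variants <- protocols[duplicated(lower) | duplicated(lower, fromLast = TRUE)]
print(case_variants)
```

With these numbers, 18 samples move from Exome to Capture and the overall total stays the same, which is what you would expect if the argument only changes which sample is kept for patients that have both.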
Would someone be able to test this again now that I've patched a few more bugs? It was a bit more involved to implement properly, but the format option (long/wide) now looks like it is working as expected.
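In case it helps whoever tests this, here is a generic base R illustration of the long versus wide distinction. The column names (`sample_id`, `gene`, `expression`) and values are invented for the example and are not necessarily what get_gene_expression returns.

```r
# Hypothetical long-format expression table: one row per sample/gene pair
long <- data.frame(
  sample_id  = c("S1", "S1", "S2", "S2"),
  gene       = c("MYC", "BCL2", "MYC", "BCL2"),
  expression = c(5.1, 7.3, 6.0, 8.2),
  stringsAsFactors = FALSE
)

# Wide format: one row per sample, one column per gene
wide <- reshape(long, idvar = "sample_id", timevar = "gene",
                direction = "wide")
print(wide)
```

A quick sanity check when testing is that the wide table has one row per unique sample and that the long table has one row per sample/gene combination.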
The main changes are: