Closed: pdiakumis closed this issue 1 year ago
kt_cgi <- ref_genes.list[["cancer_biomarkers_trans"]]
# merge duplicated fusions by source (see https://github.com/umccr/RNAsum/issues/89)
dup_id_cols <- c("translocation", "effector_gene", "cancer_acronym")
kt_cgi_dup <- kt_cgi |>
  # keep only fusions that appear more than once
  dplyr::group_by(translocation) |>
  dplyr::filter(dplyr::n() > 1) |>
  dplyr::ungroup() |>
  # spread each source into its own column, one row per fusion
  tidyr::pivot_wider(id_cols = dplyr::all_of(dup_id_cols),
                     names_from = "source", values_from = "source") |>
  # paste the per-source columns back into a single 'source' column
  tidyr::unite(!dplyr::all_of(dup_id_cols), col = "source", sep = ";")
# A tibble: 5 × 4
  translocation effector_gene cancer_acronym source
  <chr>         <chr>         <chr>          <chr>
1 MLL__MLLT1    MLL           ALL;AML        cgc;validated
2 MLL__MLLT10   MLL           ALL;AML        cgc;validated
3 MLL__MLLT3    MLL           ALL;AML        cgc;validated
4 MLL__MLLT4    MLL           ALL;AML        cgc;validated
5 MLL__MLLT6    MLL           ALL;AML        cgc;validated
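
For comparison, the same merge could probably also be done in one pass with `group_by()` + `summarise()`, which handles duplicated and non-duplicated fusions together. This is only a sketch and not necessarily what #88 does: `kt_toy` and `kt_merged` are made-up names, the table is a toy stand-in for `cancer_biomarkers_trans`, and the `GENEA__GENEB` row is hypothetical.

```r
library(dplyr)
library(tibble)

# Toy stand-in for the cancer_biomarkers_trans table.
# The GENEA__GENEB row is hypothetical (a fusion with a single source).
kt_toy <- tibble::tribble(
  ~translocation, ~effector_gene, ~cancer_acronym, ~source,
  "MLL__MLLT1",   "MLL",          "ALL;AML",       "cgc",
  "MLL__MLLT1",   "MLL",          "ALL;AML",       "validated",
  "MLL__MLLT3",   "MLL",          "ALL;AML",       "cgc",
  "MLL__MLLT3",   "MLL",          "ALL;AML",       "validated",
  "GENEA__GENEB", "GENEA",        "XX",            "cgc"
)

# Collapse the sources of each fusion into one semicolon-separated string,
# so that every fusion ends up on exactly one row.
kt_merged <- kt_toy |>
  dplyr::group_by(translocation, effector_gene, cancer_acronym) |>
  dplyr::summarise(
    source = paste(sort(unique(source)), collapse = ";"),
    .groups = "drop"
  )

print(kt_merged)
```

Unlike the `pivot_wider()` + `unite()` approach, this leaves single-source fusions untouched in the same result, so there is no need to bind the merged duplicates back onto the rest of the table.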
Done via #88.
Need to merge some `MLL_*` duplicated fusions by the `source` column (i.e. have them once with a `source` of `validated;cgc` (which happens throughout that file anyhow)). See file in `inst/rawdata/cancer_biomarkers_database/cancer_genes_upon_trans.tsv`.

Rest of the file is okay. Maybe should update that file when we get to that stage as well.