morinlab / GAMBLR

Set of standardized functions to operate with genomic data
https://morinlab.github.io/GAMBLR/
MIT License

Update get_coding_ssm to load variants from latest results #64

Closed: rdmorin closed this issue 2 years ago

rdmorin commented 2 years ago

get_ssm_by_sample was recently updated to load variants from the latest deblacklisted MAFs. All other functions that retrieve SSMs need to be updated to load from the same files or their derivatives. The most important one is get_coding_ssm. I've created .CDS derivative files for the files loaded by get_ssm_by_sample, so adopting them should be relatively easy: point the other functions to these new files.

rdmorin commented 2 years ago

It's VERY important that the updated functions all offer the same option to drop variants with low read support, with the same default as the updated get_ssm_by_sample. This is because the augmented MAFs can contain variants with as few as zero supporting reads in a given sample, included because they were detected in another sample from the same patient. For many applications those variants definitely should not be present, but for some applications they're needed.
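For illustration, the read-support filter described above could look roughly like the sketch below. This is not GAMBLR code: the helper name drop_low_support, the argument name min_read_support, and the default of 3 are all assumptions made for the example, and the real default should be whatever get_ssm_by_sample uses. The only thing taken as given is that the MAF has the standard t_alt_count column (tumour reads supporting the variant). The toy data is invented.

```r
# Hypothetical helper (not part of GAMBLR): drop MAF rows whose alternate-allele
# read count in this sample is below a threshold.
# The default of 3 is illustrative only; it should mirror get_ssm_by_sample.
drop_low_support <- function(maf, min_read_support = 3) {
  # t_alt_count is the standard MAF column for tumour reads supporting the variant
  stopifnot("t_alt_count" %in% colnames(maf))
  # Rows with a missing count are dropped as well, since support cannot be confirmed
  keep <- !is.na(maf$t_alt_count) & maf$t_alt_count >= min_read_support
  maf[keep, , drop = FALSE]
}

# Toy augmented MAF: the MYD88 variant has zero supporting reads in this sample,
# as can happen when it was detected in another sample from the same patient.
maf <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S1"),
  Hugo_Symbol = c("EZH2", "MYD88", "CREBBP"),
  t_alt_count = c(12L, 0L, 2L),
  stringsAsFactors = FALSE
)

# Default behaviour: low-support variants are removed
filtered <- drop_low_support(maf)

# Applications that need the augmented variants can turn the filter off
unfiltered <- drop_low_support(maf, min_read_support = 0)
```

Exposing the threshold as an argument with a shared default keeps both use cases available: most callers get the filtered variants without doing anything, and the applications that need the zero-support variants can opt out explicitly.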

rdmorin commented 2 years ago

Other affected functions that need to be updated accordingly are:

We also need to check that any function that calls these still works with the updated version, e.g. calc_mutation_frequency_sliding_windows, get_ssm_by_regions, etc.

mattssca commented 2 years ago

This issue was resolved in PR https://github.com/morinlab/GAMBLR/pull/71