Closed gevro closed 2 years ago
Hi gevro,
This is not a bug, it's the intended behaviour.
For the purposes of mutational signatures, all mutations are reported with reference to the pyrimidine base of the complementary pair. Meaning that, for example, a mutation of G->A is encoded as C->T, which is the complementary change. To add strandedness to this representation, if the aforementioned G->A were on the transcribed strand, it would be encoded as C->T (untranscribed), since the C used as the reference base is on the untranscribed strand. The line you commented in the code takes care of this change of strandedness.
This system of referring to mutations by their pyrimidine context is described in Alexandrov, et al. 2013. “Signatures of Mutational Processes in Human Cancer.” Nature 500 (7463): 415–21..
Best, Kevin
Thanks so much for explaining!
Hi, I think I found a bug in build_catalogues.
In the 5th column of the input: -1 should equal transcribed strand 1 should equal untranscribed strand
However, build_catalogues only puts a mutation with strand="-1" into the transcribed category if the Ref nucleotide is a pyrimidine (C or T). But if the Ref nucleotide is a purine it puts a "-1" mutation it into the untranscribed category.
This does not match the documentation. Because "-1" should always go into the transcribed category, regardless of whether Ref is a pyrimidine or a purine.
I think this is the problematic code:
I'm not sure what the purpose of the code marked with ##. It flips a mutation from untranscribed to transcribed, and vice versa, if the central nucleotide is a purine. There's no reason to change the strand of a mutation. If a mutation is annotated as transcribed, then it should be output as transcribed, and if a mutation is annotated as untranscribed then it should be annotated as untranscribed, no? Unless I'm misunderstanding something.