Closed JMIBarth closed 1 year ago
instead of the ambiguity coding (10 A, 10 T, and one W).
I don't really understand , what is that 'W' ? where does it come from. You could provide a minimum example please.
Apologies for not being precise. I assume its the IUPAC ambiguity code for the bases A, C, G, T – where "W" has the meaning that it can be either A or T. In the code on line 112 (https://github.com/lindenb/jvarkit/blob/master/src/main/java/com/github/lindenb/jvarkit/tools/biostar/Biostar404363.java) is an example for allele frequency (AF) 0.5 where "M" is introduced instead of base C or A.
where "M" is introduced instead of base C or A.
sorry, that's still not clear to me
the line you're pointing is the consensus line, it's not part of any read.
OK, that explains it. Thanks and sorry for the confusion. I think I didn't quite understand how the program works. Could you please clarify, whether bases of half of the reads are replaced by the alternative allele (comment on line 108 in the script), or if each read is replaced by an alternative allele with probability 0.5 (if AF=0.5, see comment line 95 in the script)? Also, is there a rule regarding forward and reverse reads, i.e., equal frequency of the introduced mutation in forward and reverse reads? Thanks a lot for your help!
AF=0.5
I use a random number generator . If the random number is lower than the value for AF ( https://github.com/lindenb/jvarkit/blob/master/src/main/java/com/github/lindenb/jvarkit/tools/biostar/Biostar404363.java#L264 ), then the ALT allele with replace the original base in the read https://github.com/lindenb/jvarkit/blob/master/src/main/java/com/github/lindenb/jvarkit/tools/biostar/Biostar404363.java#L268
Also, is there a rule regarding forward and reverse reads
no
Thanks a lot for the swift response!
I am using the tool “Biostar404363” (introduce artificial mutations in a bam file) to simulate mutations with an allele frequency of 0.5 and I noticed that whenever there is an unequal number of reads, the base in the first listed read is changed to ambiguity code. Yet, for my application, I would rather need a slightly unequal frequency in cases of unequal read number (e.g., 10 times base A and 11 times base T) instead of the ambiguity coding (10 A, 10 T, and one W).
Would be possible to implement such an option that lets you choose to have either ambiguity code or a slightly unequal allele frequency in cases of unequal read number?