kgori / sigfit

Flexible Bayesian inference of mutational signatures
GNU General Public License v3.0

Incorrect labels in code documentation of human_trinuc_freqs #69

Closed gevro closed 1 year ago

gevro commented 1 year ago

Hi, I think the labels in your documentation of human_trinuc_freqs are incorrect. The numbers themselves are correct and in the correct order.

However, the code comments to the right of the numbers are not. Based on a manual hg19 trinucleotide frequency calculation, I'm quite sure the labels in the documentation should be in this order:

 [1] "ACA" "ACC" "ACG" "ACT" "CCA" "CCC" "CCG" "CCT" "GCA" "GCC" "GCG" "GCT" "TCA" "TCC" "TCG" "TCT" "ACA" "ACC" "ACG"
[20] "ACT" "CCA" "CCC" "CCG" "CCT" "GCA" "GCC" "GCG" "GCT" "TCA" "TCC" "TCG" "TCT" "ACA" "ACC" "ACG" "ACT" "CCA" "CCC"
[39] "CCG" "CCT" "GCA" "GCC" "GCG" "GCT" "TCA" "TCC" "TCG" "TCT" "ATA" "ATC" "ATG" "ATT" "CTA" "CTC" "CTG" "CTT" "GTA"
[58] "GTC" "GTG" "GTT" "TTA" "TTC" "TTG" "TTT" "ATA" "ATC" "ATG" "ATT" "CTA" "CTC" "CTG" "CTT" "GTA" "GTC" "GTG" "GTT"
[77] "TTA" "TTC" "TTG" "TTT" "ATA" "ATC" "ATG" "ATT" "CTA" "CTC" "CTG" "CTT" "GTA" "GTC" "GTG" "GTT" "TTA" "TTC" "TTG"
[96] "TTT"

i.e. these labels:

 [1] "ACA>AAA" "ACC>AAC" "ACG>AAG" "ACT>AAT" "CCA>CAA" "CCC>CAC" "CCG>CAG" "CCT>CAT" "GCA>GAA" "GCC>GAC" "GCG>GAG"
[12] "GCT>GAT" "TCA>TAA" "TCC>TAC" "TCG>TAG" "TCT>TAT" "ACA>AGA" "ACC>AGC" "ACG>AGG" "ACT>AGT" "CCA>CGA" "CCC>CGC"
[23] "CCG>CGG" "CCT>CGT" "GCA>GGA" "GCC>GGC" "GCG>GGG" "GCT>GGT" "TCA>TGA" "TCC>TGC" "TCG>TGG" "TCT>TGT" "ACA>ATA"
[34] "ACC>ATC" "ACG>ATG" "ACT>ATT" "CCA>CTA" "CCC>CTC" "CCG>CTG" "CCT>CTT" "GCA>GTA" "GCC>GTC" "GCG>GTG" "GCT>GTT"
[45] "TCA>TTA" "TCC>TTC" "TCG>TTG" "TCT>TTT" "ATA>AAA" "ATC>AAC" "ATG>AAG" "ATT>AAT" "CTA>CAA" "CTC>CAC" "CTG>CAG"
[56] "CTT>CAT" "GTA>GAA" "GTC>GAC" "GTG>GAG" "GTT>GAT" "TTA>TAA" "TTC>TAC" "TTG>TAG" "TTT>TAT" "ATA>ACA" "ATC>ACC"
[67] "ATG>ACG" "ATT>ACT" "CTA>CCA" "CTC>CCC" "CTG>CCG" "CTT>CCT" "GTA>GCA" "GTC>GCC" "GTG>GCG" "GTT>GCT" "TTA>TCA"
[78] "TTC>TCC" "TTG>TCG" "TTT>TCT" "ATA>AGA" "ATC>AGC" "ATG>AGG" "ATT>AGT" "CTA>CGA" "CTC>CGC" "CTG>CGG" "CTT>CGT"
[89] "GTA>GGA" "GTC>GGC" "GTG>GGG" "GTT>GGT" "TTA>TGA" "TTC>TGC" "TTG>TGG" "TTT>TGT"

whereas the documentation has this:

        # Human genome trinucleotide frequencies (from EMu)
        freq <- c(1.14e+08, 6.60e+07, 1.43e+07, 9.12e+07, # C>A @ AC[ACGT]
                  1.05e+08, 7.46e+07, 1.57e+07, 1.01e+08, # C>A @ CC[ACGT]
                  8.17e+07, 6.76e+07, 1.35e+07, 7.93e+07, # C>A @ GC[ACGT]
                  1.11e+08, 8.75e+07, 1.25e+07, 1.25e+08, # C>A @ TC[ACGT]
                  1.14e+08, 6.60e+07, 1.43e+07, 9.12e+07, # C>G @ AC[ACGT]
                  1.05e+08, 7.46e+07, 1.57e+07, 1.01e+08, # C>G @ CC[ACGT]
                  8.17e+07, 6.76e+07, 1.35e+07, 7.93e+07, # C>G @ GC[ACGT]
                  1.11e+08, 8.75e+07, 1.25e+07, 1.25e+08, # C>G @ TC[ACGT]
                  1.14e+08, 6.60e+07, 1.43e+07, 9.12e+07, # C>T @ AC[ACGT]
                  1.05e+08, 7.46e+07, 1.57e+07, 1.01e+08, # C>T @ CC[ACGT]
                  8.17e+07, 6.76e+07, 1.35e+07, 7.93e+07, # C>T @ GC[ACGT]
                  1.11e+08, 8.75e+07, 1.25e+07, 1.25e+08, # C>T @ TC[ACGT]
                  1.17e+08, 7.57e+07, 1.04e+08, 1.41e+08, # T>A @ AC[ACGT]
                  7.31e+07, 9.55e+07, 1.15e+08, 1.13e+08, # T>A @ CC[ACGT]
                  6.43e+07, 5.36e+07, 8.52e+07, 8.27e+07, # T>A @ GC[ACGT]
                  1.18e+08, 1.12e+08, 1.07e+08, 2.18e+08, # T>A @ TC[ACGT]
                  1.17e+08, 7.57e+07, 1.04e+08, 1.41e+08, # T>C @ AC[ACGT]
                  7.31e+07, 9.55e+07, 1.15e+08, 1.13e+08, # T>C @ CC[ACGT]
                  6.43e+07, 5.36e+07, 8.52e+07, 8.27e+07, # T>C @ GC[ACGT]
                  1.18e+08, 1.12e+08, 1.07e+08, 2.18e+08, # T>C @ TC[ACGT]
                  1.17e+08, 7.57e+07, 1.04e+08, 1.41e+08, # T>G @ AC[ACGT]
                  7.31e+07, 9.55e+07, 1.15e+08, 1.13e+08, # T>G @ AC[ACGT]
                  6.43e+07, 5.36e+07, 8.52e+07, 8.27e+07, # T>G @ AG[ACGT]
                  1.18e+08, 1.12e+08, 1.07e+08, 2.18e+08) # T>G @ AT[ACGT]
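
For reference, the label order listed above can be generated rather than typed by hand. The following is a minimal sketch in base R; `make_labels` is a hypothetical helper written for this illustration and is not part of sigfit.

```r
# Sketch: build the 96 mutation-type labels in the order given in this issue.
bases <- c("A", "C", "G", "T")

# Hypothetical helper (not a sigfit function): all labels for one reference base.
make_labels <- function(ref) {
  alts <- setdiff(bases, ref)  # the three possible substituted bases
  # expand.grid varies its first column fastest: 3' base, then 5' base, then alt
  ctx <- expand.grid(three = bases, five = bases, alt = alts,
                     stringsAsFactors = FALSE)
  paste0(ctx$five, ref, ctx$three, ">", ctx$five, ctx$alt, ctx$three)
}

labels <- c(make_labels("C"), make_labels("T"))

# The reference trinucleotide is the first three characters of each label.
trinucs <- substr(labels, 1, 3)
```

Under these assumptions, `trinucs[49:52]` gives the first T-context row (`ATA`, `ATC`, `ATG`, `ATT`), which is the row the documentation comments label as `AC[ACGT]`.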
kgori commented 1 year ago

Hi gevro,

Well spotted: the latter half of the commented labels is wrong. Those rows should follow the pattern AT*, CT*, GT*, TT*. We will fix this in the next release.

Thanks for the report,
Kevin
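
One way to keep labels and values from drifting apart is to attach the trinucleotides as names instead of relying on hand-written comments. The sketch below is only an illustration, not the fix that was actually released: it copies the counts quoted in this issue, and `tri`, `c_counts` and `t_counts` are names invented for the example.

```r
# Sketch: rebuild the 96-element frequency vector with names derived in code.
bases <- c("A", "C", "G", "T")

# Hypothetical helper: the 16 trinucleotides centred on `ref`,
# ordered by 5' base first, then 3' base (ACA, ACC, ACG, ACT, CCA, ...).
tri <- function(ref) paste0(rep(bases, each = 4), ref, rep(bases, times = 4))

# Counts copied from the snippet quoted in this issue.
c_counts <- c(1.14e+08, 6.60e+07, 1.43e+07, 9.12e+07,  # AC[ACGT]
              1.05e+08, 7.46e+07, 1.57e+07, 1.01e+08,  # CC[ACGT]
              8.17e+07, 6.76e+07, 1.35e+07, 7.93e+07,  # GC[ACGT]
              1.11e+08, 8.75e+07, 1.25e+07, 1.25e+08)  # TC[ACGT]
t_counts <- c(1.17e+08, 7.57e+07, 1.04e+08, 1.41e+08,  # AT[ACGT]
              7.31e+07, 9.55e+07, 1.15e+08, 1.13e+08,  # CT[ACGT]
              6.43e+07, 5.36e+07, 8.52e+07, 8.27e+07,  # GT[ACGT]
              1.18e+08, 1.12e+08, 1.07e+08, 2.18e+08)  # TT[ACGT]

# Each block of 16 is repeated once per substitution type (3 for C, 3 for T).
freq <- c(rep(c_counts, 3), rep(t_counts, 3))
names(freq) <- c(rep(tri("C"), 3), rep(tri("T"), 3))
```

With the names attached, a quick check such as `names(freq)[49:52]` shows which context each value belongs to.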