jydu / maffilter

The MafFilter genome alignment processor
GNU General Public License v3.0

SequenceStatistics not reporting statistics #9

Closed PlantDr430 closed 4 years ago

PlantDr430 commented 4 years ago

Hello,

I am trying to use SequenceStatistics() to generate some statistics on my alignments. A subset of my .maf file looks like this:

##maf version=1 program=Bio++
#
a score=9.67276e+06
s Clav55.chimtig1  0 977831 ?  40493 TCCACCGTACTCGAGACTCCGGGTCTCGGCCCTTCCGTTGCCCTACTCTCNNNNNNNNNNNNNNNNNNCGCGC
s Cpurp.191       15 977966 + 977986 GCCACCGTACTCGGGACTCCGAGTCTCGGCCCTTCCGTTGCCCTACTCTCNNNNNNNNNNNNNNNNNNCGCGC
s LM14.chimtig1    0 976171 ?  11120 TCCACCGTACTCGAGACTCCGGGTCTCGGCCCTTCCGTTGCCGTACTCTCNNNNNNNNNNNNNNNNNNCGCGC
s LM582.chimtig1   0 977664 ?  80624 GCCACCGTACTCGGGACTCCGAGTCTCGGCCCTTCCGTTGCCCTACTCTCNNNNNNNNNNNNNNNNNNCGCGC

a score=7.08149e+06
s Clav55.chimtig2   0 110020 ?  14865 CTCAGCGCAGAATAAGTCACTATTTGGTCCATCGATCGGTTTATCGCTAAGCCAATTGAGTCTTGTCACTCA
s Cpurp.118       237 110572 + 110809 CTCAGTGCAGAACAAGTCACCGTTTGGTCCAGCGATCGGTGTATCGCTAAGCCACTTGAGTCTCGTCACTCA
s LM14.chimtig2     0 110407 ?  16030 CTCAGCGCAGAATAAGTCACTATTTGGTCCATCGATCGGTTTATCGCTAAGCCAATTGAGTCTTGTCACTCA
s LM582.chimtig2    0 110606 ?  14909 CTCAGTGCAGAACAAGTCACCGTTTGGTCCAGCGATCGGTGTATCGCTAAGCCACTTGAGTCTCGTCACTCA

a score=8.69699e+06
s Clav55.chimtig3 0 296004 ?  11703 ACAATTAAGGAGGAGGAAGAAGAAGACGATGAACTTAATTACGAGGAGAGGATGAATAGAAATTTGAAACTGCT
s Cpurp.150       8 295666 + 295685 ACAATTAAGGAGGAGGAAGAAGAAGACGATGAACTTAATTACGAGGAGAGGATGAATGGAAATTTGAAACTGCT
s LM14.chimtig3   0 295426 ?  13865 ACAATTAAGGAGGAGGAAGAAGAAGACGATGAACTTAATTACGAGGAGAGGATGAATAGAAACTTGAAACTGCT
s LM582.chimtig3  0 295685 ?   8337 ACAATTAAGGAGGAGGAAGAAGAAGACGATGAACTTGATTACGAGGAGAGGATGAATAGAAACTTGAAACTGCT

My maf.filter option is set as follows:

maf.filter= \
    SequenceStatistics( \
        statistics( \
            BlockLength(), \
            AlnScore(), \
            BlockCounts(), \
            PairwiseDivergence( \
                Cpurp=Clav55)), \
        ref_species=Cpurp, \
        file=data.statistics.csv), \
    Output( \
        file=population_fulllength_stats.maf) \

The resulting data.statistics.csv contains only the chromosome, start, and stop columns, with none of the statistics columns I expected. For example:

Chr Start   Stop
191 15  977981
118 237 110809
150 8   295674
102 1081    73646
10  288 2111
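
A quick way to confirm that a statistics file carries nothing beyond the coordinates is to inspect its header. The sketch below is in Python and assumes the columns are whitespace-separated (the actual separator is not visible in this thread); `statistic_columns` is a hypothetical helper name, not part of MafFilter.

```python
import io

# Coordinate columns that appear in the output shown above.
COORD_COLUMNS = {"Chr", "Start", "Stop"}

def statistic_columns(handle):
    """Return the header names that are not coordinate columns."""
    header = handle.readline().split()  # assumption: whitespace-separated
    return [name for name in header if name not in COORD_COLUMNS]

# The first lines of the output from this report, used as sample input.
sample = io.StringIO("Chr Start   Stop\n191 15  977981\n118 237 110809\n")
extra = statistic_columns(sample)
print(extra or "no statistic columns found")
```

For a real run, replacing the sample with `open("data.statistics.csv")` should work the same way, under the same separator assumption.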

I have also tried SiteStatistics() and DiversityStatistics(), which produce no results either. Is there something I am missing? Are there additional steps that need to be performed before the statistics are reported?

PlantDr430 commented 4 years ago

Nevermind. Realized I was missing a "=" in my "statistics(" call

jydu commented 4 years ago

Hi Stephen,

There are a few syntax issues. Can you try the following code instead?

maf.filter= \
    SequenceStatistics( \
        statistics = (\
            BlockLength(), \
            AlnScore(), \
            BlockCounts(), \
            PairwiseDivergence( \
                               species1=Cpurp,\
                               species2=Clav55)), \
        ref_species=Cpurp, \
        file=data.statistics.csv), \
    Output( \
        file=population_fulllength_stats.maf) \

PlantDr430 commented 4 years ago

Ah, I was also assuming that in PairwiseDivergence the "=" meant something like "versus". Thanks for the assistance.

jydu commented 4 years ago

> Nevermind. Realized I was missing a "=" in my "statistics(" call

Cross-posting... yes, please also have a look at the PairwiseDivergence arguments too!

Cheers,

Julien.
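
The two mistakes discussed in this thread (`statistics(` written without `=`, and `PairwiseDivergence` given `Cpurp=Clav55` instead of `species1=`/`species2=`) can be caught with a small pre-flight check on the option text. The following is only a sketch in Python: `check_options` is a hypothetical helper, it knows nothing about MafFilter's real option grammar, and it tests only for these two patterns as they appear in the corrected code above.

```python
import re

def check_options(text):
    """Return warnings for the two mistakes discussed in this thread."""
    # Drop continuation backslashes and all whitespace so the patterns stay simple.
    flat = re.sub(r"[\\\s]+", "", text)
    warnings = []
    # 'statistics(' with no '=' before the parenthesis (case-sensitive, so
    # 'SequenceStatistics(' is not flagged).
    if re.search(r"\bstatistics\(", flat):
        warnings.append("'statistics(' should be 'statistics=('")
    # Every PairwiseDivergence(...) call should name species1 and species2.
    for args in re.findall(r"PairwiseDivergence\(([^()]*)\)", flat):
        keys = {part.split("=", 1)[0] for part in args.split(",") if part}
        if not {"species1", "species2"} <= keys:
            warnings.append("PairwiseDivergence needs species1= and species2=")
    return warnings

broken = "statistics( PairwiseDivergence( Cpurp=Clav55)), ref_species=Cpurp"
fixed = ("statistics=( PairwiseDivergence( species1=Cpurp, species2=Clav55)), "
         "ref_species=Cpurp")
for label, text in (("broken", broken), ("fixed", fixed)):
    print(label, check_options(text))
```

A text-level check like this is deliberately narrow; it will not notice other syntax problems, and MafFilter's own log output remains the authoritative source on what was parsed.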