philarevalo / PopCOGenT

Microbial Populations as Clusters Of Gene Transfer
GNU General Public License v3.0

What if Mugsy does not report all SNPs? #38

Closed: Hocnonsense closed this issue 11 months ago

Hocnonsense commented 11 months ago

I'm reading the Mugsy output for M1612_contigs and M1613_contigs from the test files. However, I found that some aligned blocks do not match perfectly:

a score=933 label=85 mult=2
s M1612_contigs.M1612_contigs_0     889370 931 + 1691529 GAATTATACAAAAATTTATAAATAATTATATCAAATATTACCCATGGGAAAGAGTAAGTACAAGAGGGATTGGAGCAAATACGATGAGAACGTTATAATGAGATATACCCTAATGTTCCCCTTCTACGTCTTTGAACACTGGTTTACTAGCAGAGGAGAATAGGAACGCTAGGGCAAAGTATAAAGCTCCAAAGGAATTTAACGAATTCCTCCACACCTACCCTATAGGGCCATAGAAGGAGAGCACTAGAAAGACTAAAGATCATCACAACAAGCCTAGACTACTCAACAATATGGGAAAGAATAAGAAACATGAACATAACATTCCCAGAGGCAAGTGATGAACTTGAAGCAGACGCAACGGGAATAAACAAGAGAGGACAATAGCAAAATGGGGTAAAACTAGAGACTCAAAATTCCTCAAGATGGACAAGGACGAATTCAACGTAATAAACGCTGAAGTAATTAGCAACGAAGTTAAGACGGTTAAGGATTCACAAGATAAGGGAAAGAAGGTTTTATGGGGATAAGGCTTATGATACCAACGAGGCTGGAGTTGAGGTTGTTGTCCCACCTAGGAAGAACGCTTCTACTAAACGCAGTCATCCTGCTAGGCTGTGAGGGAGTTCAAGAAACTTGGCTATAATCGTTGGAGGGAGGAGAAGGGTTATGGTGTTAGGTGGAGGGTTGAGTCCTTGTTTTCTGCTGTTAACTTTTGGGGAGTCTGTTAGGGCTACAAGTTTTTTAAGGCAAGTGGTTGAGGCCAAGTTCTGGGCTTATGCATGGATGGTCCACTTGGCTGTAGTCGATAGGGCTCACGGTATTAGGATGTGAGCTTGAGAATAACGTTGAAATAAATATTAATTACTGAAAAATTCTCC-TTATGTCG-TATCATGCTTATGAAATAAATTGAAGATATCAACAAAGCAAC
s M1613_contigs.M1613_contigs_0     48715 96 + 1741614 --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------GGGTATTAGGGTGTGAGCTTGCGAATAACGTTGAAATAAATATTAATTACTGAAAGATT-TCCGTTAT-ACGATATCGTTTTAATGAAATAAATTGAA-----------------

However, PopCOGenT does not recognize these mismatches and just treats the block as one whole sequence.

https://github.com/philarevalo/PopCOGenT/blob/7296af9957ac03959b70adf41463ffa1c20dd19a/src/PopCOGenT/length_bias_functions.py#L247-L264
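To make the question concrete, here is a minimal sketch of how the columns of one MAF block could be classified as matches, mismatches, or gaps. This is not PopCOGenT's code: the helper names `parse_maf_block` and `classify_columns` are hypothetical, and the toy block below is made up for illustration rather than taken from the test files. It only assumes the standard MAF layout, where an `s` line has seven whitespace-separated fields and the aligned text is the last one.

```python
from collections import Counter


def parse_maf_block(lines):
    """Return (source_name, aligned_text) for every 's' line of one MAF block.

    Assumes the standard MAF 's' line layout:
    s <src> <start> <size> <strand> <srcSize> <text>
    """
    rows = []
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "s":
            rows.append((fields[1], fields[6]))
    return rows


def classify_columns(seq_a, seq_b):
    """Count match, mismatch, and gap columns between two aligned rows."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned rows must have the same length")
    counts = Counter()
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            counts["both_gap"] += 1   # column empty in both rows
        elif a == "-" or b == "-":
            counts["gap"] += 1        # present in only one row
        elif a == b:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1   # a SNP-like difference
    return counts


# Toy block (hypothetical data, not from M1612/M1613).
toy_block = [
    "a score=10 label=1 mult=2",
    "s toyA.contig_0 0 12 + 100 ACGTACGTACGT",
    "s toyB.contig_0 5 8 + 100 ---TACCTA-GT",
]

rows = parse_maf_block(toy_block)
counts = classify_columns(rows[0][1], rows[1][1])
print(dict(counts))
```

In a block like the one above from the test files, most columns would fall into the `gap` class because of the long leading gap in the second row, and the remaining overlap would contain both `match` and `mismatch` columns.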

Is this the expected behavior?

Thanks!