mothur / mothur.github.io

wiki for the mothur software package
https://mothur.github.io
Creative Commons Attribution 4.0 International

Question on: Alignment #111

Closed JoeHansen4444 closed 1 year ago

JoeHansen4444 commented 1 year ago
             Start    End      NBases  Ambigs  Polymer  NumSeqs
Minimum:     0        0        0       0       1        1
2.5%-tile:   1        332      12      0       3        102345
25%-tile:    1        15422    333     0       4        1023446
Median:      1        15422    335     0       4        2046891
75%-tile:    1        15422    357     0       5        3070336
97.5%-tile:  15616    16273    357     0       6        3991436
Maximum:     16273    16273    384     0       8        4093780
Mean:        655      14695    313     0       4
# of unique seqs:    1386222
total # of seqs:     4093780

I ran summary.seqs after aligning my sequences against the SILVA reference file. I know it is not recommended, but I am looking at the V3V4 region, so I customized the reference database to start at position 9895 and end at position 26167. I was surprised that the alignment summary showed a start of 1 for most percentiles, and that the 97.5%-tile had a start higher than the end of the lower percentiles.

After the alignment I ran align.check to see how well it worked and received this message: "your sequences are 16274 long, but your map file only contains 50001 entries. please correct."

Could this be because the customized reference database is off, which throws the alignment off, or because a few bad sequences are throwing the rest off? Any suggestions on how to fix this problem would be greatly appreciated.

pschloss commented 1 year ago

The file we use for align.check is based on the full 50000-column alignment, so using the trimmed alignment would cause problems.
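
To illustrate why positions in a trimmed alignment no longer line up with a file built for the full-length one, here is a rough Python sketch. It is not part of mothur. It assumes the customized reference simply keeps columns 9895 through 26167 of the full alignment (1-based, inclusive) with no other columns removed; the function and constant names are hypothetical.

```python
# Illustrative only: assumes the custom reference keeps full-alignment
# columns REF_START..REF_END (1-based, inclusive) and nothing else changes.
REF_START = 9895   # first full-alignment column kept (from the question)
REF_END = 26167    # last full-alignment column kept (from the question)


def trimmed_width(ref_start: int = REF_START, ref_end: int = REF_END) -> int:
    """Number of columns left after trimming to ref_start..ref_end."""
    return ref_end - ref_start + 1


def trimmed_to_full(col: int, ref_start: int = REF_START) -> int:
    """Map a 1-based column in the trimmed alignment to the full alignment."""
    if col < 1:
        raise ValueError("columns are 1-based")
    return col + ref_start - 1


if __name__ == "__main__":
    print(trimmed_width())          # 16273
    print(trimmed_to_full(1))       # 9895
    print(trimmed_to_full(15422))   # 25316
```

Under that assumption the trimmed alignment has 16273 columns, which matches the maximum End value in the summary above, and every column index is shifted by 9894 relative to the full-length alignment.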

For screen.seqs you would want to use start=1, end=15422
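
As a rough illustration of what those settings do: in screen.seqs, start removes sequences that begin after the given position and end removes sequences that finish before it. The Python sketch below (not mothur code) applies that rule to the Start/End values from the summary. Treating each summary row as if it were a single sequence is a simplification made only for illustration, and `passes_screen` and `SUMMARY_ROWS` are hypothetical names.

```python
def passes_screen(start: int, end: int,
                  max_start: int = 1, min_end: int = 15422) -> bool:
    """Keep a sequence only if it starts at or before max_start
    and ends at or after min_end."""
    return start <= max_start and end >= min_end


# (Start, End) pairs copied from the summary.seqs output in the question.
SUMMARY_ROWS = {
    "Minimum": (0, 0),
    "2.5%-tile": (1, 332),
    "25%-tile": (1, 15422),
    "Median": (1, 15422),
    "75%-tile": (1, 15422),
    "97.5%-tile": (15616, 16273),
    "Maximum": (16273, 16273),
}

if __name__ == "__main__":
    for label, (start, end) in SUMMARY_ROWS.items():
        verdict = "keep" if passes_screen(start, end) else "remove"
        print(f"{label}: start={start} end={end} -> {verdict}")
```

With these values, only the rows spanning 1 to 15422 are kept. Rows that end early (such as 1 to 332) or start late (such as 15616 to 16273) would be removed.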

Pat

JoeHansen4444 commented 1 year ago

Thank you!