Note that the current version does not search very large metagenome databases. For some proteins, metagenome data is important. We will update this as soon as possible.
Hi!
I am confused about how your code calculates Meff in the a3m format when 'gaps aligned to inserts' are omitted. Specifically, it appears that the code treats matches (uppercase characters) and inserts (lowercase characters) in the same manner, and this results in a higher Meff value for the file.
To illustrate the issue, consider the following example using two sequences in the a3m format:
At position 10 of the second sequence there is a lowercase 'g', i.e. an insert. Because no 'gap aligned to insert' is added at the corresponding position of the first sequence, all subsequent residues are shifted by one column relative to the first sequence. Residues that are in fact identical are then compared against the wrong columns and counted as dissimilar. As a result, the number of dissimilarities increases, leading to an inflated Meff value for the MSA file.
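To make the expected behaviour concrete, here is a minimal sketch of what I had in mind. It is not taken from your code: the two rows are hypothetical stand-ins for my example, and I am assuming the common definition of Meff (each sequence weighted by 1 over the number of sequences at least 80% identical to it). The function names `match_columns`, `identity` and `meff` are my own.

```python
import string

# Translation table that deletes lowercase letters, i.e. a3m insert states.
_DROP_INSERTS = str.maketrans("", "", string.ascii_lowercase)


def match_columns(a3m_row: str) -> str:
    """Keep only match-state columns (uppercase residues and '-')."""
    return a3m_row.translate(_DROP_INSERTS)


def identity(a: str, b: str) -> float:
    """Fraction of identical columns between two equal-length rows."""
    if len(a) != len(b):
        raise ValueError("rows must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def meff(rows, threshold=0.8):
    """Assumed definition: sum over rows of 1 / (number of rows with identity >= threshold)."""
    total = 0.0
    for a in rows:
        # Each row is counted as its own neighbour, so the count is never zero.
        neighbours = sum(identity(a, b) >= threshold for b in rows)
        total += 1.0 / neighbours
    return total


# Hypothetical rows: the hit has a lowercase 'g' insert at position 10.
query = "MKTAYIAKQRQISFVK"
hit = "MKTAYIAKQgRQISFVK"

# Comparing raw characters column by column (insert treated like a match).
naive = meff([query, hit[: len(query)]])

# Comparing match columns only, after dropping the insert.
aligned = meff([query, match_columns(hit)])
```

With the insert dropped, the two rows are identical in every match column and count as one effective sequence. With the raw character comparison, only the first nine columns agree, so the pair falls below the 80% threshold and is counted as two effective sequences.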
Could you kindly explain the rationale behind this?