We are currently evaluating replacing clustalo with famsa in the Uniclust/Uniref HHblits database workflow. We are aligning nearly 7 million non-singleton clusters of a Uniref clustered to 30% sequence identity with famsa. About 800 MSAs failed in later stages. After manually inspecting a few of them, I found that they contained the stop codon symbol `*` where the original sequences had Selenocysteine (`U`) or Pyrrolysine (`O`). This change emits the unknown residue `X` instead.
The GPU branch of the code also defines this constant; however, since I do not have a GPU to test my changes, I did not touch that code.
Alternatively, the code could be reworked to also support the three missing residues `O`, `U` and `J`. However, for my purposes, I would prefer to emit `X`.
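
For illustration only, the intended behaviour can be sketched in a few lines of Python. This is not famsa's actual code: the function name `sanitize_residues` and the constants `SUPPORTED` and `UNKNOWN` are hypothetical, and the supported alphabet is an assumption derived from the statement that only `O`, `U` and `J` are missing.

```python
# Hypothetical sketch, not famsa's implementation.
# Assumed alphabet: the 20 standard amino acids plus the ambiguity
# codes B, Z and X, i.e. every letter except O, U and J.
SUPPORTED = set("ACDEFGHIKLMNPQRSTVWY" + "BZX")
UNKNOWN = "X"  # symbol emitted for anything outside SUPPORTED


def sanitize_residues(seq: str) -> str:
    """Return seq in upper case with every unsupported symbol replaced by UNKNOWN."""
    return "".join(c if c in SUPPORTED else UNKNOWN for c in seq.upper())


if __name__ == "__main__":
    # U (Selenocysteine), O (Pyrrolysine) and J all become X.
    print(sanitize_residues("MSUAOJ"))
```

Under these assumptions, a sequence such as `MSUAOJ` would come out as `MSXAXX`, which downstream tools that only accept the standard alphabet plus `X` should be able to parse.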