Should there be no difference between running miniprot on a soft-masked genome and on an unmasked genome?
If hard masking is used, some CDS sequences extracted from the GFF file contain about 10 to 100 Ns. How should I deal with these N-containing CDS sequences?
How should I filter the extracted CDS sequences, whether or not the repetitive sequences are masked? For example, should I discard every sequence whose protein is shorter than 50 amino acids, or use some other criterion?
The following is a CDS sequence extracted from the GFF file produced by miniprot, using proteins of the same species for the annotation. The lengths of the proteins translated by seqkit and gffread also confused me a little.
Repeat masking will affect the alignment of some proteins, as you showed. I don't know whether the overall effect is positive or negative. You will have to investigate that yourself.
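One possible starting point for that kind of investigation is to tabulate, for each extracted CDS, how many Ns it contains and how many codons it spans, and then compare the numbers between the masked and unmasked runs. The Python sketch below is only an illustration and is not part of miniprot, seqkit or gffread: the function names (`read_fasta`, `summarize_cds`, `keep_cds`) are made up here, the 50 amino acid cutoff is simply the value from the question, and the 5% N limit is an arbitrary assumption that you would need to tune.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the record name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def summarize_cds(seq: str) -> Dict[str, object]:
    """Collect simple per-CDS statistics (case-insensitive N count)."""
    seq = seq.upper()
    n_count = seq.count("N")
    return {
        "length": len(seq),
        "n_count": n_count,
        "n_frac": n_count / len(seq) if seq else 0.0,
        # Number of complete codons; this includes a stop codon if present.
        "codons": len(seq) // 3,
        "in_frame": len(seq) % 3 == 0,
    }


def keep_cds(seq: str, min_aa: int = 50, max_n_frac: float = 0.05) -> bool:
    """Hypothetical filter: both thresholds are assumptions, not recommendations."""
    stats = summarize_cds(seq)
    return stats["codons"] >= min_aa and stats["n_frac"] <= max_n_frac


if __name__ == "__main__":
    demo = [">cds1", "ATG" * 60, ">cds2", "ATG" + "N" * 30 + "TAA"]
    for name, seq in read_fasta(demo):
        print(name, summarize_cds(seq), keep_cds(seq))
```

To apply it to a real file, pass an open file handle, for example `read_fasta(open("cds.fa"))`, where `cds.fa` is a placeholder name. Running the same summary on the CDS sets from the soft-masked, hard-masked and unmasked genomes should show how many models each filter removes before you settle on a threshold.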
Hi, Professor Li, excuse me