hoelzer closed this 4 years ago
Yes, definitely. MMseqs2 already provides e-value and min-aln-len parameters. Users should be able to pass custom values for these; if they don't, the help message should tell them the defaults (evalue = 0.001 and aln-len = 0). Percent identity can be filtered afterwards from the output table.
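A minimal sketch of how a Python wrapper could expose both parameters and print their defaults in `--help`. The function names, the `hits.m8` output file and the `tmp` directory are hypothetical; `mmseqs easy-search`, `-e` and `--min-aln-len` are real MMseqs2 options. The command is only assembled here, not executed:

```python
import argparse
from typing import List, Optional

# Defaults quoted above (they match MMseqs2's own defaults).
DEFAULT_EVALUE = 0.001
DEFAULT_MIN_ALN_LEN = 0


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical wrapper CLI; %(default)s makes --help show each default."""
    parser = argparse.ArgumentParser(description="Annotate ORFs with MMseqs2")
    parser.add_argument("query", help="ORF FASTA file")
    parser.add_argument("target", help="reference FASTA file or MMseqs2 database")
    parser.add_argument("--evalue", type=float, default=DEFAULT_EVALUE,
                        help="maximum E-value, passed to mmseqs -e (default: %(default)s)")
    parser.add_argument("--min-aln-len", type=int, default=DEFAULT_MIN_ALN_LEN,
                        help="passed to mmseqs --min-aln-len (default: %(default)s)")
    return parser.parse_args(argv)


def build_mmseqs_command(args: argparse.Namespace,
                         out_file: str = "hits.m8",
                         tmp_dir: str = "tmp") -> List[str]:
    """Assemble (but do not run) an `mmseqs easy-search` call."""
    return [
        "mmseqs", "easy-search", args.query, args.target, out_file, tmp_dir,
        "-e", str(args.evalue),
        "--min-aln-len", str(args.min_aln_len),
    ]
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would actually run the search.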
Ah nice, yes, evalue and aln-len are good parameters, and it's good to have them directly accessible in MMseqs2.
I think we should at least do some basic filtering of the MMseqs2 results. I assume the output is BLAST-like?
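As far as I know, `mmseqs easy-search` writes a BLAST-tab-like table by default with 12 columns (query, target, fident, alnlen, mismatch, gapopen, qstart, qend, tstart, tend, evalue, bits). A rough parser sketch, assuming exactly that layout; `Hit` and `parse_hits` are hypothetical names. Depending on version and `--format-output` settings, the identity column may be a fraction or a percentage, so check before picking a threshold:

```python
from dataclasses import dataclass
from typing import Iterable, List

# Assumed column order: the default 12-column easy-search layout.
# Adjust this list if --format-output is changed.
COLUMNS = ["query", "target", "fident", "alnlen", "mismatch", "gapopen",
           "qstart", "qend", "tstart", "tend", "evalue", "bits"]


@dataclass
class Hit:
    query: str
    target: str
    fident: float  # identity: fraction (0-1) or percent, depending on output settings
    alnlen: int
    qstart: int
    qend: int
    evalue: float
    bits: float


def parse_hits(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated hit lines, skipping blank lines and '#' comments."""
    hits: List[Hit] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        rec = dict(zip(COLUMNS, line.split("\t")))
        hits.append(Hit(
            query=rec["query"], target=rec["target"],
            fident=float(rec["fident"]), alnlen=int(rec["alnlen"]),
            qstart=int(rec["qstart"]), qend=int(rec["qend"]),
            evalue=float(rec["evalue"]), bits=float(rec["bits"]),
        ))
    return hits
```

Usage would be along the lines of `with open("hits.m8") as fh: hits = parse_hits(fh)`.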
I suggest that we filter for
Once you implement this, maybe run a quick test with an identity threshold of 80% and alignment-length thresholds of 60% and 80%, and compare the number of assigned functional annotations.
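A rough sketch of that comparison, assuming the hits have already been reduced to hypothetical `(query_id, identity_fraction, alignment_length)` tuples and that query lengths are known in the same unit as the alignment length. It counts distinct annotated queries for each identity/coverage combination; all names here are illustrative:

```python
from itertools import product
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical minimal hit record: (query_id, identity_fraction, alignment_length)
HitRow = Tuple[str, float, int]


def annotated_queries(hits: Iterable[HitRow], query_lengths: Dict[str, int],
                      min_ident: float, min_cov: float) -> Set[str]:
    """Return IDs of queries with at least one hit passing both thresholds.

    Coverage is alignment length / query length, so both lengths must be
    in the same unit (e.g. amino acids).
    """
    kept: Set[str] = set()
    for query, ident, alnlen in hits:
        if ident >= min_ident and alnlen / query_lengths[query] >= min_cov:
            kept.add(query)
    return kept


def compare_thresholds(hits: Iterable[HitRow], query_lengths: Dict[str, int],
                       idents=(0.8,), covs=(0.6, 0.8)) -> Dict[Tuple[float, float], int]:
    """Count annotated queries for every (identity, coverage) combination."""
    rows: List[HitRow] = list(hits)  # materialise so the hits can be scanned repeatedly
    return {
        (i, c): len(annotated_queries(rows, query_lengths, i, c))
        for i, c in product(idents, covs)
    }
```

Counting distinct queries rather than raw hits keeps the numbers comparable to "how many ORFs got a functional annotation".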
The main goal of this simple filter is to avoid annotations that rest on a partial hit of only a few nucleotides (e.g., a hypothetical ORF of 100 nt that has a hit of only 10 nt).
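To turn an alignment into "fraction of the ORF covered", the query length is needed, which the default output does not include. A sketch under these assumptions: lengths are read from the query FASTA (MMseqs2 can also report `qlen` via `--format-output`), coordinates are 1-based and inclusive, and ORF and alignment lengths use the same unit (amino acids if ORFs are translated before searching). `fasta_lengths`, `query_coverage` and `is_full_enough` are hypothetical names, and the 0.6 default just mirrors the lower test value mentioned above:

```python
from typing import Dict, Iterable, Optional


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Sequence length per FASTA record, keyed by the first word of the header."""
    lengths: Dict[str, int] = {}
    current: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            current = words[0] if words else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def query_coverage(qstart: int, qend: int, qlen: int) -> float:
    """Fraction of the query spanned by the alignment (1-based, inclusive coords).

    abs() tolerates reversed coordinates, e.g. minus-strand nucleotide hits.
    """
    return (abs(qend - qstart) + 1) / qlen


def is_full_enough(qstart: int, qend: int, qlen: int, min_cov: float = 0.6) -> bool:
    """Keep a hit only if it covers at least min_cov of the query."""
    return query_coverage(qstart, qend, qlen) >= min_cov
```

For the example above, a 10 nt hit on a 100 nt ORF gives `query_coverage(1, 10, 100)` of 0.1, so the hit would be dropped at either threshold.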