mradz19 closed this issue 1 year ago.
Hello, yes, the names are shortened because of the white spaces. DIAMOND formats its databases that way: a blank space marks the end of the header, so everything after the first space is dropped. You had better replace the white spaces with underscores. The annotations, however, are not duplicated. The second column is the classification based on the best hit, and the third is based on the best average. That's why there is sometimes a best-hit annotation but no best-average one. Please refer to the manual for details on how this works. Best, J
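A minimal Python sketch of the fix suggested in the reply, assuming the functional description is stored directly in each FASTA header line and that the pipeline accepts underscores there (check the manual for the exact header layout it expects for external databases). The script and helper names (`sanitize_header`, `first_token`, `sanitize_fasta`) are hypothetical. `first_token` only mimics the truncation described in the reply; it is not DIAMOND code.

```python
import re
import sys


def first_token(header: str) -> str:
    """Return the header text up to the first whitespace.

    Hypothetical helper: mimics the behavior described in the reply, where
    everything after the first blank in a header is dropped.
    """
    parts = header.lstrip(">").split(maxsplit=1)
    return parts[0] if parts else ""


def sanitize_header(header: str) -> str:
    """Replace every run of whitespace (spaces, tabs) in a header with one underscore."""
    # Strip the trailing newline first so it does not turn into a trailing "_".
    return re.sub(r"\s+", "_", header.strip())


def sanitize_fasta(in_path: str, out_path: str) -> None:
    """Copy a FASTA file, rewriting only header lines (those starting with '>')."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                dst.write(sanitize_header(line) + "\n")
            else:
                # Sequence lines are copied unchanged.
                dst.write(line)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        sanitize_fasta(sys.argv[1], sys.argv[2])
    else:
        print("usage: sanitize_fasta.py INPUT.fasta OUTPUT.fasta", file=sys.stderr)
```

Run it as `python sanitize_fasta.py refseq_nr.fasta refseq_nr.clean.fasta` (placeholder file names) and use the cleaned file as the external database. After cleaning, `first_token` returns the whole header text, so the full description survives the truncation.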
I am trying to use the non-redundant protein database from RefSeq as an external database, but I am having some issues. I have formatted my reference FASTA file like this:
However, when I do the run, the output after DIAMOND in step 7 (07.test_run.fun3.Refseq) looks like this:
Some words appear to be duplicated, and in some cases the functional description has been reduced to a few letters. Is this because of the spaces in the functional descriptions in the FASTA file?