Closed TimothyStephens closed 3 months ago
easy-taxonomy already does what you want.
Here is the output for one contig from tax_lca.tsv, with an added header:
CONTIG_NAME TAX_ID TAX_RANK NAME TOTAL_FRAGS ASSIGNED_FRAGS FRAGS_AGREEMENT AGREEMENT_RATIO
query 3702 species Arabidopsis thaliana 2 2 2 1.000
From what I understand, the last few numbers are what you want, right?
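For anyone reading these columns programmatically, here is a minimal Python sketch that maps one line onto the header names above. It assumes the file is tab-separated (the taxon name itself contains spaces) and that the columns come in the order of the added header. The names LcaRow and parse_lca_line are made up for illustration and are not part of MMseqs2.

```python
from typing import NamedTuple


class LcaRow(NamedTuple):
    """One line of tax_lca.tsv, using the header names from the comment above."""
    contig_name: str
    tax_id: int
    tax_rank: str
    name: str
    total_frags: int
    assigned_frags: int
    frags_agreement: int
    agreement_ratio: float


def parse_lca_line(line: str) -> LcaRow:
    # Assumption: fields are separated by tabs, not spaces.
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 8:
        raise ValueError(f"expected 8 tab-separated fields, got {len(fields)}")
    return LcaRow(
        fields[0], int(fields[1]), fields[2], fields[3],
        int(fields[4]), int(fields[5]), int(fields[6]), float(fields[7]),
    )


# The example line from this thread, rebuilt with tabs.
row = parse_lca_line("query\t3702\tspecies\tArabidopsis thaliana\t2\t2\t2\t1.000")
print(row)
```

A line with fewer than eight fields raises a ValueError instead of being silently misread.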
Your results are exactly what I am looking for. However, when I run mmseqs easy-taxonomy bin.1.fa NR bin.1.tax tmp --threads 24, I get the following output.
$ head -n 4 bin.1.tax_lca.tsv
scaffold00022005 2590670 species Aestuariibacter sp. GS-14
scaffold00022061 1046 family Chromatiaceae
scaffold00022138 2590670 species Aestuariibacter sp. GS-14
scaffold00022216 2590670 species Aestuariibacter sp. GS-14
For some reason, the 'FRAGS' columns are missing from my results file.
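One quick way to confirm this across the whole file is to tally the number of fields per line. The sketch below is in Python and assumes tab-separated output, with four fields per line when the fragment columns are absent and eight when they are present, as in the two examples in this thread. field_counts is a hypothetical helper, not an MMseqs2 command.

```python
from collections import Counter
from typing import Iterable


def field_counts(lines: Iterable[str]) -> Counter:
    """Count how many lines have each number of tab-separated fields."""
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line:  # skip blank lines
            counts[len(line.split("\t"))] += 1
    return counts


# Lines taken from this thread, rebuilt with tabs (an assumption about the real file).
without_frags = [
    "scaffold00022005\t2590670\tspecies\tAestuariibacter sp. GS-14\n",
    "scaffold00022061\t1046\tfamily\tChromatiaceae\n",
]
with_frags = ["query\t3702\tspecies\tArabidopsis thaliana\t2\t2\t2\t1.000\n"]

print(field_counts(without_frags))
print(field_counts(with_frags))
```

On a real results file, the same function can be given an open file handle, for example field_counts(open("bin.1.tax_lca.tsv")).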
Can you post the first few lines of bin.1.fa, please?
>scaffold00000068
CTGACGATCAGGCCGGCCGGATCATCGGCGCCGCGGTTGATCCGCAGGCCGGTCGCGAGG
CGCTGGAGGCGGGTGGAGAGGTCGGTGTTCGAGCGGTTCAGGTTGTTCTGCGCAATGAGC
GACGGAACATTCGTATTGATGCGAGCCATGGCAATCCTCCTTGATTGAGAGGCCCTTCGC
GGGAGCAGTCGTCAGGCGGTCTCGCGCCTGGAAGTCGGATTCGCCGCAAGTCGCGCGGCT
GCAGGCAGGACCGTCCCGCCCGGAGTGGCCTTCGCGCCGCGCGAACTCCCGCGCGGGCGG
AGTGAACCAAGCTCGCCACGGAACGCCGCGGCCTGCTGGTTCTCGTTCCTGATGGCGTCG
TACACCTCCTTGCGGTGCACCGCCACGGTTGCCGGCGCCTTGATGCCGATTCGCACCTTG
TCGCCGCGGATATCGACGATCGTGAGTTCGACGTCGTCGCCGATCATGATCGTCTCGTCG
ATCTGTCGTGAGAGCACCAGCATGTGCGGGCTCCTTTCTTGCGGGCATCCATGCCCGGTG
I just looked up when the commit you listed (113e3212c137d026e297c7540e1fcd039f6812b1) is from, and it's ancient. Please update to the latest release and everything should work fine.
Updating to the most recent version fixed the issue. My apologies, I didn't realize how old my version had gotten.
Not sure if this is a bug or if I am missing a flag that would make this all work as expected.
Expected Behavior
I wish to taxonomically annotate contigs using the mmseqs easy-taxonomy workflow. I see from your documentation (https://github.com/soedinglab/MMseqs2/wiki#taxonomy-output-and-tsv) that it is possible to calculate the LCA of a contig's predicted ORFs, with the output file listing the contig_name along with the total number of predicted ORFs and the number of those ORFs whose top hits agree with the assigned LCA of the contig.
Current Behavior
When I run the following command:
I get the following results files:
None of which contain the expected output described in the documentation.
I have had a look at using the aggregatetax command, but ran into a problem with the createtsv command not reassigning the contig names correctly.
Your Environment
Thanks for your help in advance.
Cheers, Tim.