Closed: astro-noodles closed this issue 1 year ago.
Hey @astrobiophile,
Can you please send the first 10 lines of your input matrix?
Here it is: kaiju-lvbr-spades-fixed-matrix.names_first10.txt
Thanks for taking a look, @meren !
Hey @astrobiophile,
When I look at the contents of your input file, which is supposed to match the gene-taxonomy-txt artifact, I see this:
c_000000000004 | Bacteria | Proteobacteria | Campylobacterales | Epsilonproteobacteria | Sulfurovaceae | Sulfurovum | Sulfurovum sp. |
---|---|---|---|---|---|---|---|
c_000000000017 | Bacteria | Actinobacteria | Corynebacteriales | Actinomycetia | Corynebacteriaceae | Corynebacterium | Corynebacterium sp. 4H37-19 |
c_000000000035 | Viruses | Uroviricota | Caudovirales | Caudoviricetes | Siphoviridae | NA | Microbacterium phage PauloDiaboli |
c_000000000048 | Bacteria | Proteobacteria | Pseudomonadales | Gammaproteobacteria | Marinobacteraceae | Marinobacter | Marinobacter sp. LV10R510-11A |
c_000000000061 | Bacteria | Bacteroidetes | Flavobacteriales | Flavobacteriia | Flavobacteriaceae | Gillisia | Gillisia mitskevichiae |
c_000000000070 | Bacteria | Bacteroidetes | Flavobacteriales | Flavobacteriia | Flavobacteriaceae | Salegentibacter | NA |
c_000000000073 | Bacteria | Proteobacteria | Burkholderiales | Betaproteobacteria | Comamonadaceae | Variovorax | NA |
c_000000000082 | Viruses | Uroviricota | Caudovirales | Caudoviricetes | NA | NA | uncultured Caudovirales phage |
c_000000000097 | Viruses | Uroviricota | Caudovirales | Caudoviricetes | NA | NA | uncultured Caudovirales phage |
This is not the right format, as (1) the first column should contain gene caller IDs, not contig names, and (2) there should be a header line with the following column names:
gene_callers_id | t_domain | t_phylum | t_class | t_order | t_family | t_genus | t_species
All of this is explained in the blog post linked from the artifact page.
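The two problems described above (contig names instead of gene caller IDs, and a missing header) are easy to check for before running the import. Below is a minimal Python sketch, assuming the file is TAB-delimited; `check_gene_taxonomy_lines` and `EXPECTED_HEADER` are hypothetical names for illustration, not part of anvi'o, and the column names are simply the ones listed above.

```python
from typing import Iterable, List

# Column names from the thread; assumed to be the exact header anvi'o expects.
EXPECTED_HEADER = [
    "gene_callers_id", "t_domain", "t_phylum", "t_class",
    "t_order", "t_family", "t_genus", "t_species",
]


def check_gene_taxonomy_lines(lines: Iterable[str]) -> List[str]:
    """Return human-readable problems; an empty list means none were found.

    Assumes a TAB-delimited file whose first non-blank line is the header.
    """
    # Strip both "\r" and "\n" so files saved on Windows are handled too.
    rows = [line.rstrip("\r\n").split("\t") for line in lines if line.strip()]
    if not rows:
        return ["file is empty"]

    problems: List[str] = []
    header, body = rows[0], rows[1:]
    if header != EXPECTED_HEADER:
        problems.append(f"unexpected header: {header}")

    # Data rows start on line 2 (counting only non-blank lines).
    for line_no, fields in enumerate(body, start=2):
        if len(fields) != len(EXPECTED_HEADER):
            problems.append(
                f"line {line_no}: expected {len(EXPECTED_HEADER)} columns, got {len(fields)}"
            )
        # Gene caller IDs are integers; contig names like c_000000000004 are not.
        if not fields[0].isdigit():
            problems.append(f"line {line_no}: {fields[0]!r} is not a gene caller id")
    return problems
```

Run on a TAB-delimited version of the excerpt posted in this thread, it would report an unexpected header (the first row holds data rather than column names) and, for each remaining row, a first-column value such as `c_000000000017` that is not a gene caller ID.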
You are absolutely right; that was a mistake on my part. I used the wrong FASTA file for taxonomic classification. With the correct file, I can now import the taxonomy manually with the default_matrix parser without any trouble. Thanks for the explanation!
Hello! I am having trouble with the `default_matrix` taxonomy import parser. I need to import my taxonomy from Kaiju manually (I could not use the Kaiju parser because the newer Kaiju version is incompatible with anvi'o). I formatted my Kaiju output into a matrix file with 8 columns (1 for gene caller IDs and 7 for taxonomy) and ran `anvi-import-taxonomy-for-genes -c CONTIGS.db -i input_matrix.txt -p default_matrix`, but it did not work at all. I was hoping you could help, since I have been stuck for several days now. Here is the error:...
I can provide the input files if needed for inspection.
I am using Windows 10 Pro (sorry), an Intel i7, and 32 GB of RAM, and every step of anvi'o up to this point has worked with the latest pull of the Docker image (thank you for making it available with almost no installation fuss, as I am not a “regular” bioinformatician and only need to do this metagenome analysis once).
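For the conversion step itself, one possible approach, consistent with the fix described in this thread, is to run Kaiju on the gene-call sequences rather than on contigs, so that each Kaiju sequence name is already a gene caller ID, and then reshape the named output into the matrix. The Python sketch below assumes (unverified against any particular Kaiju release) TAB-delimited lines where column 1 is `C` or `U`, column 2 is the sequence name, and the last column is a semicolon-separated lineage; `kaiju_to_gene_matrix` and its `rank_order` and `missing` parameters are hypothetical. In the excerpt posted earlier in this thread, the order name (e.g. Campylobacterales) comes before the class name (e.g. Epsilonproteobacteria), so `rank_order` lets you map lineage positions onto the `t_class` and `t_order` columns explicitly.

```python
from typing import Iterable, List, Sequence

HEADER = ["gene_callers_id", "t_domain", "t_phylum", "t_class",
          "t_order", "t_family", "t_genus", "t_species"]


def kaiju_to_gene_matrix(
    lines: Iterable[str],
    rank_order: Sequence[int] = (0, 1, 2, 3, 4, 5, 6),
    missing: str = "NA",
) -> List[str]:
    """Reshape Kaiju-style lines into TAB-delimited gene-taxonomy lines.

    Hypothetical input layout (check against your own Kaiju output):
      col 1 = 'C' or 'U', col 2 = gene caller ID, last col = lineage
      separated by ';'. rank_order[i] is the lineage position that fills
      the i-th taxonomy column (t_domain ... t_species).
    """
    out = ["\t".join(HEADER)]
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        # Skip unclassified ('U') lines and lines too short to carry a lineage.
        if len(fields) < 4 or fields[0] != "C":
            continue
        lineage = [name.strip() for name in fields[-1].split(";")]
        # Pad short lineages so every requested position exists.
        lineage += [""] * max(0, max(rank_order) + 1 - len(lineage))
        # Empty rank names (including a trailing ';') become the missing marker.
        ranks = [lineage[i] or missing for i in rank_order]
        out.append("\t".join([fields[1]] + ranks))
    return out
```

For example, `kaiju_to_gene_matrix(open(path), rank_order=(0, 1, 3, 2, 4, 5, 6))` swaps the third and fourth lineage positions; writing the returned lines joined by newlines gives a file with the header in place and one row per classified gene.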