Closed: gauravsk closed this 6 years ago
That is a very good question @gauravsk. I agree that the lack of genus, particularly when it is right there in the "genus species" name, is a huge problem. We can add taxize into CRUX, but it would take some work. Because we are filtering reads by accession version number, we could just grab "genus species" with entrez_qiime.py and then run that file through taxize (unless taxize accepts accession version numbers directly). We would then just need to grab superkingdom, phylum, class, order, family, genus, and species. We don't have any R in the CRUX scripts, but it could be fun!
Agreed that doing this in R rather than in CRUX might be the way to go. It looks like taxize might work with accession numbers, but I'm not sure. https://gist.github.com/sckott/a78e11dc624dd4342173#pass-the-uid-along-to-other-functions
That looks super promising. It certainly works with accession numbers; we can check whether it works with accession version numbers. We could do this in place of entrez_qiime.py in CRUX. Is there an easy way to read in the fasta file, strip the accession (or accession version number), run it through taxize, pull out kingdom, phylum, class, order, family, genus, and species, and then make a txt file that matches the current taxonomy file output?
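A rough sketch of the non-lookup parts of that workflow, in Python rather than R: pull the accession out of each FASTA header, then format a lineage into one taxonomy line. The taxize lookup itself is not shown. The tab-separated `accession<TAB>rank;rank;...` layout, the rank order, and the helper names are all assumptions for illustration, not the confirmed CRUX taxonomy file format.

```python
# Assumed rank order for the output path; adjust to match the real taxonomy file.
RANKS = ["superkingdom", "phylum", "class", "order", "family", "genus", "species"]


def accession_from_header(header, keep_version=True):
    """Return the first token of a FASTA header, e.g. 'XX000001.1'.

    With keep_version=False the '.1' style version suffix is dropped.
    """
    tokens = header.lstrip(">").split()
    if not tokens:
        return None
    accession = tokens[0]
    return accession if keep_version else accession.split(".")[0]


def read_accessions(fasta_lines, keep_version=True):
    """Collect accessions from every header line ('>' prefix) in a FASTA file."""
    accessions = []
    for line in fasta_lines:
        if line.startswith(">"):
            accession = accession_from_header(line, keep_version)
            if accession is not None:
                accessions.append(accession)
    return accessions


def taxonomy_line(accession, lineage):
    """Format one line: accession, a tab, then the ranks joined by ';'.

    `lineage` is a dict of rank -> name (for example, built from whatever the
    lookup step returns); missing or empty ranks are written as 'NA'.
    """
    path = ";".join(lineage.get(rank) or "NA" for rank in RANKS)
    return f"{accession}\t{path}"
```

For example, `read_accessions(open("some.fasta"))` would give the list of accessions to hand to the lookup step, and `taxonomy_line` would be called once per accession with the lineage that comes back (`some.fasta` is a placeholder file name).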
Yeah, that should be doable. Tbh, I'm not sure what the best place to integrate it is. I'm not as familiar with the post-dada2 steps of Anacapa as I should be; maybe there's a way to integrate it over there. Let's talk about it.
Well, it is a CRUX database problem for sure... See line 90 of the third CRUX script. If you had a pretty R script, we could drop it in there... I am around if you wanna chat. https://github.com/limey-bean/CRUX_Creating-Reference-libraries-Using-eXisting-tools/blob/master/crux_release_V1_db/crux_part3.sh
Ok, this is not a CRUX problem. @jessegomer, we have some BLCA stuff to check out...
I'm confused by the behavior on unknown taxonomy: "NA;NA;NA;NA;NA;NA" and just "" in sum.taxonomy? "Arthropoda;Insecta;Anthoathecata;Hydractiniidae;NA;Podocoryna carnea" is hard to interpret as a user, and also when doing biom comparison stuff. Why isn't Podocoryna being listed as genus? We may find the R package taxize useful. @jessegomer @limey-bean
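To make the two questions concrete, here is a small Python sketch that (a) treats both "NA;NA;NA;NA;NA;NA" and "" as the same "unassigned" case and (b) fills an "NA" genus from the first word of a two-word species name. It assumes the six-field phylum;class;order;family;genus;species layout seen in the example string; the function names are made up for illustration, and this is not how Anacapa or BLCA currently behave. Taking the first word as the genus is only safe for true binomials, so names like "uncultured bacterium" would need to be excluded first.

```python
MISSING = ("", "NA")  # values treated as "no assignment at this rank"


def is_unassigned(path):
    """True when every field of a ';'-separated taxonomy path is empty or 'NA'."""
    return all(field.strip() in MISSING for field in path.split(";"))


def fill_genus(path, genus_idx=4, species_idx=5):
    """Fill a missing genus from a 'Genus species' binomial, if one is present.

    Field positions default to the assumed six-rank layout
    (phylum;class;order;family;genus;species).
    """
    fields = path.split(";")
    if len(fields) <= max(genus_idx, species_idx):
        return path  # not enough fields to interpret; leave untouched
    genus = fields[genus_idx].strip()
    species = fields[species_idx].strip()
    if genus in MISSING and species not in MISSING:
        words = species.split()
        if len(words) >= 2:
            fields[genus_idx] = words[0]
    return ";".join(fields)
```

Applied to the string from this issue, `fill_genus` would turn the `NA` in the genus position into `Podocoryna` and leave the other fields as they are.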