using only the taxonomic assignment part - help wanted

limey-bean / Anacapa

Written by Emily Curd (eecurd@g.ucla.edu), Jesse Gomer (jessegomer@gmail.com), Gaurav Kandlikar (gkandlikar@ucla.edu), Zack Gold (zjgold@ucla.edu), Max Ogden (max@maxogden.com), Lenore Pipes (lpipes@berkeley.edu) and Baochen Shi (biosbc@gmail.com). Assistance was provided by Rachel Meyer (rsmeyer@ucla.edu).

MIT License


Hi,

I came across this while reading the Lin et al. preprint and it sounds very useful! I have metabarcoding data for two markers (COI and 16S) from a very diverse community (I know that the COI sequences contain at least animals, plants and fungi), and I have been struggling with the taxonomic assignment.

I had spent quite a bit of time making a refdb with obitools from the EMBL database, but I saw that you had already done all the work, and done a better job than I did :-)

And just while I was trying to make my DBs with CRUX, I saw that you even had pre-made databases for COI and 16S already!

So far - so awesome!

However, I am struggling to find out how to just run the taxonomic assignment bit of the pipeline.

I have ASV tables (dada2 denoised) for both marker genes.

Could you give me some pointers?

Thanks!

Fabian

Hi Fabian, glad you are finding this tool useful! Unfortunately, Anacapa in its current implementation is very finicky about the location of directories and the naming of folders. In my experience it is usually easier to run everything from start to finish through the pipeline to avoid this issue. Renaming and reformatting all the dada2 outputs you generated could take just as long as running the data through dada2 again. Since you are using dada2 anyway, you could simply re-run it with the same parameters ported over.

That being said, to run the classification step you will need to make a directory structure following the 12S example provided in the database. You will mostly just need to recreate the dada2 output folder (importing your dada2 results). You will also need to make sure that your dada2 output is in exactly the same format as the output of the first ASV parsing module (again, look at the 12S example for what the output should look like). Once you have recreated the file structure and named the files to include the marker of interest (matching the name of the CRUX-DB exactly), you should be able to run the classifier without a problem. Hope that helps, and good luck!
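One way to get started on recreating the layout is to mirror the folder tree of the 12S example while swapping in your own marker name. The snippet below is only a rough sketch of that idea, not part of Anacapa: the function name `mirror_layout` and all paths are hypothetical, and the real folder and file names have to be read from the 12S example itself. It creates empty folders only and returns the list of file names the example contains (with the marker renamed), so you can see which files you still need to supply in the matching format.

```python
import os


def mirror_layout(example_dir, target_dir, example_marker, new_marker):
    """Recreate the folder tree of example_dir under target_dir.

    Every occurrence of example_marker in a folder or file name is replaced
    by new_marker. No file contents are copied; the function only creates
    folders and returns the renamed relative file paths found in the example.
    """
    expected_files = []
    for root, _dirs, files in os.walk(example_dir):
        rel = os.path.relpath(root, example_dir)
        # The top-level folder itself maps to target_dir unchanged.
        new_rel = "" if rel == "." else rel.replace(example_marker, new_marker)
        os.makedirs(os.path.join(target_dir, new_rel), exist_ok=True)
        for name in files:
            renamed = name.replace(example_marker, new_marker)
            expected_files.append(os.path.join(new_rel, renamed))
    return sorted(expected_files)
```

For instance, calling `mirror_layout("path/to/12S_example", "path/to/my_run", "12S", "COI")` (both paths are placeholders) would build the empty folders and list the file names to fill in. The marker string you pass as `new_marker` should be whatever your CRUX database is actually called, since the names have to match exactly.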


using only the taxonomic assignment part - help wanted #56