mhoban / rainbow_bridge

GNU General Public License v3.0

zotu fasta and insect contain more zotus than zotu table #85

Open cajwalsh opened 4 weeks ago

cajwalsh commented 4 weeks ago

I recently noticed that the zOTU table is missing rows for zOTUs that were classified by insect and that also appear in the fasta file. For example, the fasta contains 10675 zOTUs but the table has only 10621. Many of the missing zOTUs were unclassified/blank or unknown eukaryotes (according to insect), but many others were identified to at least order, and some to family or species.

I also noticed zOTUs whose total read abundance is below the minimum abundance specified for vsearch (the default is 8, but I had many zOTUs with fewer than 8 reads across the whole dataset).

This makes me suspect that the discrepancy arises after the derep and clustering steps of the vsearch process, in either the uchime step or the remap/table creation step.

I don't think this is new, as I'm pretty sure I remember looking into it before and then forgetting about it. Also, as you mentioned, it seems less likely to be a problem with the pipeline itself than vsearch doing things we don't expect.
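
For anyone who wants to reproduce the two checks described above (zOTUs present in the fasta but absent from the table, and table rows whose total falls under the minimum abundance), here is a minimal sketch. It rests on assumptions that may not match the actual pipeline output: the table is tab-separated with a header row, the zOTU ID is in the first column, all other columns are per-sample read counts, and fasta headers may carry `;`-separated annotations after the ID. The function names and the sample IDs in the usage note are hypothetical.

```python
import csv
import io


def fasta_ids(handle):
    """Collect sequence IDs from FASTA header lines."""
    ids = set()
    for line in handle:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                # Assumption: anything after ';' is an annotation, not part of the ID.
                ids.add(tokens[0].split(";")[0])
    return ids


def table_totals(handle, delimiter="\t"):
    """Map each zOTU ID (first column) to its read total across all sample columns."""
    reader = csv.reader(handle, delimiter=delimiter)
    next(reader, None)  # skip the header row
    totals = {}
    for row in reader:
        if row:
            totals[row[0]] = sum(int(float(x)) for x in row[1:] if x)
    return totals


def compare(fasta_handle, table_handle, min_abundance=8):
    """Return (IDs in the fasta but not the table, table IDs below min_abundance)."""
    ids = fasta_ids(fasta_handle)
    totals = table_totals(table_handle)
    missing = sorted(ids - set(totals))
    low = sorted(z for z, n in totals.items() if n < min_abundance)
    return missing, low
```

With real files you would pass open file handles; with a toy input such as `compare(io.StringIO(">Zotu1\nACGT\n>Zotu2\nACGG\n"), io.StringIO("id\ts1\nZotu1\t9\n"))` the first list holds the IDs missing from the table and the second holds the table rows under the threshold.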

cajwalsh commented 3 weeks ago

Related to this process as well: I have been meaning to say that I am pretty sure "all_cpus" for the derep processes uses all cores on the computer rather than all cores allotted to the run. I assume this is not the desired behavior, but if it is, no worries.

mhoban commented 3 weeks ago

> Related to this process as well: I have been meaning to say that I am pretty sure "all_cpus" for the derep processes uses all cores on the computer rather than all cores allotted to the run. I assume this is not the desired behavior, but if it is, no worries.

Oh yeah, this is something I've known about and meant to fix but keep forgetting. I'll make a separate issue for it.
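
To illustrate the distinction between "all cores on the computer" and "all cores given to the run", here is a small sketch in Python. It is not how the pipeline computes its CPU count, and it assumes the run's limit is imposed through CPU affinity (as some schedulers and containers do); a limit passed only as a pipeline option would not be visible this way. `os.sched_getaffinity` exists only on some Unix platforms (notably Linux), so the sketch falls back to the machine-wide count elsewhere. Both function names are hypothetical.

```python
import os


def machine_cpus() -> int:
    """Logical cores on the whole machine, regardless of any restriction."""
    return os.cpu_count() or 1


def allocated_cpus() -> int:
    """Cores this process may actually run on, where the platform can report it."""
    if hasattr(os, "sched_getaffinity"):
        # Assumption: the scheduler restricts the job through CPU affinity.
        return len(os.sched_getaffinity(0))
    return machine_cpus()


if __name__ == "__main__":
    print(f"machine: {machine_cpus()}  allocated: {allocated_cpus()}")
```

If the two numbers differ inside a job, sizing threads by the machine-wide count would oversubscribe the allocation.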