mhoban / rainbow_bridge

GNU General Public License v3.0

--standalone-taxonomy behavior with --zotu-table, finalize, and --insect #69

Open cajwalsh opened 4 months ago

cajwalsh commented 4 months ago

I just wanted to note a few things that might not behave as expected with --standalone-taxonomy, and possibly with --collapse-taxonomy and --insect. I did a --standalone-taxonomy run that requires both the --blast-file and --zotu-table but the resulting output did not merge the --zotu-table file and provided only the taxonomy columns. I have noticed this is also the typical behaviour of longer/normal runs of the --collapse-taxonomy process (even though the last part of that script still appears to be dedicated to merging the zotu table if provided and "desired"). In any case, the finalize process and script seem to do the merge for the results in the final/ output directory, but finalize did not run when doing only --standalone-taxonomy.

With --standalone-taxonomy, it is possible to use OTUs/ASVs/whatevers made by other programs that are not zotus. The run I did used ASVs from mothur output. Therefore I think it might be better to make a different header name for the typical "zotu" collapsed taxonomy column when standalone taxonomy is used (and also possibly change the argument name for --zotu-table). Maybe just OTU or ASV or seq or "representative" or something general.

One last thought I had (on something I generally know less about, so I apologize if it is dumb) is about whether it would be possible/useful/easy to allow --insect to be run from --standalone-taxonomy runs. As I mentioned, in my case I had an ASV read count table and a blast result file from mothur, so adding more taxonomy assignment possibilities to --standalone-taxonomy might be useful too.

mhoban commented 3 weeks ago

I could have sworn I wrote a whole response to this just the other day, but either that was a dream or it didn't stick.

I did a --standalone-taxonomy run that requires both the --blast-file and --zotu-table but the resulting output did not merge the --zotu-table file and provided only the taxonomy columns.

This should be resolved as of #114. Since that update, the standalone taxonomy process does not require a zotu table (it's optional). From the PR: "The taxonomy table (sans zotu abundances) will now end up in output/standalone_taxonomy/lca/<settings>. If you provide a zotu table, it will be merged with the LCA output and put in output/standalone_taxonomy/final"
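To make the quoted PR behaviour concrete, here is a minimal Python sketch of which output locations to expect. It is only an illustration of the description above, not code from the pipeline: the function name expected_outputs and the example file name are hypothetical, and the <settings> path component is left as a parameter because the thread does not say what it expands to.

```python
from pathlib import PurePosixPath

# Base directory taken from the PR text quoted above.
BASE = PurePosixPath("output/standalone_taxonomy")


def expected_outputs(settings, zotu_table=None):
    """Return the output locations described in the PR text.

    `settings` stands in for the <settings> path component, whose real
    value is not given in this thread (assumption: it is a single
    directory name).
    """
    # The taxonomy table (without abundances) is always produced.
    outputs = {"lca": BASE / "lca" / settings}
    # The merged table only appears when a zotu table was provided.
    if zotu_table is not None:
        outputs["final"] = BASE / "final"
    return outputs


without_table = expected_outputs("my_settings")
with_table = expected_outputs("my_settings", zotu_table="asv_table.tsv")
```

Without a zotu table the result has only an lca entry; with one, it also has a final entry.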

Therefore I think it might be better to make a different header name for the typical "zotu" collapsed taxonomy column when standalone taxonomy is used (and also possibly change the argument name for --zotu-table)

This should work out of the box. If you're running standalone-taxonomy, the taxonomy table that's generated has a column called zotu, and in the internals of finalize.R (which now joins the taxonomy and zotu tables together) the first column of the zotu table is renamed to zotu, regardless of what it was called, so the two should always match. I'm not opposed to changing the command-line option to --otu-table or --abundance-table or some such thing to make it more generic.
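The rename-then-join behaviour described above can be sketched in a few lines of Python. This is not the actual finalize.R code (which is R and is not shown in this thread); the function names, the tab-delimited format, the inner-join choice, and the toy data are all assumptions made for illustration.

```python
import csv
import io

KEY = "zotu"  # column name used by the taxonomy table, per the comment above


def read_table(text, delimiter="\t"):
    """Parse delimited text into (header, rows), each row a list of strings."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    return rows[0], rows[1:]


def merge_tables(taxonomy_text, abundance_text):
    """Rename the abundance table's first column to KEY, then join on it."""
    tax_header, tax_rows = read_table(taxonomy_text)
    ab_header, ab_rows = read_table(abundance_text)

    # Whatever the first column was called, treat it as the sequence ID.
    ab_header = [KEY] + ab_header[1:]

    key_index = tax_header.index(KEY)
    counts_by_id = {row[0]: row[1:] for row in ab_rows}

    merged_header = tax_header + ab_header[1:]
    merged_rows = []
    for row in tax_rows:
        counts = counts_by_id.get(row[key_index])
        if counts is not None:  # assumption: keep only IDs present in both
            merged_rows.append(row + counts)
    return merged_header, merged_rows


# Toy data, made up for this example: an ASV table whose ID column is
# not called "zotu".
taxonomy = "zotu\tfamily\nASV1\tLabridae\nASV2\tAcanthuridae\n"
abundance = "Representative_Sequence\tsample_A\tsample_B\nASV1\t10\t0\nASV2\t3\t7\n"

header, rows = merge_tables(taxonomy, abundance)
```

Because the join key is taken by position rather than by name, a table of ASVs from another program merges the same way as a zotu table, provided its first column holds the sequence IDs.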

...whether it would be possible/useful/easy to allow --insect to be run from --standalone-taxonomy runs

I see no reason why this couldn't be implemented. I'll create a separate issue for it. I might rework the --standalone-taxonomy option to accept an argument, lca and/or insect, that tells it which process(es) to run. If you wanted it to run both, you would give the option a list argument in a params file.
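One way the proposed option could be normalized is sketched below in Python. Since the feature is not implemented, everything here is hypothetical: the function name parse_methods, the acceptance of a comma-separated string, and the error handling are assumptions, and only the two method names lca and insect come from the comment above.

```python
VALID_METHODS = ("lca", "insect")  # the two processes named in the comment


def parse_methods(value):
    """Normalize a hypothetical option value to an ordered list of methods.

    Accepts a single string ("lca"), a comma-separated string
    ("lca,insect", an assumption), or a list such as one read from a
    params file (["lca", "insect"]).
    """
    if isinstance(value, str):
        items = value.split(",")
    else:
        items = list(value)

    methods = []
    for item in items:
        name = str(item).strip().lower()
        if name not in VALID_METHODS:
            raise ValueError(f"unknown taxonomy method: {item!r}")
        if name not in methods:  # ignore duplicates, keep first-seen order
            methods.append(name)
    return methods


single = parse_methods("lca")
both = parse_methods(["lca", "insect"])
```

Accepting both a bare string and a list keeps the single-method case simple on the command line while still allowing both processes to be requested from a params file.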