run_dbcan now takes in the predicted genes from metaerg rather than the raw metagenomic reads. This has the advantage that the annotated CAZyme genes will match the genes annotated by other tools.
R1/R2 are now aligned to the genes predicted by metaerg. I think this is useful if you want to do any sort of annotation-related analysis that goes beyond presence/absence.
Deleted the aggregate_cazi_results and aggregate_metaerg_results functions. My impression is that these aren't used anywhere, so I'm removing them to reduce confusion. (At first I thought they defined the inputs for the joining functions, but I don't think that is actually the case.)
This works as is, but I think there are a few things to improve before merging. I will probably use a combined bwa/samtools Docker container rather than splitting the alignment and the sorting/compression into different jobs. I also plan to align against the fragmented metaerg output rather than the joined files, which should speed up that step considerably (it is currently a bit on the slow side).
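To make the combined-container idea concrete, here is a rough sketch of how alignment and sorting could be chained into one command instead of two jobs. This is not the pipeline's actual code: the function name build_align_command, the file names, and the thread count are all hypothetical, and it assumes the gene FASTA has already been indexed with bwa index.

```python
import shlex


def build_align_command(genes_fasta, r1, r2, out_bam, threads=4):
    """Build a single shell pipeline: bwa mem piped straight into samtools sort.

    Hypothetical helper for illustration only. Assumes `bwa index genes_fasta`
    has already been run and that bwa and samtools live in the same container.
    """
    # Align the paired reads (R1/R2) against the predicted genes.
    bwa = ["bwa", "mem", "-t", str(threads), genes_fasta, r1, r2]
    # Sort and compress the alignments, reading SAM from stdin ("-").
    sort = ["samtools", "sort", "-@", str(threads), "-o", out_bam, "-"]
    # Quote each argument so paths with spaces survive the shell.
    return " | ".join(shlex.join(cmd) for cmd in (bwa, sort))


if __name__ == "__main__":
    # Placeholder file names; substitute the real metaerg gene file and reads.
    print(build_align_command("genes.fna", "R1.fastq.gz", "R2.fastq.gz", "aligned.sorted.bam"))
```

Piping avoids writing an intermediate SAM file, which is the main cost of keeping alignment and sorting in separate jobs. The same command could be run once per metaerg fragment if the alignment is later moved to the fragmented output.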