Open jboconnor13 opened 3 weeks ago
It is also worth noting that this happens when the setup_metaphlan rule is adjusted so that the installation is done manually in the Snakefile, as described in https://github.com/sterrettJD/HoMi/issues/86 (see below):
#if [ "{params.index_name}" = "latest" ]; then
# metaphlan --install --nproc {threads} --bowtie2db {output.loc} {params.extra}
#else
# metaphlan --install --nproc {threads} --bowtie2db {output.loc} --index {params.index_name} {params.extra}
#fi
# Option to do it manually if --install doesn't seem to work
cd {output.loc}
# Can specify whatever version you want here
wget http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/bowtie2_indexes/mpa_vOct22_CHOCOPhlAnSGB_202212_bt2.tar
tar -xvf mpa_vOct22_CHOCOPhlAnSGB_202212_bt2.tar
rm mpa_vOct22_CHOCOPhlAnSGB_202212_bt2.tar
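One way to stop the manual block from downloading again on every run might be to guard it with a check for existing index files. This is only a sketch under some assumptions: `index_present` and `install_index_if_missing` are hypothetical helper names, and it assumes the unpacked files are named `<index_name>.<something>` (so the `_bt2.tar` tarball itself does not count as an installed index). Inside a Snakefile `shell:` block, the shell's own `${...}` braces would have to be doubled so that Snakemake does not treat them as placeholders.

```shell
# Hypothetical helper: succeeds if at least one file named "<index_name>.*"
# already exists in the database directory (file naming is an assumption).
index_present() {
    db_dir="$1"
    index_name="$2"
    for f in "$db_dir/$index_name".*; do
        # An unmatched glob stays literal, so test that the path really exists
        [ -e "$f" ] && return 0
    done
    return 1
}

# Hypothetical helper: download and unpack the index only when it is missing.
install_index_if_missing() {
    db_dir="$1"
    index_name="$2"
    tarball="${index_name}_bt2.tar"
    if index_present "$db_dir" "$index_name"; then
        echo "skip: ${index_name} already in ${db_dir}"
        return 0
    fi
    mkdir -p "$db_dir"
    # Subshell so the caller's working directory is left unchanged
    (
        cd "$db_dir" &&
        wget "http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/bowtie2_indexes/${tarball}" &&
        tar -xvf "$tarball" &&
        rm "$tarball"
    )
}

# Example call, using the values from this thread:
# install_index_if_missing data/metaphlan_db mpa_vOct22_CHOCOPhlAnSGB_202212
```

Note that this only avoids the repeated download. Snakemake may still decide to rerun the rule itself for other reasons (for example because the rule's code was edited), which is what the reason line in the log would show.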
Hey @jboconnor13 what does snakemake say the reason for rerunning is? Is there a certain file missing? Is the code changed?
For example, snakemake should say something like this.
[Mon Nov 4 14:12:15 2024]
rule taxa_barplot:
    input: tutorial.f0.0.r0.0.nonhost.humann/all_bugs_list.tsv, R_packages_installed
    output: tutorial.f0.0.r0.0.nonhost.humann/Metaphlan_microshades.html
    jobid: 31
    reason: Missing output files: tutorial.f0.0.r0.0.nonhost.humann/Metaphlan_microshades.html
    resources: mem_mb=10000, mem_mib=9537, disk_mb=1000, disk_mib=954, tmpdir=, partition=short, runtime=120, slurm=

Rscript -e "rmarkdown::render('/Users/jost9358/miniconda3/envs/HoMi_tutorial/lib/python3.11/site-packages/homi_pipeline/rule_utils/Metaphlan_microshades.Rmd', output_dir='/scratch/Users/jost9358/HoMi_tutorial/tutorial.f0.0.r0.0.nonhost.humann', params=list(bugslist='/scratch/Users/jost9358/HoMi_tutorial/tutorial.f0.0.r0.0.nonhost.humann/all_bugs_list.tsv', metadata='/scratch/Users/jost9358/HoMi_tutorial/tutorial_metadata.csv', directory='/scratch/Users/jost9358/HoMi_tutorial/tutorial.f0.0.r0.0.nonhost.humann'))"
Submitted job 31 with external jobid '9774328'.
What does the reason section say?
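If the log is long, something like the following could pull out just the rule names and their reason lines. It is a rough sketch: `extract_reasons` is a made-up helper name, and it assumes the log layout shown in the example above (rule headers at the start of a line, `reason:` indented underneath).

```shell
# Hypothetical helper: read Snakemake output on stdin and keep only the
# rule headers and the indented "reason:" lines that follow them.
extract_reasons() {
    grep -E '^(rule|localrule|checkpoint) |^[[:space:]]+reason:'
}

# Example usage on a saved log, or on a dry run if the workflow can be
# invoked with snakemake directly (an assumption about the setup):
# extract_reasons < snakemake.log
# snakemake -n 2>&1 | extract_reasons
```

Recent Snakemake versions print the reason by default; older ones may need `--reason` added to the dry run.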
When an index has been specified in the config file (i.e. metaphlan_index_name: mpa_vOct22_CHOCOPhlAnSGB_202212) and the files for that version are already installed in the MetaPhlAn database directory specified in the config (metaphlan_bowtie_db: data/metaphlan_db/), a new MetaPhlAn database is reinstalled each time the workflow is run. Perhaps the output can be specified in the rule all inputs to resolve this issue?