sterrettJD / HoMi

Host-microbiome dual transcriptome pipeline
https://homi-pipeline.readthedocs.io

Metaphlan Databases Reinstalling #88

Open jboconnor13 opened 3 weeks ago

jboconnor13 commented 3 weeks ago

When an index has been specified in the config file (i.e. metaphlan_index_name: mpa_vOct22_CHOCOPhlAnSGB_202212) and the files for that version are already installed in the metaphlan_db directory specified in the config (metaphlan_bowtie_db: data/metaphlan_db/), the MetaPhlAn database is reinstalled each time the workflow is run. Perhaps the output can be specified in the rule all inputs to resolve this issue?

jboconnor13 commented 3 weeks ago

It is also worth noting that this happens when the setup_metaphlan rule is adjusted to do the installation manually in the Snakefile, as described in https://github.com/sterrettJD/HoMi/issues/86 (see below):

#if [ "{params.index_name}" = "latest" ]; then
#  metaphlan --install --nproc {threads} --bowtie2db {output.loc} {params.extra}

#else
#  metaphlan --install --nproc {threads} --bowtie2db {output.loc} --index {params.index_name} {params.extra}

#fi

# Option to do it manually if --install doesn't seem to work
 cd {output.loc}
# Can specify whatever version you want here
 wget http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/bowtie2_indexes/mpa_vOct22_CHOCOPhlAnSGB_202212_bt2.tar
 tar -xvf mpa_vOct22_CHOCOPhlAnSGB_202212_bt2.tar
 rm mpa_vOct22_CHOCOPhlAnSGB_202212_bt2.tar
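
One way to make the manual download above cheap when the rule does get re-triggered is to check for existing index files first. The following is a minimal sketch, not HoMi's actual code: `index_is_installed` is a hypothetical helper, and the assumed file naming (`<index_name>.<n>.bt2` or `.bt2l`) is a guess at how the extracted Bowtie2 index is laid out. It does not stop Snakemake from scheduling the rule; it only lets the rule skip the `wget`/`tar` step.

```python
from pathlib import Path


def index_is_installed(db_dir, index_name):
    """Return True if db_dir already holds Bowtie2 index files for index_name.

    Assumption (not confirmed by the thread): the extracted index files are
    named like <index_name>.1.bt2l or <index_name>.rev.1.bt2l.
    """
    db_path = Path(db_dir)
    if not db_path.is_dir():
        return False
    # any() stops at the first matching file, so large directories stay cheap.
    return any(db_path.glob(f"{index_name}.*.bt2*"))


if __name__ == "__main__":
    # Values taken from the config quoted in this issue.
    installed = index_is_installed(
        "data/metaphlan_db/", "mpa_vOct22_CHOCOPhlAnSGB_202212"
    )
    print("skip download" if installed else "download needed")
```

If the check returns True, the rule body could simply do nothing, which would at least avoid repeating the large download while the underlying rerun reason is tracked down.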
sterrettJD commented 2 weeks ago

Hey @jboconnor13 what does snakemake say the reason for rerunning is? Is there a certain file missing? Has the code changed?

For example, snakemake should say something like this.

[Mon Nov 4 14:12:15 2024]
rule taxa_barplot:
    input: tutorial.f0.0.r0.0.nonhost.humann/all_bugs_list.tsv, R_packages_installed
    output: tutorial.f0.0.r0.0.nonhost.humann/Metaphlan_microshades.html
    jobid: 31
    reason: Missing output files: tutorial.f0.0.r0.0.nonhost.humann/Metaphlan_microshades.html
    resources: mem_mb=10000, mem_mib=9537, disk_mb=1000, disk_mib=954, tmpdir=, partition=short, runtime=120, slurm=

Rscript     -e "rmarkdown::render('/Users/jost9358/miniconda3/envs/HoMi_tutorial/lib/python3.11/site-packages/homi_pipeline/rule_utils/Metaphlan_microshades.Rmd', output_dir='/scratch/Users/jost9358/HoMi_tutorial/tutorial.f0.0.r0.0.nonhost.humann', params=list(bugslist='/scratch/Users/jost9358/HoMi_tutorial/tutorial.f0.0.r0.0.nonhost.humann/all_bugs_list.tsv', metadata='/scratch/Users/jost9358/HoMi_tutorial/tutorial_metadata.csv', directory='/scratch/Users/jost9358/HoMi_tutorial/tutorial.f0.0.r0.0.nonhost.humann'))"

Submitted job 31 with external jobid '9774328'.

What does the reason section say?
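
If the log is long, the reason lines can be collected per rule with a short script. This is a minimal sketch, assuming the log uses the multi-line layout Snakemake normally prints (a `rule <name>:` line followed by indented `reason:` lines); `rerun_reasons` is a hypothetical helper and is not part of HoMi or Snakemake. The sample text is the job entry quoted above.

```python
import re


def rerun_reasons(log_text):
    """Map each rule name in a Snakemake log to the reasons given for running it."""
    reasons = {}
    current_rule = None
    for line in log_text.splitlines():
        stripped = line.strip()
        # A job entry starts with "rule <name>:" (or "checkpoint <name>:").
        rule_match = re.match(r"(?:rule|checkpoint) (\w+):$", stripped)
        if rule_match:
            current_rule = rule_match.group(1)
            continue
        # The reason line belongs to the most recently seen rule.
        if stripped.startswith("reason:") and current_rule is not None:
            reason = stripped[len("reason:"):].strip()
            reasons.setdefault(current_rule, []).append(reason)
    return reasons


sample_log = """\
[Mon Nov 4 14:12:15 2024]
rule taxa_barplot:
    input: tutorial.f0.0.r0.0.nonhost.humann/all_bugs_list.tsv, R_packages_installed
    output: tutorial.f0.0.r0.0.nonhost.humann/Metaphlan_microshades.html
    jobid: 31
    reason: Missing output files: tutorial.f0.0.r0.0.nonhost.humann/Metaphlan_microshades.html
"""

if __name__ == "__main__":
    for rule, rule_reasons in rerun_reasons(sample_log).items():
        for reason in rule_reasons:
            print(f"{rule}: {reason}")
```

Running the same function over the log from a run that reinstalls the database and looking up the `setup_metaphlan` entry should show whether the trigger is a missing output, changed code, or something else.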