nf-core / mag

Assembly and binning of metagenomes
https://nf-co.re/mag
MIT License

undefined parameter `host_genomes` with -r 3.2.0 and -r 3.2.1 #715

Open Thomieh73 opened 6 days ago

Thomieh73 commented 6 days ago

Hi, I tried to start a run with MAG version 3.2.1 today, but I get an unexpected error.

This is the error:

WARN: Access to undefined parameter `genome` -- Initialise it to a default value eg. `params.genome = some_value`
WARN: Access to undefined parameter `host_genomes` -- Initialise it to a default value eg. `params.host_genomes = some_value`
ERROR ~ Cannot invoke method keySet() on null object

 -- Check script '/cluster/home/thhaverk/.nextflow/assets/nf-core/mag/./subworkflows/local/utils_nfcore_mag_pipeline/main.nf' at line: 353 or see '.nextflow.log' file for more details

My command is:

nextflow run nf-core/mag -r 3.2.1 -profile apptainer -work-dir $USERWORK/nf_mag -resume -params-file params.json

My params.json file is:

{
        "input": "samplesheet_large.csv",
        "outdir": "\/cluster\/projects\/nn10070k\/projects\/phagedrive\/pd_data_control\/results\/20241125_MAG_results",
        "multiqc_title": "MQC_FULL_SAMPLE_RUN",
        "reads_minlength": 50,
        "igenomes_base" : "s3://ngi-igenomes/igenomes",
        "gtdb_db": "\/cluster\/projects\/nn10070k\/databases\/gtdbtk_r220_data.tar.gz", 
        "host_genome": "GRCh38",
        "kraken2_db": "\/cluster\/projects\/nn10070k\/databases\/kraken2_pluspfp_05.06.2024\/hash.k2d",
        "cat_db": "\/cluster\/projects\/nn10070k\/databases\/20240422_CAT_nr",
        "binqc_tool": "checkm",
        "checkm_db": "\/cluster\/projects\/nn10070k\/databases\/checkm_db_2015.01.16",
        "busco_db": "\/cluster\/shared\/biobases\/BUSCO\/2024-11-21",
        "busco_auto_lineage_prok": true,
        "busco_clean": true,
        "run_gunc": true,
        "gunc_db": "\/cluster\/projects\/nn10070k\/databases\/gunc_db_progenomes2.1\/gunc_db_progenomes2.1.dmnd",
        "refine_bins_dastool": true,
        "postbinning_input": "refined_bins_only",
        "run_virus_identification": true    
    }

I do not see how this error happens. I also tried version 3.2.0 and got the same error.

I can start my pipeline with version 3.1.0; then the error does not occur.

So why does it not use the indicated "host_genome" from my json file?

jfy133 commented 3 days ago

Oof, there are a few things in here.

The keySet() error message comes from a bug in

/cluster/home/thhaverk/.nextflow/assets/nf-core/mag/./subworkflows/local/utils_nfcore_mag_pipeline/main.nf at line 353

because the variable name should be `${params.host_genome.keySet()}` but is currently `${params.host_genomes.keySet()...}`.
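
To illustrate the failure mode in isolation, here is a small Python analogy (not the pipeline's Groovy code): looking up a parameter that was never set yields a null value, and calling a method on it fails, much like `keySet()` on a null object. The `params` dictionary and the helper `available_keys` are hypothetical names chosen for this sketch.

```python
# Python analogy of the reported failure; all names are illustrative only.
params = {"host_genome": "GRCh38"}  # what the params file actually sets


def available_keys(params, name):
    """Return the sorted keys of params[name], or [] if it is unset or not a mapping."""
    value = params.get(name)  # None when the parameter is undefined
    if not isinstance(value, dict):
        return []
    return sorted(value)


# Unguarded access mirrors the reported error: None has no keys() method.
try:
    params.get("host_genomes").keys()
    unguarded_failed = False
except AttributeError:
    unguarded_failed = True
```

The guarded helper returns an empty list instead of crashing, which is roughly what an error-message builder needs when the lookup table is missing.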

Why it's not picking up your genome I don't know exactly; that will require more investigation.

iGenomes has not been particularly popular outside pipelines that rely on model organisms, so it might simply be that the implementation here is broken due to lack of use (also, looking at the code, earlier developers didn't use the original template, which has made it even more broken as it has diverged over time, I guess...). In the meantime, I would recommend supplying your own FASTA file instead, if possible.
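
A minimal sketch of that workaround, assuming the pipeline accepts a `host_fasta` parameter as the alternative to `host_genome` (check the parameter documentation for your version). The helper `switch_to_host_fasta` and the FASTA path are hypothetical; the script only rewrites a params dictionary and does not run the pipeline.

```python
import json


def switch_to_host_fasta(params: dict, fasta_path: str) -> dict:
    """Return a copy of params that points at a local host FASTA instead of an iGenomes key."""
    updated = dict(params)
    updated.pop("host_genome", None)    # drop the iGenomes lookup key
    updated["host_fasta"] = fasta_path  # assumed parameter name, see lead-in
    return updated


# Trimmed-down stand-in for the params.json shown earlier in the thread.
original = {"input": "samplesheet_large.csv", "host_genome": "GRCh38"}
updated = switch_to_host_fasta(original, "/path/to/GRCh38.fasta")  # hypothetical path
print(json.dumps(updated, indent=4))
```

The printed JSON could be saved as a new params file and passed with `-params-file` in place of the original.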

My kid is sick again (kindergarten...) so I'm not sure when I will get to this. I will also need to set up iGenomes on my system I assume as well so it'll take time...