Open flacchy opened 1 month ago
Dear @flacchy,
No worries, I am here to help. :)
I can't see the error message in your pasted section. What I do see is that in the shell part of the rule only the reference folder is provided for minimap2. This is curious, as you state that the full path to the reference genome is set in the config file. I also noticed that in the shell part the path says /scratch/xxx/zzz/ while in the config it says /scratch/xxx/yyy/. Is your problem that the zzz and yyy parts do not match?
The path to the reference genome is taken directly from the config.yaml RefGenome key by the highlighted line in the map rule:
RefPath = config["RefPath"]
rule map:
    input:
        MapIn + "{sample}.fastq.gz"
    output:
        tmp_dir + metaPath + "/1_mapping/{sample}.bam"
    params:
        ref = RefPath
    threads: config["ThreadNr"]
    conda:
        "../../envs/map_ONT_env.yaml"
    benchmark:
        config["OutPath"] + "/benchmark/" + ProjDirName + "/1_mapping/{sample}.tsv"
    log:
        config["OutPath"] + "/logs/" + ProjDirName + "/1_mapping/{sample}.log"
    shell:
        """
        minimap2 -ax map-ont {params.ref} \
        {input} \
        -t {threads} \
        2> {log} | samtools sort \
        -@ {threads} \
        -o {output} 2>> {log}
        """
My first suggestion would be to check whether you are using the right config file, in case you created multiple.
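A quick way to test the "folder instead of file" suspicion is to check what the configured reference path actually points to before starting the workflow. The snippet below is only a minimal sketch: `check_reference` is a hypothetical helper name, not part of the pipeline, and it assumes the path comes from `config["RefPath"]` as in the rule above.

```python
from pathlib import Path


def check_reference(ref_path: str) -> str:
    """Classify the reference path handed to minimap2.

    Returns "missing", "directory" or "file".
    Hypothetical helper for illustration; not part of the workflow.
    """
    p = Path(ref_path)
    if not p.exists():
        return "missing"
    if p.is_dir():
        # minimap2 expects a FASTA or index file here, not a folder
        return "directory"
    return "file"
```

If this returns "directory" for the value in your config, the reference folder rather than the genome file is being passed to minimap2.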
Thanks @moldovannorbert. The difference here ("/scratch/xxx/zzz/ while in the config /scratch/xxx/yyy/") is just me changing the paths, as this is a public space and I thought I should change them, but I can see that it is confusing. This is my config file:
configcopy.txt Please note: the file is actually config.yaml; I had to save it as txt to upload it here.
Dear @flacchy,
The config file looks right. Can you also share the ONT_dummy_sub-100000_34.fastq.gz.log, if that contains anything?
[M::mm_idx_gen::48.889*1.60] collected minimizers
[M::mm_idx_gen::53.803*2.13] sorted minimizers
[M::main::53.803*2.13] loaded/built the index for 194 target sequence(s)
[M::mm_mapopt_update::57.073*2.07] mid_occ = 706; max_occ = 10865
[M::mm_idx_stat] kmer size: 15; skip: 10; is_HPC: 0; #seq: 194
[M::mm_idx_stat::57.844*2.05] distinct minimizers: 100159079 (38.75% are singletons); average occurrences: 5.545; average spacing: 5.581
[M::worker_pipeline::63.129*2.40] mapped 100000 sequences
[M::main] Version: 2.1.1-r341
[M::main] CMD: minimap2 -ax map-ont -t 8 /scratch/prj/hab/Human_genome_reference/GRCh38.primary_assembly.genome.fa.gz /scratch/prj/hab/Flavia/DEMO_ITSFASTR/Test_demo/trimmed/1_trimming/ONT_dummy_sub-100000_34.fastq.gz.fastq.gz
[M::main] Real time: 63.373 sec; CPU: 151.512 sec
[bam_sort_core] merging from 0 files and 8 in-memory blocks...
Dear @flacchy, I don't see any errors here either. Can you please specify which error you encounter?
Running mapping.smk doesn't finish the run. It crashes, giving multiple errors about missing files. As mentioned above, I don't know if it is because the new version of snakemake works differently. Not sure if this helps, but here is some more info from the full logs for one of the multiple failing runs:
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided remote nodes: 500
Conda environments: ignored
Job stats:
job                count
---------------  -------
NanoPlot              10
all_mapping            1
flagstat              10
map                    7
mark_duplicates       10
multiqc_mapping        1
total                 39
Select jobs to execute...
Execute 16 jobs...
[Tue Jul 30 10:22:58 2024]
rule map:
input: /scratch/prj/hab/Flavia/DEMO_ITSFASTR/Test_demo/trimmed/1_trimming/ONT_dummy_sub-100000_41.fastq.gz.fastq.gz
output: /scratch/prj/hab/Flavia/DEMO_ITSFASTR/Test_demo/trimmed/1_mapping/ONT_dummy_sub-100000_41.fastq.gz.bam
log: /scratch/prj/hab/Flavia/DEMO_ITSFASTR/logs/Test_demo/trimmed/1_mapping/ONT_dummy_sub-100000_41.fastq.gz.log
jobid: 15
benchmark: /scratch/prj/hab/Flavia/DEMO_ITSFASTR/benchmark/Test_demo/trimmed/1_mapping/ONT_dummy_sub-100000_41.fastq.gz.tsv
reason: Missing output files: /scratch/prj/hab/Flavia/DEMO_ITSFASTR/Test_demo/trimmed/1_mapping/ONT_dummy_sub-100000_41.fastq.gz.bam
wildcards: sample=ONT_dummy_sub-100000_41.fastq.gz
resources: tmpdir=<TBD>
minimap2 -ax map-ont /scratch/prj/hab/Human_genome_reference /scratch/prj/hab/Flavia/DEMO_ITSFASTR/Test_demo/trimmed/1_trimming/ONT_dummy_sub-100000_41.fastq.gz.fastq.gz -t 1 2> /scratch/prj/hab/Flavia/DEMO_ITSFASTR/logs/Test_demo/trimmed/1_mapping/ONT_dummy_sub-100000_41.fastq.gz.log |
samtools sort -@ 1 -o /scratch/prj/hab/Flavia/DEMO_ITSFASTR/Test_demo/trimmed/1_mapping/ONT_dummy_sub-100000_41.fastq.gz.bam 2>> /scratch/prj/hab/Flavia/DEMO_ITSFASTR/logs/Test_demo/trimmed/1_mapping/ONT_dummy_sub-100000_41.fastq.gz.log
[Tue Jul 30 10:22:58 2024]
........ more logs and errors .... showing only end of file now ....
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Removing output files of failed job mark_duplicates since they might be corrupted:
/scratch/prj/hab/Flavia/DEMO_ITSFASTR/Test_demo/trimmed/1_mapping/ONT_dummy_sub-100000_34.fastq.gz.bam
Removing output files of failed job mark_duplicates since they might be corrupted:
/scratch/prj/hab/Flavia/DEMO_ITSFASTR/Test_demo/trimmed/1_mapping/ONT_dummy_sub-100000_33.fastq.gz.bam
Removing output files of failed job mark_duplicates since they might be corrupted:
/scratch/prj/hab/Flavia/DEMO_ITSFASTR/Test_demo/trimmed/1_mapping/ONT_dummy_sub-100000_38.fastq.gz.bam
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-07-30T102258.138724.snakemake.log
WorkflowError:
At least one job did not complete successfully.
It seems that snakemake is submitting all jobs at once, rather than starting the downstream steps (for example samtools or NanoPlot) only after minimap2 has finished. I am trying to downgrade snakemake to see whether the older version, where the now-deprecated --cluster option still works, makes a difference.
One thing that might cause an error (not sure if this is the cause here) is that you are using file names with the .fastq.gz extension as sample names. Can you check this first? In the sample sheet, please use only the sample name (e.g. ONT_dummy_sub-100000_34).
If this doesn't solve the issue, can you share the .snakemake/log/2024-07-30T102258.138724.snakemake.log if it's not empty?
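On the sample-name point: the map rule builds its input as `MapIn + "{sample}.fastq.gz"`, so a sample name that already ends in .fastq.gz produces the doubled `.fastq.gz.fastq.gz` visible in the logs. A minimal sketch for cleaning names before putting them in the sample sheet is below; `sample_name` is a hypothetical helper, and the list of suffixes is an assumption about which extensions may occur.

```python
def sample_name(filename: str) -> str:
    """Return the bare sample name for a FASTQ file name.

    Drops any leading directory and one trailing FASTQ extension.
    Hypothetical helper for illustration; not part of the workflow.
    """
    name = filename.rsplit("/", 1)[-1]
    # Longest suffixes first so ".fastq.gz" wins over ".fastq"
    for suffix in (".fastq.gz", ".fq.gz", ".fastq", ".fq"):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name
```

For example, `sample_name("ONT_dummy_sub-100000_34.fastq.gz")` gives `ONT_dummy_sub-100000_34`, which is the form the sample sheet should contain.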
Thanks again for your help, @moldovannorbert.
OK, I am just re-running mapping.smk. Just to re-check, here is the sample sheet
Trimming output
command to run mapping.smk
snakemake --printshellcmds --keep-going --executor cluster-generic --cluster-generic-submit-cmd 'sbatch --time=60 --nodes=1 --partition=cpu --cpus-per-task=32 --mem=64000 --output=/scratch/prj/hab/Flavia/DEMO_ITSFASTR/slurm-%j.out' --jobs 500 --max-jobs-per-second 3 --max-status-checks-per-second 120 -s mapping.smk
Error (shown in red) while running; the text:
Error in rule NanoPlot:
message: For further error details see the cluster/cloud log and the log files of the involved rule(s).
jobid: 8
input: /scratch/prj/hab/Flavia/DEMO_ITSFASTR/Test_demo_versionAugust/trimmed/1_mapping/ONT_dummy_sub-100000_37.bam
output: /scratch/prj/hab/Flavia/DEMO_ITSFASTR/Test_demo_versionAugust/trimmed/1_mapping_quality/ONT_dummy_sub-100000_37
log: /scratch/prj/hab/Flavia/DEMO_ITSFASTR/logs/Test_demo_versionAugust/trimmed/1_mapping_quality/ONT_dummy_sub-100000_37.log (check log file(s) for error details)
conda-env: /scratch/prj/hab/Flavia/ITSFASTR/workflow/rules/1_preprocessing/.snakemake/conda/80b2dd96334bda752de9987b745c4ed7_
shell:
NanoPlot --bam /scratch/prj/hab/Flavia/DEMO_ITSFASTR/Test_demo_versionAugust/trimmed/1_mapping/ONT_dummy_sub-100000_37.bam -o /scratch/prj/hab/Flavia/DEMO_ITSFASTR/Test_demo_versionAugust/trimmed/1_mapping_quality/ONT_dummy_sub-100000_37 --raw --alength -t 1 --huge 2>> /scratch/prj/hab/Flavia/DEMO_ITSFASTR/logs/Test_demo_versionAugust/trimmed/1_mapping_quality/ONT_dummy_sub-100000_37.log
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
external_jobid: Submitted batch job 19763414
....more text here ...
[Tue Aug 6 10:49:34 2024]
Finished job 4.
17 of 42 steps (40%) done
[Tue Aug 6 10:49:55 2024]
Finished job 12.
18 of 42 steps (43%) done
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-08-06T104523.456938.snakemake.log
WorkflowError:
At least one job did not complete successfully.
Full log error: attached here 2024-08-06T104523.456938.snakemake.log
thanks again for your help
Ok, this is getting us closer. You are getting errors in the mark_duplicates and flagstat rules. This suggests that there is something wrong with the mapping.
Can you share
/scratch/prj/hab/Flavia/DEMO_ITSFASTR/logs/Test_demo_versionAugust/trimmed/2_mark_duplicates/ONT_dummy_sub-100000_37.log
and the log for the mapping for the same file?
So the
/scratch/prj/hab/Flavia/DEMO_ITSFASTR/logs/Test_demo_versionAugust/trimmed/2_mark_duplicates/ONT_dummy_sub-100000_37.log
is empty.
The mapping log here: /scratch/prj/hab/Flavia/DEMO_ITSFASTR/logs/Test_demo_versionAugust/trimmed/1_mapping/ONT_dummy_sub-100000_37.log
has only these lines:
[M::mm_idx_gen::77.353*0.99] collected minimizers
[M::mm_idx_gen::112.680*0.99] sorted minimizers
[M::main::112.680*0.99] loaded/built the index for 194 target sequence(s)
[M::mm_mapopt_update::115.404*0.99] mid_occ = 706; max_occ = 10865
[M::mm_idx_stat] kmer size: 15; skip: 10; is_HPC: 0; #seq: 194
[M::mm_idx_stat::116.161*0.99] distinct minimizers: 100159079 (38.75% are singletons); average occurrences: 5.545; average spacing: 5.581
[M::worker_pipeline::145.938*0.99] mapped 100000 sequences
[M::main] Version: 2.1.1-r341
[M::main] CMD: minimap2 -ax map-ont -t 1 /scratch/prj/hab/Human_genome_reference/GRCh38.primary_assembly.genome.fa.gz /scratch/prj/hab/Flavia/DEMO_ITSFASTR/Test_demo_versionAugust/trimmed/1_trimming/ONT_dummy_sub-100000_37.fastq.gz
[M::main] Real time: 146.186 sec; CPU: 144.820 sec
It seems that only these were created
In the benchmarking folder I can see the following for mapping:
And for the dummy sample 37: /scratch/prj/hab/Flavia/DEMO_ITSFASTR/benchmark/Test_demo_versionAugust/trimmed/1_mapping/ONT_dummy_sub-100000_37.tsv
Dear @flacchy,
Sorry for the late response. I looked through the logs you sent, but unfortunately I could not figure out why some jobs fail. I was also unable to reproduce it on our cluster. Can you please send me what's in one of the failed slurm-%j.out logs?
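Since several of the rule logs turned out to be empty, it may save time to list only the logs that actually contain output before sharing them. This is a small sketch under assumptions: `nonempty_logs` is a hypothetical helper, and it assumes the logs end in `.log` and sit somewhere below a single logs folder.

```python
from pathlib import Path


def nonempty_logs(log_dir: str) -> list[str]:
    """List .log files below log_dir that contain at least one byte.

    Paths are returned relative to log_dir, sorted alphabetically.
    Hypothetical helper for illustration; not part of the workflow.
    """
    root = Path(log_dir)
    return sorted(
        str(p.relative_to(root))
        for p in root.rglob("*.log")
        if p.is_file() and p.stat().st_size > 0
    )
```

Pointing it at the logs folder of the run shows which rules wrote anything at all.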
Hello @moldovannorbert, really sorry for disturbing you again. I am running the demo and managed to run trimmed.smk, but I am encountering various errors when running mapping.smk. Specifically, I am running the following code:
NOTE: the --cluster option is deprecated in snakemake, so I have installed the plugin and changed the parameters so that it could run.
The error was mentioning latency, so I have changed the parameter from
--max-status-checks-per-second 5
to --max-status-checks-per-second 120
Still an error:
In my config file the path to the reference genome is
Any suggestion on why this is happening and how to fix it?