nf-core / eager

A fully reproducible and state-of-the-art ancient DNA analysis pipeline
https://nf-co.re/eager
MIT License

DSL1 Pipeline completed with errors / bug in test #1027

Closed 0victormh0 closed 5 months ago

0victormh0 commented 9 months ago

Hi everyone, I'm trying to get started with eager and ran into this problem. I'm brand new to nf-core pipelines, so I don't know whether this issue is easily fixable. I recently installed nextflow and all the requirements according to the docs, and when I run the test this is the message I get. I hope you can tell me what is going on and whether I can do something to fix it.

Thank you so much for your time and developing such a great tool

My run:

nextflow run nf-core/eager -profile test,docker

My output:

[08/508a15] NOTE: Process makeSeqDict (Mammoth_MT_Krause.fasta) terminated with an error exit status (134) -- Execution is retried (1)
[05/e65499] NOTE: Process fastqc (JK2782_L1) terminated with an error exit status (134) -- Execution is retried (1)
[7c/5c74d2] NOTE: Process fastqc (JK2802_L2) terminated with an error exit status (134) -- Execution is retried (1)
[35/1682a1] NOTE: Process makeSeqDict (Mammoth_MT_Krause.fasta) terminated with an error exit status (134) -- Execution is retried (2)
[ea/883426] NOTE: Process fastqc_after_clipping (JK2782_L1) terminated with an error exit status (134) -- Execution is retried (1)
[91/75ee6d] NOTE: Process fastqc (JK2782_L1) terminated with an error exit status (134) -- Execution is retried (2)
[e8/4825bf] NOTE: Process fastqc_after_clipping (JK2802_L2) terminated with an error exit status (134) -- Execution is retried (1)
[92/8406d7] NOTE: Process fastqc (JK2802_L2) terminated with an error exit status (134) -- Execution is retried (2)
[27/25597c] NOTE: Process makeSeqDict (Mammoth_MT_Krause.fasta) terminated with an error exit status (134) -- Execution is retried (3)
[f5/e592ab] NOTE: Process fastqc_after_clipping (JK2782_L1) terminated with an error exit status (134) -- Execution is retried (2)
[8e/6c25a4] NOTE: Process fastqc (JK2782_L1) terminated with an error exit status (134) -- Execution is retried (3)
[24/d78cdf] NOTE: Process fastqc_after_clipping (JK2802_L2) terminated with an error exit status (134) -- Execution is retried (2)
Error executing process > 'markduplicates (JK2802)'

Caused by: Process markduplicates (JK2802) terminated with an error exit status (134)

Command executed:

mv JK2802_SE.mapped.bam JK2802.bam
picard -Xmx4096M MarkDuplicates INPUT=JK2802.bam OUTPUT=JK2802_rmdup.bam REMOVE_DUPLICATES=TRUE AS=TRUE METRICS_FILE="JK2802_rmdup.metrics" VALIDATION_STRINGENCY=SILENT
samtools index JK2802_rmdup.bam

Command exit status: 134

Command output: (empty)

Command error:
/opt/conda/envs/nf-core-eager-2.4.7/bin/picard: line 5: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8): No such file or directory
INFO 2023-10-19 13:22:07 MarkDuplicates

** NOTE: Picard's command line syntax is changing.


** For more information, please see: ** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)


** The command line looks like this in the new syntax:


** MarkDuplicates -INPUT JK2802.bam -OUTPUT JK2802_rmdup.bam -REMOVE_DUPLICATES TRUE -AS TRUE -METRICS_FILE JK2802_rmdup.metrics -VALIDATION_STRINGENCY SILENT


library initialization failed - unable to allocate file descriptor table - out of memory
/opt/conda/envs/nf-core-eager-2.4.7/bin/picard: line 66: 46 Aborted (core dumped) /opt/conda/envs/nf-core-eager-2.4.7/bin/java -Xmx4096M -jar /opt/conda/envs/nf-core-eager-2.4.7/share/picard-2.26.0-0/picard.jar MarkDuplicates "INPUT=JK2802.bam" "OUTPUT=JK2802_rmdup.bam" "REMOVE_DUPLICATES=TRUE" "AS=TRUE" "METRICS_FILE=JK2802_rmdup.metrics" "VALIDATION_STRINGENCY=SILENT"

TCLamnidis commented 8 months ago

Hi @0victormh0!

I cannot reproduce this error on my end with the command nextflow run nf-core/eager -r 2.4.7 -profile test,docker (with Nextflow 22.10.6)

The exit codes you get seem to suggest the jobs were aborted:

Exit code 134 means that the program was aborted by a SIGABRT signal.

But the error message in the picard job suggests it ran out of memory.

library initialization failed - unable to allocate file descriptor table - out of memory

Please make sure that your docker is allowed to use enough memory when spinning up containers.

That said, it is surprising that the test would run out of memory. If bumping up the memory of containers a bit (6-8GB should be more than enough) does not help, could you provide some more information on your setup? Are you running this on a laptop, or an HPC? Are you loading any other configurations?
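
The reading of exit code 134 in the comment above follows the usual shell convention that a status above 128 means the process was killed by signal number (status - 128). A minimal sketch of that arithmetic, where the function name `signal_from_exit_status` is made up for illustration:

```shell
#!/bin/sh
# Decode an exit status into the signal that terminated the process, if any.
# Convention: status > 128 means "killed by signal (status - 128)".
signal_from_exit_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        # kill -l <number> prints the signal name without the SIG prefix.
        printf 'SIG%s\n' "$(kill -l "$((status - 128))")"
    else
        printf 'none\n'
    fi
}

signal_from_exit_status 134
```

For 134 this gives signal 6, which is SIGABRT, matching the "Aborted (core dumped)" line in the picard output.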

0victormh0 commented 8 months ago

Hi @TCLamnidis, thanks for your reply. I'm running the pipeline on a laptop with 16GB RAM and 12 × AMD Ryzen 5 PRO. I didn't have any problems with other metagenomic tools (e.g. qiime2). The laptop runs Fedora 37, and nextflow 22.10.6 was installed in a conda environment.

About docker: an administrator at my institute did the installation (because I don't own this laptop), and I don't really know how to check the memory available to the containers. If it is recommended, I could try to install the pipeline on our institutional cluster (I would need to ask for permission), which has about 125GB RAM, but first I should probably figure out whether this is a resource availability problem.

Thank you so much for your time, Víctor
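
One possible way to check how much memory the Docker daemon can see is `docker info` with a format template. The sketch below assumes the `docker` client is on the PATH and that the user may talk to the daemon; `bytes_to_gib` is a helper invented here. On a native Linux install the reported figure is normally the host's total memory, since containers are not capped unless a limit is set explicitly.

```shell
#!/bin/sh
# Convert a byte count to whole GiB (integer division, rounds down).
bytes_to_gib() {
    echo $(( $1 / 1073741824 ))
}

# Ask the Docker daemon for the total memory it reports, if docker is available.
if command -v docker >/dev/null 2>&1; then
    mem_bytes=$(docker info --format '{{.MemTotal}}' 2>/dev/null)
    case "$mem_bytes" in
        ''|*[!0-9]*)
            # Empty or non-numeric answer: daemon unreachable or no permission.
            echo "could not read memory from the docker daemon" ;;
        *)
            echo "Docker reports $(bytes_to_gib "$mem_bytes") GiB of memory" ;;
    esac
else
    echo "docker not found on PATH"
fi
```

If the reported value is close to the laptop's 16GB, a daemon-wide cap is unlikely to be the limiting factor, and per-container limits or other resource limits would be the next thing to look at.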

jfy133 commented 8 months ago

Hi @0victormh0

As @TCLamnidis said, this is an out-of-memory issue. You can't really compare memory requirements between tools; each needs its own amount depending on what it is doing.

If I remember correctly, tools such as MarkDuplicates require quite a lot of memory (I believe I once saw a general recommendation of assigning at least 32GB to it).

One thing you can try is not running your own data, but just the test profile -profile docker,test --outdir <....>, and see if that runs through.

You can also let us know in the meantime 1) how much data you have (how many reads) and 2) whether you already have an idea of the endogenous/on-target percentage of the library. It could be that you have too much data.

0victormh0 commented 8 months ago

Hi @jfy133, sorry, I wasn't clear enough in my previous message: all these errors come from the test data; I have not tried my own data yet. I ran the command: "nextflow run nf-core/eager -profile test,docker". By the way, I only have two samples with about 16M reads each. If the RAM requirement is close to 32GB, I will try to install and run the pipeline on my institute's HPC.

Thank you so much for your time. Víctor

jfy133 commented 8 months ago

Oh that's strange... in the long run yes, running nextflow pipelines on clusters makes sense, but if even the test isn't working that's a problem.

Could you try making a file called custom.config that has the following

process {
    withName: markduplicates {
        memory = 8.GB
    }
}

and then run the command as

nextflow run nf-core/eager -profile test,docker -c custom.config

0victormh0 commented 8 months ago

Sorry @jfy133, I get the same error :( I don't know if I made the custom.config file correctly; I just copied the snippet into the nano text editor and saved it under that name. Then I ran the pipeline command in the working directory where the custom.config file was. As it didn't work, I repeated the command with the absolute path to the file (just in case).
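
For anyone unsure whether the file was created correctly, one way to rule out editor mishaps is to write it with a shell here-document and print it back. This is only a sketch of that approach; the file name and contents are the ones suggested earlier in the thread.

```shell
#!/bin/sh
# Write the suggested override to custom.config in the current directory.
# The quoted 'EOF' keeps the shell from expanding anything inside the block.
cat > custom.config <<'EOF'
process {
    withName: markduplicates {
        memory = 8.GB
    }
}
EOF

# Print the file so it can be compared with the snippet in the thread.
cat custom.config
```

Note that this override only targets the markduplicates process. The output posted further down shows makeSeqDict and fastqc failing with the same exit status, and this file would not change their settings.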

jfy133 commented 8 months ago

Can you post the error?

0victormh0 commented 8 months ago

Sure, here is the full output:

executor > local (22)
[89/1a082f] process > makeBWAIndex (Mammoth_MT_Krause.fasta) [100%] 1 of 1 ✔
[9c/33b379] process > makeFastaIndex (Mammoth_MT_Krause.fasta) [100%] 1 of 1 ✔
[79/b37061] process > makeSeqDict (Mammoth_MT_Krause.fasta) [100%] 4 of 4, failed: 4, retries: 3 ✘
[- ] process > convertBam -
[- ] process > indexinputbam -
[cc/df2fa7] process > fastqc (JK2782_L1) [ 66%] 4 of 6, failed: 4, retries: 4
[- ] process > fastp -
[e4/ae63cc] process > adapter_removal (JK2782_L1) [100%] 2 of 2 ✔
[- ] process > post_ar_fastq_trimming -
[- ] process > lanemerge -
[- ] process > lanemerge_hostremoval_fastq -
[fb/86c47d] process > fastqc_after_clipping (JK2782_L1) [ 50%] 2 of 4, failed: 2, retries: 2
[67/109dd3] process > bwa (JK2802) [100%] 2 of 2 ✔
[- ] process > bwamem -
[- ] process > circulargenerator -
[- ] process > circularmapper -
[- ] process > bowtie2 -
[- ] process > hostremoval_input_fastq -
[- ] process > seqtype_merge -
[ac/8c3d43] process > samtools_flagstat (JK2782) [ 0%] 0 of 2
[- ] process > samtools_filter -
[- ] process > samtools_flagstat_after_filter -
[- ] process > endorSpy -
[- ] process > dedup -
[- ] process > markduplicates [ 0%] 0 of 2
[- ] process > library_merge -
[- ] process > preseq [ 0%] 0 of 2
[- ] process > bedtools -
[- ] process > damageprofiler -
[- ] process > mapdamage_rescaling -
[- ] process > mask_reference_for_pmdtools -
[- ] process > pmdtools -
[- ] process > bam_trim -
[- ] process > additional_library_merge -
[- ] process > qualimap -
[- ] process > picard_addorreplacereadgroups -
[- ] process > genotyping_ug -
[- ] process > genotyping_hc -
[- ] process > genotyping_freebayes -
[- ] process > genotyping_pileupcaller -
[- ] process > eigenstrat_snp_coverage -
[- ] process > genotyping_angsd -
[- ] process > bcftools_stats -
[- ] process > vcf2genome -
[- ] process > multivcfanalyzer -
[- ] process > mtnucratio -
[- ] process > sexdeterrmine_prep -
[- ] process > sexdeterrmine -
[- ] process > nuclear_contamination -
[- ] process > print_nuclear_contamination -
[- ] process > metagenomic_complexity_filter -
[- ] process > malt -
[- ] process > maltextract -
[- ] process > kraken -
[- ] process > kraken_parse -
[- ] process > kraken_merge -
[b4/eee536] process > output_documentation [100%] 1 of 1 ✔
[25/fe50d0] process > get_software_versions [100%] 1 of 1 ✔
[- ] process > multiqc -
[d9/8843fa] NOTE: Process makeSeqDict (Mammoth_MT_Krause.fasta) terminated with an error exit status (134) -- Execution is retried (1)
[60/806da7] NOTE: Process makeSeqDict (Mammoth_MT_Krause.fasta) terminated with an error exit status (134) -- Execution is retried (2)
[3e/94b195] NOTE: Process fastqc (JK2802_L2) terminated with an error exit status (134) -- Execution is retried (1)
[b4/3f3618] NOTE: Process fastqc (JK2782_L1) terminated with an error exit status (134) -- Execution is retried (1)
[57/ce1069] NOTE: Process makeSeqDict (Mammoth_MT_Krause.fasta) terminated with an error exit status (134) -- Execution is retried (3)
[16/a39299] NOTE: Process fastqc_after_clipping (JK2802_L2) terminated with an error exit status (134) -- Execution is retried (1)
[32/672d12] NOTE: Process fastqc_after_clipping (JK2782_L1) terminated with an error exit status (134) -- Execution is retried (1)
[64/c1c686] NOTE: Process fastqc (JK2802_L2) terminated with an error exit status (134) -- Execution is retried (2)
[cc/df2fa7] NOTE: Process fastqc (JK2782_L1) terminated with an error exit status (134) -- Execution is retried (2)
Error executing process > 'makeSeqDict (Mammoth_MT_Krause.fasta)'

Caused by: Process makeSeqDict (Mammoth_MT_Krause.fasta) terminated with an error exit status (134)

Command executed:

picard -Xmx6144M CreateSequenceDictionary R=Mammoth_MT_Krause.fasta O="Mammoth_MT_Krause.dict"

Command exit status: 134

Command output: (empty)

Command error:
/opt/conda/envs/nf-core-eager-2.4.7/bin/picard: line 5: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8): No such file or directory
INFO 2023-10-26 09:14:46 CreateSequenceDictionary

** NOTE: Picard's command line syntax is changing.


** For more information, please see: ** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)


** The command line looks like this in the new syntax:


** CreateSequenceDictionary -R Mammoth_MT_Krause.fasta -O Mammoth_MT_Krause.dict


library initialization failed - unable to allocate file descriptor table - out of memory
/opt/conda/envs/nf-core-eager-2.4.7/bin/picard: line 66: 42 Aborted (core dumped) /opt/conda/envs/nf-core-eager-2.4.7/bin/java -Xmx6144M -jar /opt/conda/envs/nf-core-eager-2.4.7/share/picard-2.26.0-0/picard.jar CreateSequenceDictionary "R=Mammoth_MT_Krause.fasta" "O=Mammoth_MT_Krause.dict"

Work dir: /home/vmhisado/Desktop/Victor/eager/work/79/b37061dc4ff64d98dc92fa58896d9d

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh

-[nf-core/eager] Pipeline completed with errors-
WARN: Killing running tasks (3)

jfy133 commented 8 months ago

Thanks @0victormh0 , ok next step, can you try with --max_memory '12.GB' at the end of your command?

0victormh0 commented 8 months ago

Hi again @jfy133, I get the same problem with: nextflow run nf-core/eager -profile test,docker -c custom.config --max_memory '12.GB'

I'm sorry for taking up your time.

jfy133 commented 8 months ago

OK can you try going into /home/vmhisado/Desktop/Victor/eager/work/79/b37061dc4ff64d98dc92fa58896d9d

and running bash .command.run

and then if that fails again, install picard v2.26 (e.g. with conda), and run bash .command.sh...

I'm really confused what is going on here... it's not been reported before
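
The two commands suggested above test different things: .command.run is the wrapper Nextflow generates to stage inputs and launch the task (inside the container when one is configured), while .command.sh is the bare task script. Before re-running either, it can help to look at what the failed task left behind. A small sketch, assuming the usual hidden files in a Nextflow work directory; the function name `inspect_task_dir` is invented here:

```shell
#!/bin/sh
# Print the script, stderr and exit code that Nextflow stored for one task.
inspect_task_dir() {
    dir=$1
    for f in .command.sh .command.err .exitcode; do
        if [ -f "$dir/$f" ]; then
            echo "== $f =="
            cat "$dir/$f"
            echo
        else
            echo "== $f (missing) =="
        fi
    done
}

# Default to the current directory when no path is given.
inspect_task_dir "${1:-.}"
```

Saved as a script, it would be called with the work directory from the error message as its argument, for example the work/79/b37061dc4ff64d98dc92fa58896d9d path quoted above.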

0victormh0 commented 8 months ago

Hi again @jfy133, maybe this is getting too frustrating. After running bash .command.run, I got the same error. I then installed picard v2.26 and ran bash .command.sh. Here is the output from that command (the working directory is different because I tried again and again):

mv: cannot stat 'JK2782_PE.mapped.bam': No such file or directory
INFO 2023-10-31 09:27:45 MarkDuplicates

** NOTE: Picard's command line syntax is changing.


** For more information, please see: ** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)


** The command line looks like this in the new syntax:


** MarkDuplicates -INPUT JK2782.bam -OUTPUT JK2782_rmdup.bam -REMOVE_DUPLICATES TRUE -AS TRUE -METRICS_FILE JK2782_rmdup.metrics -VALIDATION_STRINGENCY SILENT


09:27:45.433 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/vmhisado/.conda/envs/nextflow/share/picard-2.26.11-0/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Tue Oct 31 09:27:45 CET 2023] MarkDuplicates INPUT=[JK2782.bam] OUTPUT=JK2782_rmdup.bam METRICS_FILE=JK2782_rmdup.metrics REMOVE_DUPLICATES=true ASSUME_SORTED=true VALIDATION_STRINGENCY=SILENT MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true DUPLEX_UMI=false ADD_PG_TAG_TO_READS=true DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Tue Oct 31 09:27:45 CET 2023] Executing as vmhisado@laptop on Linux 6.3.8-100.fc37.x86_64 amd64; OpenJDK 64-Bit Server VM 11.0.1+13-LTS; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.26.11
INFO 2023-10-31 09:27:45 MarkDuplicates Start of doWork freeMemory: 29480760; totalMemory: 35651584; maxMemory: 8589934592
INFO 2023-10-31 09:27:45 MarkDuplicates Reading input file and constructing read end information.
INFO 2023-10-31 09:27:45 MarkDuplicates Will retain up to 31122951 data points before spilling to disk.
WARNING 2023-10-31 09:27:45 SamFiles The index file /home/vmhisado/Desktop/Victor/eager/work/52/30cf46c49ae2bb505eb469174d86af/JK2782_PE.mapped.bam.bai was found by resolving the canonical path of a symlink: /home/vmhisado/Desktop/Victor/eager/work/f0/540698e025e6ebd09e4bb2dfad6978/JK2782.bam -> /home/vmhisado/Desktop/Victor/eager/work/52/30cf46c49ae2bb505eb469174d86af/JK2782_PE.mapped.bam
INFO 2023-10-31 09:27:45 MarkDuplicates Read 1250 records. 0 pairs never matched.
INFO 2023-10-31 09:27:45 MarkDuplicates After buildSortedReadEndLists freeMemory: 185626216; totalMemory: 444596224; maxMemory: 8589934592
INFO 2023-10-31 09:27:45 MarkDuplicates Will retain up to 268435456 duplicate indices before spilling to disk.
INFO 2023-10-31 09:27:46 MarkDuplicates Traversing read pair information and detecting duplicates.
INFO 2023-10-31 09:27:46 MarkDuplicates Traversing fragment information and detecting duplicates.
INFO 2023-10-31 09:27:46 MarkDuplicates Sorting list of duplicate records.
INFO 2023-10-31 09:27:46 MarkDuplicates After generateDuplicateIndexes freeMemory: 1450362712; totalMemory: 3607101440; maxMemory: 8589934592
INFO 2023-10-31 09:27:46 MarkDuplicates Marking 90 records as duplicates.
INFO 2023-10-31 09:27:46 MarkDuplicates Found 0 optical duplicate clusters.
WARNING 2023-10-31 09:27:46 SamFiles The index file /home/vmhisado/Desktop/Victor/eager/work/52/30cf46c49ae2bb505eb469174d86af/JK2782_PE.mapped.bam.bai was found by resolving the canonical path of a symlink: /home/vmhisado/Desktop/Victor/eager/work/f0/540698e025e6ebd09e4bb2dfad6978/JK2782.bam -> /home/vmhisado/Desktop/Victor/eager/work/52/30cf46c49ae2bb505eb469174d86af/JK2782_PE.mapped.bam
INFO 2023-10-31 09:27:46 MarkDuplicates Reads are assumed to be ordered by: coordinate
INFO 2023-10-31 09:27:46 MarkDuplicates Writing complete. Closing input iterator.
INFO 2023-10-31 09:27:46 MarkDuplicates Duplicate Index cleanup.
INFO 2023-10-31 09:27:46 MarkDuplicates Getting Memory Stats.
INFO 2023-10-31 09:27:46 MarkDuplicates Before output close freeMemory: 55849832; totalMemory: 62914560; maxMemory: 8589934592
INFO 2023-10-31 09:27:46 MarkDuplicates Closed outputs. Getting more Memory Stats.
INFO 2023-10-31 09:27:46 MarkDuplicates After output close freeMemory: 22604536; totalMemory: 29360128; maxMemory: 8589934592
[Tue Oct 31 09:27:46 CET 2023] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=29360128
.command.sh: line 4: samtools: command not found

jfy133 commented 7 months ago

Sorry @0victormh0 I've been on parental leave so got behind.

I hope it isn't too late - but I'm wondering if it's something to do with docker engine not getting enough memory.

I think @TCLamnidis had this problem a long time ago ..

0victormh0 commented 5 months ago

Hi again @jfy133, no worries, I hope it went well. These past months I've been trying to get docker installed on my institute's server, and it finally happened. On the server (125GB RAM) the pipeline ran successfully, so it seems the problem is that my laptop doesn't have enough memory (16GB RAM).

A docker compatibility problem then came up on this server and the IT staff uninstalled it; when docker works again I will start with my samples. If I find errors then, it may be better to open a new thread.

Thank you so much for your help! Víctor

jfy133 commented 5 months ago

Glad to hear it was indeed a memory error in the end!

If your IT people are having problems with docker, I would recommend something like apptainer (the new name for singularity); it functions the same as docker but is server/HPC/system-admin friendly :)

0victormh0 commented 5 months ago

Great, I will tell them!

TCLamnidis commented 5 months ago

Glad to hear things seem resolved for now. I will close this issue now. Feel free to open a new one if the problems resurface :)