jpuritz / dDocent

a bash pipeline for RAD sequencing
ddocent.com
MIT License

Genomic interval creation failed #85

Open tamsinlh opened 2 years ago

tamsinlh commented 2 years ago

Received this error:

Creating alignment intervals
mawk: cannot open "mapped.2918.bed" for output (Too many open files)
Genomic interval creation failed.  This may be related to the maximum number of open files.

I am unsure how to fix. Please advise.

pdimens commented 2 years ago

This usually comes from the default open-file limit on Unix-like systems, not from a bug in dDocent itself. The short answer is to remove the limit for your terminal session. You can do so with the command:

ulimit -s unlimited

For more information, there is a useful thread about it here

This should work on Fedora/RedHat/CentOS/Debian-based Linux flavors. If you are using Arch Linux or another distribution, the process may differ, so check how to raise the open-file limit on that system specifically.

Anto007 commented 1 year ago

I encountered the above error for the first time with a new, large dataset (n=364 samples): mawk: cannot open "mapped.3638.bed" for output (Too many open files). Despite this error, the final files appear to have been created when running dDocent v2.8.13. As suggested here, I ran ulimit -s unlimited before starting the dDocent run, but the error message does not go away. Please find attached the relevant output files from running dDocent v2.8.13: dDocent_main.LOG, ls-l.txt, dDocent.runs.txt

I had assumed the error was safe to ignore since the expected output files were created. Out of curiosity, I then ran the same dataset on dDocent v2.9.4 (conda installation) on the same compute workstation, again after running ulimit -s unlimited, and got the error message below.

Creating alignment intervals
mawk: cannot open "mapped.3638.bed" for output (Too many open files)
Genomic interval creation failed.  This may be related to the maximum number of open files.

Obviously, the expected output files were not created in this run, so I am now unsure whether it is safe to use the results from my dDocent v2.8.13 run instead, which seemed to produce the final files despite the mawk error. Would it be OK to use those results? In case it is relevant: my compute workstation is a high-end scientific workstation running Ubuntu 20.04.2 LTS, with 512 MB memory, 104 CPUs and several terabytes of free space. Thanks a lot again!

Anto007 commented 1 year ago

In case this will be useful to others: I solved this by setting ulimit -n 1000000 before beginning my dDocent run
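
For anyone comparing the two commands in this thread: in bash, ulimit -s controls the stack size, while ulimit -n controls the maximum number of open file descriptors, which is the limit the mawk error refers to. Below is a minimal sketch, assuming bash, for checking the current limits and raising the soft limit before a run. The target value and variable names are illustrative assumptions and not part of dDocent; the soft limit can only be raised as far as the hard limit without elevated privileges.

```shell
#!/usr/bin/env bash
# Read the current soft and hard limits on open file descriptors.
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "soft open-file limit: $soft"
echo "hard open-file limit: $hard"

# Hypothetical target; choose a value above the number of
# mapped.*.bed files the run is expected to write.
target=4096

# Raise the soft limit only if it is currently below the target,
# and never ask for more than the hard limit allows.
if [ "$soft" != "unlimited" ] && [ "$soft" -lt "$target" ]; then
    if [ "$hard" = "unlimited" ] || [ "$target" -le "$hard" ]; then
        ulimit -Sn "$target"
    else
        ulimit -Sn "$hard"
    fi
fi
echo "new soft open-file limit: $(ulimit -Sn)"
```

The change applies only to the current shell and the processes started from it, so the commands need to be run in the same session (or sourced into it) before launching dDocent.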