It seems the issue was related to memory mapping. I was sent the following hack, which worked. Still, it's unclear why all these files need to be open at once.
""" There was a second call where Warren describes how memory mapping caused the issue again, this time in the initialization:
"The problem stems from trying to use memory-mapping to access all those images at one time. The use of memory-mapping became the default for PyFITS with Version 3.1 and the astrodrizzle code has not been updated to adjust to this change. However, there is something you can do that should allow you to process the stack of images. The PyFITS FAQ (found at http://pythonhosted.org/pyfits/) specifies that:
In PyFITS 3.1, the mmap support is improved enough that memmap=True is the default for all pyfits.open() calls. The default can also be controlled through an environment variable called PYFITS_USE_MEMMAP. Setting this to 0 will disable mmap by default. "
The user suggested another workaround:
An alternative solution that worked for me is also to run the following:
import resource
resource.setrlimit(resource.RLIMIT_NOFILE, (10000,-1))
Where the first number is an arbitrarily large number of maximum files. """
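For reference, here is a minimal sketch combining both suggested workarounds: disabling memory mapping (the PYFITS_USE_MEMMAP variable comes from the FAQ quoted above; the memmap keyword of pyfits.open / astropy.io.fits.open does the same per call) and raising the soft limit on open files. The file name is only a placeholder.

import os
import resource

# Workaround 1: disable memory mapping. Set the environment variable before
# pyfits/astropy.io.fits is imported (per the FAQ quoted above), or pass
# memmap=False to individual open() calls:
os.environ['PYFITS_USE_MEMMAP'] = '0'
# from astropy.io import fits
# hdulist = fits.open('example_flt.fits', memmap=False)  # placeholder file name

# Workaround 2: raise the soft limit on open file handles, capped at the hard limit.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
new_soft = 10000 if hard == resource.RLIM_INFINITY else min(10000, hard)
if soft < new_soft:
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))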
Memory-mapping does open more file handles than it should: one file handle for every FITS file extension. astropy.io.fits has not been re-designed to open only one file handle per file.
Memory mapping in itself is not the problem here, but rather the limit on the number of open file handles imposed by the operating system. There is no fundamental reason for this limit to exist except, as far as I know, for enhanced security (viruses tend to open many files, so limiting the number of files a process can open improves security).
Even if we update drizzlepac to not use memory mapping, there will be a user who wants to process 1200 images and who will run into the same problem.
Please see my last message in the following drizzlepac forum thread: https://forum.stsci.edu/discussion/114/drizzlepac-errno-24-too-many-open-files
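If you want to see how many file handles a given run actually consumes, a rough, Linux-only probe is to count the entries in /proc/self/fd around a fits.open() call. This is only an illustrative sketch; the file name is a placeholder, and a per-extension growth in that count is the behaviour described above.

import os
from astropy.io import fits

def open_fd_count():
    # Linux-only: each entry in /proc/self/fd is one open descriptor of this process.
    return len(os.listdir('/proc/self/fd'))

before = open_fd_count()
hdulist = fits.open('multi_ext_example.fits', memmap=True)  # placeholder file name
for hdu in hdulist:
    data = hdu.data  # with memmap=True, the data of each extension is mapped lazily on access
print('descriptors in use went from', before, 'to', open_fd_count())
hdulist.close()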
@stsci-hack In my opinion, it is not worth doing anything on this topic unless processing >100 files is a typical case.
On Mar 8, 2017, at 10:47 AM, Mihai Cara notifications@github.com wrote:
Memory mapping in itself is not the problem here, but rather the limit on the number of open file handles imposed by the operating system. There is no fundamental reason for this limit to exist except, as far as I know, for enhanced security (viruses tend to open many files, so limiting the number of files a process can open improves security).
The reason why this limit exists, at least on Unix, is that each process is allocated a fixed-size table that contains all of its open file handles. That information needs to be stored because child processes inherit open file handles from the parent process after a fork. It's part of how Unix works.
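A tiny Unix-only illustration of that inheritance (nothing drizzlepac-specific here): a descriptor opened in the parent is still usable in the child after fork().

import os

# The parent opens a pipe; after fork() the child inherits the same descriptors.
read_end, write_end = os.pipe()
pid = os.fork()
if pid == 0:                      # child process
    os.close(write_end)
    print('child read:', os.read(read_end, 5).decode())
    os._exit(0)
else:                             # parent process
    os.close(read_end)
    os.write(write_end, b'hello')
    os.close(write_end)
    os.waitpid(pid, 0)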
I am not questioning how this is done internally. Although, given the way I am running the code (no sky matching, no CR detection, with an output reference image - just open the files and copy them to the output image), I can imagine that this could be done without opening all the files simultaneously.
I am concerned that the user has to manage this resource THEMSELVES. The code knows how many files it needs to open - why can't we change the resource limit internally in drizzlepac if we run into this limit? Considering that I was sent 3 different responses to help desk questions about this and there is a forum thread with 138 views, I would argue that this is a fairly typical case.
Stack Overflow explains why Python has a limit on the number of open file handles. The limit can be increased, but only up to what the OS allows.
I wonder if the resource module in Python could be used to raise the limit on file handles within the loop if more are needed than are available? The workaround shown above resets the soft limit.
At least on Unix these limits can be set via ulimit. The soft limit on my machine (OS X) is 256.
solo> ulimit -Sa
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 256
pipe size (512 bytes, -p) 1
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 709
virtual memory (kbytes, -v) unlimited
There is no hard limit
solo> ulimit -Ha
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) unlimited
pipe size (512 bytes, -p) 1
stack size (kbytes, -s) 65532
cpu time (seconds, -t) unlimited
max user processes (-u) 1064
virtual memory (kbytes, -v) unlimited
Python can find out what the current limit is
In [26]: import resource
In [35]: resource.getrlimit(resource.RLIMIT_NOFILE)
Out[35]: (256, 9223372036854775807)
which returns a tuple of the (soft, hard) limits. Could a new limit, based on the number of FITS extensions drizzlepac needs to have open, be set within the drizzle loop using the resource.setrlimit() workaround above?
Maybe something like this?
import resource

NOFILE_NOMINAL = 1024

def issue39_averted():
    """Try to raise the soft open-file limit to NOFILE_NOMINAL; return True on success."""
    try:
        soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    except ValueError:
        print("Cannot determine open file limit.")
        return False

    # An "unlimited" hard limit (RLIM_INFINITY) is always large enough.
    if hard != resource.RLIM_INFINITY and hard < NOFILE_NOMINAL:
        print("Cannot set open file limit to a nominal value. Contact your administrator.")
        return False

    if soft < NOFILE_NOMINAL:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (NOFILE_NOMINAL, hard))
        except ValueError:
            print("Cannot set open file limit. You may not have permission to do this. "
                  "Contact your administrator.")
            return False

    return True

#
# somewhere else that matters...
#
ENABLE_MMAP = True
if not issue39_averted():
    ENABLE_MMAP = False
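Going one step further, the limit could be sized from the work about to be done rather than a fixed nominal value. Here is a rough sketch along those lines, assuming astropy.io.fits is available and that one handle per extension is the worst case; ensure_nofile_limit and the margin value are hypothetical, not existing drizzlepac API.

import resource
from astropy.io import fits

def ensure_nofile_limit(input_files, margin=64):
    """Raise the soft RLIMIT_NOFILE so every extension of every input can stay open."""
    # Count every extension of every input; one handle per extension is the worst case.
    needed = margin
    for name in input_files:
        with fits.open(name, memmap=False) as hdulist:
            needed += len(hdulist)
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if needed > soft:
        new_soft = needed if hard == resource.RLIM_INFINITY else min(needed, hard)
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return needed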
-=Notes + Rant=-
$ sysctl -a |grep maxfiles
kern.maxfiles: 12288 # Kernel maximum
kern.maxfilesperproc: 10240 # Per process maximum (i.e. "unlimited")
A single user process can eat up most of the system's file handles, leaving only 2048 handles reserved for the kernel. As we already know, the default file handle limit is abysmally small:
$ ulimit -n
256
$ launchctl limit maxfiles
maxfiles 256 unlimited
The following test code opens /dev/null and counts the number of successful attempts:
#!/usr/bin/env python
import resource
import sys

_default_max = 10240

handles = []
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

# Use the limit given on the command line, or bump the soft limit to a sane default.
if len(sys.argv) < 2:
    if soft < _default_max:
        soft = _default_max
else:
    soft = int(sys.argv[1])

resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))

# Open /dev/null until the OS refuses, then report how many handles we got.
i = 0
while i <= hard:
    try:
        f = open('/dev/null', 'w')
        handles.append(f)
        i += 1
    except OSError:
        print(len(handles))
        for f in handles:
            f.close()
        exit(0)
So let's see what happens when we run it on a machine that's been up for under 24 hours:
$ uptime
07:25am up 8:28, 2 users, load average: 1.91, 1.95, 1.63
$ ./crashfile.py
9274
The kernel and initial user session allocate a whopping 3014 of the 12288 available handles. That leaves us with 9274 remaining handles, and while it might seem like a lot, it goes quickly. Now let's go about our daily business and see what happens after we open up some general applications (web browsers, email clients, text editors, pictures, videos). Just emulate a regular day at the office.
$ ./crashfile.py
8519
Not too shabby, I suppose. I mean it isn't as if OS X was designed to be a multi-user server or anything. Now let's look at a machine that's been up for a while:
$ uptime
7:31 up 46 days, 16:34, 12 users, load averages: 1.17 1.25 1.20
$ ./crashfile.py
620
Did you see that coming? I did. Effectively killing every program under my control should fix it...
$ pkill -U $USER
$ ./crashfile.py
8132
Yup that worked, but most people will not accept that as a viable solution.
So what can a general user do to increase the file handle limit beyond 10240? Not much. Not unless they have root access or a sympathetic administrator on-call. The first crashfile.py example was executed on my personal iThing, so let's try to increase the hard limit to something crazy:
Toggle everything...
$ sudo sysctl -w kern.maxfiles=67584
$ sudo sysctl -w kern.maxfilesperproc=65536 # (67584 - 2048)
$ ulimit -n 65536
Here goes nothing...
$ ./crashfile.py 65536
65498
Bingo.
This issue should be partially addressed by the following PRs: https://github.com/spacetelescope/drizzlepac/pull/67 and https://github.com/spacetelescope/stsci.skypac/pull/13 until @jhunkeler implements his proposed solution.
Resolved based on the PRs filed by Mihai. Subsequent experience with processing large sets of input exposures, in particular for HAP SVM and MVM processing, has not run into this problem since.
This issue is tracked on JIRA as HLA-698.
I am trying to create a mosaic from ~300 FLTs and I keep getting this error. I have done similar mosaics before using older versions of drizzlepac and have not encountered this error. Setting "in_memory" to True or False does not change the outcome. BTW, the crash happens at the 234th file.