spacetelescope / drizzlepac

AstroDrizzle for HST images.
https://drizzlepac.readthedocs.io
BSD 3-Clause "New" or "Revised" License

IOError: [Errno 24] Too many open files #39

Closed ivastar closed 2 years ago

ivastar commented 7 years ago

I am trying to create a mosaic from ~300 FLTs and I keep getting this error. I have done similar mosaics before using older versions of drizzlepac and have not encountered this error. Setting "in_memory" to True or False does not change the outcome. BTW, the crash happens at the 234th file.

        wht_type = 'IVM'
        output = 'goodsn-F105W-astrodrizzle-v4.3'
        final_refimage = '/astro/clear/cgosmeyer/ref_files/REF/goodsn_3dhst.v4.0.F125W_orig_sci.fits'
        astrodrizzle.AstroDrizzle(root+'_asn.fits', output=output, runfile = 'astrodrizzle.log', updatewcs = False, wcskey = 'TWEAK',
            proc_unit = 'native', coeffs = True, context = False, group = '', build = False, crbit = 4096, stepsize = 10,
            resetbits = 0, num_cores = None, in_memory = False, restore = False, preserve = False, overwrite = False,
            clean = True, static = False, static_sig = 4.0, skysub = True, skywidth = 0., skystat = '', skylower = None,
            skyupper = None, skyclip = 0, skylsigma = 0.0, skyusigma = 0.0, skyuser = 'MDRIZSKY', skyfile = '',
            driz_separate = False, driz_sep_wcs = False, median = False, blot = False, driz_cr = False,
            driz_combine = True, final_wht_type = wht_type, final_kernel = 'square', final_wt_scl = 'exptime',
            final_pixfrac = 0.8, final_fillval = None, final_bits = 576, final_units = 'cps', final_wcs = True,
            driz_sep_bits = 0, final_refimage=final_refimage)
## -- End pasted text --
INPUT_DICT: {'restore': False, 'final_wht_type': 'IVM', 'final_wcs': True, 'final_wt_scl': 'exptime', 'wcskey': 'TWEAK', 'median': False, 'driz_sep_wcs': False, 'skystat': '', 'final_refimage': '/astro/clear/cgosmeyer/ref_files/REF/goodsn_3dhst.v4.0.F125W_orig_sci.fits', 'static': False, 'skywidth': 0.0, 'skyupper': None, 'overwrite': False, 'final_fillval': None, 'crbit': 4096, 'blot': False, 'proc_unit': 'native', 'skyclip': 0, 'skyusigma': 0.0, 'skylower': None, 'final_pixfrac': 0.8, 'build': False, 'input': 'GOODSN-F105W_asn.fits', 'final_units': 'cps', 'preserve': False, 'driz_separate': False, 'clean': True, 'final_kernel': 'square', 'skysub': True, 'stepsize': 10, 'skylsigma': 0.0, 'runfile': 'astrodrizzle.log', 'final_bits': 576, 'in_memory': False, 'group': '', 'skyfile': '', 'resetbits': 0, 'driz_sep_bits': 0, 'driz_cr': False, 'skyuser': 'MDRIZSKY', 'num_cores': None, 'driz_combine': True, 'context': False, 'coeffs': True, 'output': 'goodsn-F105W-astrodrizzle-v4.3', 'static_sig': 4.0}
Setting up logfile :  astrodrizzle.log

AstroDrizzle Version 2.1.8(08-Feb-2017) started at: 14:42:06.991 (03/03/2017)

==== Processing Step  Initialization  started at  14:42:06.992 (03/03/2017)

##############################################################################
#                                                                            #
# ERROR:                                                                     #
# AstroDrizzle Version 2.1.8 encountered a problem!  Processing terminated   #
# at 14:45:18.884 (03/03/2017).                                              #
#                                                                            #
##############################################################################

   --------------------          --------------------
                   Step          Elapsed time
   --------------------          --------------------

         Initialization          0.0000 sec.

   ====================          ====================

                  Total          0.0000 sec.

Trailer file written to:  astrodrizzle.log

---------------------------------------------------------------------------
IOError                                   Traceback (most recent call last)
/Users/imomcheva/anaconda/envs/iraf27/lib/python2.7/site-packages/drizzlepac/util.pyc in wrapper(*args, **kwargs)
    217             # finally clause is reached.
    218             try:
--> 219                 func(*args, **kwargs)
    220             except Exception as errorobj:
    221                 raise

/Users/imomcheva/anaconda/envs/iraf27/lib/python2.7/site-packages/drizzlepac/astrodrizzle.pyc in run(configobj, wcsmap)
    182         procSteps.addStep('Initialization')
    183         imgObjList = None
--> 184         imgObjList, outwcs = processInput.setCommonInput(configobj)
    185         procSteps.endStep('Initialization')
    186

/Users/imomcheva/anaconda/envs/iraf27/lib/python2.7/site-packages/drizzlepac/processInput.pyc in setCommonInput(configObj, createOutwcs)
    191                                             group=configObj['group'],
    192                                             undistort=undistort,
--> 193                                             inmemory=virtual)
    194
    195     # Add original file names as "hidden" attributes of imageObject

/Users/imomcheva/anaconda/envs/iraf27/lib/python2.7/site-packages/drizzlepac/processInput.pyc in createImageObjectList(files, instrpars, group, undistort, inmemory)
    323     mt_refimg = None
    324     for img in files:
--> 325         image = _getInputImage(img,group=group)
    326         image.setInstrumentParameters(instrpars)
    327         image.compute_wcslin(undistort=undistort)

/Users/imomcheva/anaconda/envs/iraf27/lib/python2.7/site-packages/drizzlepac/processInput.pyc in _getInputImage(input, group)
    428             from . import wfc3Data
    429             if _detector == 'UVIS': return wfc3Data.WFC3UVISInputImage(input,group=group)
--> 430             if _detector == 'IR': return wfc3Data.WFC3IRInputImage(input,group=group)
    431
    432     except ImportError:

/Users/imomcheva/anaconda/envs/iraf27/lib/python2.7/site-packages/drizzlepac/wfc3Data.pyc in __init__(self, filename, group)

/Users/imomcheva/anaconda/envs/iraf27/lib/python2.7/site-packages/drizzlepac/wfc3Data.pyc in __init__(self, filename, group)

/Users/imomcheva/anaconda/envs/iraf27/lib/python2.7/site-packages/drizzlepac/imageObject.pyc in __init__(self, filename, group, inmemory)

/Users/imomcheva/anaconda/envs/iraf27/lib/python2.7/site-packages/stsci.tools-3.4.1.dev0-py2.7.egg/stsci/tools/fileutil.pyc in openImage(filename, mode, memmap, writefits, clobber, fitsname)

/Users/imomcheva/anaconda/envs/iraf27/lib/python2.7/site-packages/stsci.tools-3.4.1.dev0-py2.7.egg/stsci/tools/fileutil.pyc in isFits(input)

/Users/imomcheva/anaconda/envs/iraf27/lib/python2.7/site-packages/stsci.tools-3.4.1.dev0-py2.7.egg/stsci/tools/stpyfits.pyc in wrapped_with_stpyfits(*args, **kwargs)

/Users/imomcheva/anaconda/envs/iraf27/lib/python2.7/site-packages/astropy/io/fits/hdu/hdulist.pyc in fitsopen(name, mode, memmap, save_backup, cache, **kwargs)

/Users/imomcheva/anaconda/envs/iraf27/lib/python2.7/site-packages/astropy/io/fits/hdu/hdulist.pyc in fromfile(cls, fileobj, mode, memmap, save_backup, cache, **kwargs)

/Users/imomcheva/anaconda/envs/iraf27/lib/python2.7/site-packages/astropy/io/fits/hdu/hdulist.pyc in _readfrom(cls, fileobj, data, mode, memmap, save_backup, cache, **kwargs)

/Users/imomcheva/anaconda/envs/iraf27/lib/python2.7/site-packages/astropy/io/fits/file.pyc in __init__(self, fileobj, mode, memmap, clobber, cache)
    148             self._open_fileobj(fileobj, mode, clobber)
    149         elif isinstance(fileobj, string_types):
--> 150             self._open_filename(fileobj, mode, clobber)
    151         else:
    152             self._open_filelike(fileobj, mode, clobber)

/Users/imomcheva/anaconda/envs/iraf27/lib/python2.7/site-packages/astropy/io/fits/file.pyc in _open_filename(self, filename, mode, clobber)
    477
    478         if os.path.exists(self.name):
--> 479             with fileobj_open(self.name, 'rb') as f:
    480                 magic = f.read(4)
    481         else:

/Users/imomcheva/anaconda/envs/iraf27/lib/python2.7/site-packages/astropy/io/fits/util.pyc in fileobj_open(filename, mode)

IOError: [Errno 24] Too many open files: 'ibohbiddq_flt.fits'
---------------------------------------------------------------------------
IOError                                   Traceback (most recent call last)
<ipython-input-9-6c9268a119d1> in <module>()
     10     driz_combine = True, final_wht_type = wht_type, final_kernel = 'square', final_wt_scl = 'exptime',
     11     final_pixfrac = 0.8, final_fillval = None, final_bits = 576, final_units = 'cps', final_wcs = True,
---> 12     driz_sep_bits = 0, final_refimage=final_refimage)

/Users/imomcheva/anaconda/envs/iraf27/lib/python2.7/site-packages/drizzlepac/astrodrizzle.pyc in AstroDrizzle(input, mdriztab, editpars, configobj, wcsmap, **input_dict)
    121     # already called 'run()'.
    122     if not editpars:
--> 123         run(configObj, wcsmap=wcsmap)
    124
    125 #

/Users/imomcheva/anaconda/envs/iraf27/lib/python2.7/site-packages/drizzlepac/util.pyc in wrapper(*args, **kwargs)
    227                     # (hope that end_logging didn't change the last exception raised)
    228                     if errorobj:
--> 229                         raise errorobj
    230
    231         return wrapper

IOError: [Errno 24] Too many open files: 'ibohbiddq_flt.fits'

In [10]:

ivastar commented 7 years ago

Seems like the issue was related to memory mapping. I was sent the following hack that worked. Still, it's unclear why all these files need to be open at once.

""" There was a second call where Warren describes how memory mapping caused the issue again, this time in the initialization:

"The problem stems from trying to use memory-mapping to access all those images at one time. The use of memory-mapping became the default for PyFITS with Version 3.1 and the astrodrizzle code has not been updated to adjust to this change. However, there is something you can do that should allow you to process the stack of images. The PyFITS FAQ (found at http://pythonhosted.org/pyfits/) specifies that:

In PyFITS 3.1, the mmap support is improved enough that memmap=True is the default for all pyfits.open() calls. The default can also be controlled through an environment variable called PYFITS_USE_MEMMAP. Setting this to 0 will disable mmap by default. "

The user suggested another workaround:

An alternative solution that worked for me is also to run the following

import resource
resource.setrlimit(resource.RLIMIT_NOFILE, (10000,-1))

Where the first number is an arbitrarily large number of maximum files. """
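A variant of the workaround above that reads the current limits first and restores them afterwards can be sketched as follows. This is only a sketch: the name raised_nofile_limit and the default of 10000 are illustrative assumptions, not part of drizzlepac or astropy. It reuses the existing hard limit instead of hard-coding one, and never requests more than the hard limit allows.

```python
import contextlib
import resource


@contextlib.contextmanager
def raised_nofile_limit(wanted=10000):
    """Temporarily raise the soft open-file limit (hypothetical helper)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY
    # An unprivileged process cannot exceed the hard limit, so cap the request.
    target = wanted if hard == unlimited else min(wanted, hard)
    # Only ever raise the soft limit; leave it alone if it is already enough.
    changed = soft != unlimited and soft < target
    if changed:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    try:
        # Yield the soft limit that is in effect inside the block.
        yield resource.getrlimit(resource.RLIMIT_NOFILE)[0]
    finally:
        if changed:
            # Put the original soft limit back when the block exits.
            resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
```

The AstroDrizzle call would then go inside a "with raised_nofile_limit():" block, so the raised limit applies only while the mosaic is being built.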

mcara commented 7 years ago

Memory-mapping does open more file handles than it actually should: one file handle for every fits file extension. Also, astropy.io.fits has not been re-designed to open only one file handle per file.
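To check how many handles a process actually holds, one option is to count the entries in its descriptor directory before and after opening files. A minimal sketch, assuming a Unix-like system that exposes /dev/fd (Linux and macOS do); count_open_descriptors is a hypothetical helper, not something drizzlepac or astropy provides:

```python
import os


def count_open_descriptors():
    """Count the file descriptors currently open in this process."""
    # /dev/fd has one entry per open descriptor. Listing the directory
    # briefly opens a descriptor of its own, which may be included in the
    # count; that offset is constant, so the difference between two calls
    # still reflects the handles opened in between.
    return len(os.listdir("/dev/fd"))


before = count_open_descriptors()
handle = open(os.devnull, "rb")   # stands in for opening one input file
after = count_open_descriptors()
handle.close()
print(after - before)             # handles opened between the two calls
```

Wrapping a single FITS open in the same before/after pair would show how many handles one input image costs on a given installation.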

Memory mapping in itself is not the problem here but rather the limitation on the number of open file handles imposed by the operating system. There is no fundamental reason for this limit to exist except, based on what I know, for enhanced security (viruses tend to open many files => limiting the number of files a process can open will enhance security).

Even if we update drizzlepac to not use memory mapping, there will be a user who will want to process 1200 images and who is going to run into the same problem.

Please see my last message in the following drizzlepac's forum thread: https://forum.stsci.edu/discussion/114/drizzlepac-errno-24-too-many-open-files

@stsci-hack In my opinion, it is not worth doing anything on this topic unless processing >100 files is a typical case.

bernie-simon commented 7 years ago

On Mar 8, 2017, at 10:47 AM, Mihai Cara notifications@github.com wrote:

Memory mapping in itself is not the problem here but rather the limitation on the number of open file handles imposed by the operating system. There is no fundamental reason for this limit to exist except, based on what I know, for enhanced security (viruses tend to open many files => limiting the number of files a process can open will enhance security).

The reason why this limit exists, at least on Unix, is that each process is allocated a fixed size table that contains all the open filehandles. That info needs to be stored because child processes inherit open filehandles from the parent process after a fork. It’s part of how Unix works.
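The inheritance described above can be seen in a few lines. This is a sketch for Unix only (os.fork is not available on Windows), and the function name is made up for illustration: the child never opens anything itself, yet it can write to a descriptor the parent opened before the fork.

```python
import os


def child_uses_inherited_descriptor():
    """Fork a child that writes to a pipe opened by the parent."""
    read_fd, write_fd = os.pipe()      # two descriptors in the parent's table
    pid = os.fork()
    if pid == 0:
        # Child: write_fd is valid here only because the descriptor table
        # was copied from the parent at fork time.
        os.close(read_fd)
        os.write(write_fd, b"inherited")
        os._exit(0)
    # Parent: close the write end so the read sees end-of-file afterwards.
    os.close(write_fd)
    data = os.read(read_fd, 64)
    os.close(read_fd)
    os.waitpid(pid, 0)                 # reap the child
    return data
```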

ivastar commented 7 years ago

I am not questioning how this is done internally. Although the way I am running the code (with no sky matching, no CR detection, with an output reference image - just open the files and copy them to the output image), I can imagine that this could be done without opening all the files simultaneously.
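The pattern being asked for here, where each input is opened, used, and closed before the next one is touched, looks roughly like this in plain Python. It is only an illustration of the pattern: total_bytes_one_at_a_time is a hypothetical stand-in for "copy this exposure into the output image" and says nothing about how drizzlepac is actually structured.

```python
def total_bytes_one_at_a_time(paths):
    """Process files sequentially so only one handle is open at any time."""
    total = 0
    for path in paths:
        # The with-block closes the file before the next iteration starts,
        # so the number of inputs never counts against the open-file limit.
        with open(path, "rb") as handle:
            total += len(handle.read())
    return total
```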

I am concerned that the user has to manage the memory THEMSELVES. The code knows how many files it needs to open - why can't we change the resource limit internally to drizzlepac if we run into this limit? Considering that I was sent 3 different responses to help desk questions about this and there is a forum thread with 138 views, I would argue that this is a fairly typical case.

bernie-simon commented 7 years ago

Stack Overflow answers why Python has a limit on the number of open file handles. The limit can be increased, but only up to what the OS allows.

jdavies-st commented 7 years ago

I wonder if the resource module in Python could be used to raise the file handle limit within the loop when more handles are needed than are available? The workaround shown above resets the soft limit.

At least in Unix these can be set via ulimit. The soft limit on my machine (OS X) is 256.

solo> ulimit -Sa
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 256
pipe size            (512 bytes, -p) 1
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 709
virtual memory          (kbytes, -v) unlimited

There is no hard limit

solo> ulimit -Ha
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) unlimited
pipe size            (512 bytes, -p) 1
stack size              (kbytes, -s) 65532
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1064
virtual memory          (kbytes, -v) unlimited

Python can find out what the current limit is

In [26]: import resource

In [35]: resource.getrlimit(resource.RLIMIT_NOFILE)
Out[35]: (256, 9223372036854775807)

which returns a tuple of the (soft, hard) limits. Could a new limit, based on the number of FITS extensions that need to be open, be set within the drizzle loop in drizzlepac using the resource.setrlimit() workaround above?
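A rough sketch of that idea, sizing the request from the number of inputs rather than from a fixed value. The function name, the handles_per_file parameter, and the margin of 32 for stdin/stdout, logs, and libraries are all assumptions made for illustration; the true per-file handle count would have to be measured.

```python
import resource


def ensure_open_file_limit(n_files, handles_per_file=1, margin=32):
    """Raise the soft limit if the estimated need exceeds it.

    Returns True if the limit is (now) sufficient, False otherwise.
    """
    needed = n_files * handles_per_file + margin
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY
    if soft == unlimited or soft >= needed:
        return True                     # nothing to do
    if hard != unlimited and hard < needed:
        return False                    # only an administrator can help here
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (needed, hard))
    except (ValueError, OSError):
        # The OS may still refuse, e.g. because of a kernel-wide per-process cap.
        return False
    return True
```

A caller could fall back to disabling memory mapping, or to processing the inputs in batches, when this returns False.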

jhunkeler commented 7 years ago

Maybe something like this?

import resource

NOFILE_NOMINAL = 1024

def issue39_averted():
    """Return True if the soft open-file limit is at least NOFILE_NOMINAL."""
    try:
        soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    except (ValueError, OSError):
        print("Cannot determine open file limit.")
        return False

    # RLIM_INFINITY means "no limit"; on some platforms it is -1,
    # so it has to be checked explicitly rather than compared with "<".
    unlimited = resource.RLIM_INFINITY

    if hard != unlimited and hard < NOFILE_NOMINAL:
        print("Cannot set open file limit to a nominal value. Contact your administrator.")
        return False

    if soft != unlimited and soft < NOFILE_NOMINAL:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (NOFILE_NOMINAL, hard))
        except (ValueError, OSError):
            print("Cannot set open file limit. You may not have permission to do this. Contact your administrator.")
            return False

    return True

#
# somewhere else that matters...
#
# Fall back to non-memory-mapped access if the limit could not be raised.
ENABLE_MMAP = issue39_averted()

jhunkeler commented 7 years ago

-=Notes + Rant=-

$ sysctl -a |grep maxfiles
kern.maxfiles: 12288            # Kernel maximum
kern.maxfilesperproc: 10240     # Per process maximum (i.e. "unlimited")

A single user process can eat up most of the system's file handles, while only reserving 2048 handles for the kernel. As we already know the default file handle limit is abysmally small:

$ ulimit -n
256

$ launchctl limit maxfiles
    maxfiles    256            unlimited

The following test code opens /dev/null and counts the number of successful attempts:

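#!/usr/bin/env python
import resource
import sys

_default_max = 10240
handles = []

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
if len(sys.argv) < 2:
    # No argument: raise the soft limit to at least the per-process default.
    if soft < _default_max:
        soft = _default_max
else:
    # Otherwise use the soft limit given on the command line.
    soft = int(sys.argv[1])

resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))

# Open /dev/null until the OS refuses, then report how many opens succeeded.
while len(handles) <= hard:
    try:
        handles.append(open('/dev/null', 'w'))
    except (IOError, OSError):  # IOError on Python 2, OSError on Python 3
        print(len(handles))
        for f in handles:
            f.close()
        sys.exit(0)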

So let's see what happens when we run it on a machine that's been up for under 24 hours:

$ uptime
 07:25am  up   8:28,  2 users,  load average: 1.91, 1.95, 1.63
$ ./crashfile.py
9274

The kernel and initial user session allocate a whopping 3014 of the 12288 available handles. That leaves us with 9274 remaining handles, and while it might seem like a lot, it goes quick. Now let's go about our daily business and see what happens after we open up some general applications (i.e. web browsers, email clients, text editors, pictures, videos). Just emulate a regular day at the office.

$ ./crashfile.py
8519

Not too shabby, I suppose. I mean it isn't as if OS X was designed to be a multi-user server or anything. Now let's look at a machine that's been up for a while:

$ uptime
 7:31  up 46 days, 16:34, 12 users, load averages: 1.17 1.25 1.20
$ ./crashfile.py 
620

Did you see that coming? I did. Effectively killing every program under my control should fix it...

$ pkill -U $USER
$ ./crashfile.py 
8132

Yup that worked, but most people will not accept that as a viable solution.

So what can a general user do to increase the file handle limit beyond 10240? Not much. Not unless they have root access or a sympathetic administrator on-call. The first crashfile.py example was executed on my personal iThing, so let's try to increase the hard limit to something crazy:

Toggle everything...

$ sudo sysctl -w kern.maxfiles=67584
$ sudo sysctl -w kern.maxfilesperproc=65536    # (67584 - 2048)
$ ulimit -n 65536

Here goes nothing...

$ ./crashfile.py 65536
65498

Bingo.

mcara commented 7 years ago

This issue should be partially addressed by the following PRs: https://github.com/spacetelescope/drizzlepac/pull/67 and https://github.com/spacetelescope/stsci.skypac/pull/13 until @jhunkeler implements his proposed solution.

stsci-hack commented 2 years ago

Resolved based on the PRs filed by Mihai. Subsequent experience with processing large sets of input exposures, for HAP SVM and MVM processing in particular, has not run into this problem since.

stscijgbot-hstdp commented 1 year ago

This issue is tracked on JIRA as HLA-698.