radio-astro-tools / spectral-cube

Library for reading and analyzing astrophysical spectral data cubes
http://spectral-cube.rtfd.org
BSD 3-Clause "New" or "Revised" License

Cannot allocate memory when writing a fits file #519

Closed D-Callanan closed 5 years ago

D-Callanan commented 5 years ago

When I attempt to write a FITS file after convolving the cube to a consistent beam size, I run into Errno 12 with the following traceback:

Traceback (most recent call last):
  File "beam_fix.py", line 34, in <module>
    convolved_lsb_cube.write(sourcename+'_CASA/'+sourcename+'.asic.lsb.multiscale.deep..convolved.pbcor.fits', format='fits', overwrite=True)
  File "/home/dcallana/.local/lib/python2.7/site-packages/spectral_cube-0.4.4.dev1982-py2.7.egg/spectral_cube/spectral_cube.py", line 2177, in write
    write(filename, self, overwrite=overwrite, format=format)
  File "/home/dcallana/.local/lib/python2.7/site-packages/spectral_cube-0.4.4.dev1982-py2.7.egg/spectral_cube/io/core.py", line 66, in write
    write_fits_cube(filename, cube, overwrite=overwrite)
  File "/home/dcallana/.local/lib/python2.7/site-packages/spectral_cube-0.4.4.dev1982-py2.7.egg/spectral_cube/io/fits.py", line 210, in write_fits_cube
    hdulist = cube.hdulist
  File "/home/dcallana/.local/lib/python2.7/site-packages/spectral_cube-0.4.4.dev1982-py2.7.egg/spectral_cube/spectral_cube.py", line 2378, in hdulist
    return HDUList(self.hdu)
  File "/home/dcallana/.local/lib/python2.7/site-packages/spectral_cube-0.4.4.dev1982-py2.7.egg/spectral_cube/spectral_cube.py", line 2373, in hdu
    hdu = PrimaryHDU(self.filled_data[:].value, header=self.header)
  File "/home/dcallana/.local/lib/python2.7/site-packages/spectral_cube-0.4.4.dev1982-py2.7.egg/spectral_cube/cube_utils.py", line 222, in __getitem__
    return self._func(self._other, view)
  File "/home/dcallana/.local/lib/python2.7/site-packages/spectral_cube-0.4.4.dev1982-py2.7.egg/spectral_cube/base_class.py", line 335, in filled_data
    return u.Quantity(self._get_filled_data(view, fill=self._fill_value),
  File "/home/dcallana/.local/lib/python2.7/site-packages/spectral_cube-0.4.4.dev1982-py2.7.egg/spectral_cube/base_class.py", line 321, in _get_filled_data
    use_memmap=use_memmap
  File "/home/dcallana/.local/lib/python2.7/site-packages/spectral_cube-0.4.4.dev1982-py2.7.egg/spectral_cube/masks.py", line 220, in _filled
    dtype=dt)
  File "/home/dcallana/.local/lib/python2.7/site-packages/numpy/core/memmap.py", line 264, in __new__
    mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
mmap.error: [Errno 12] Cannot allocate memory

I'm fairly certain I'm not running into a storage quota issue, and I've run hpy to check how much memory is in use just before the troublesome line of code, which gives me:

Partition of a set of 501421 objects. Total size = 85150120 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 169981  34 19606024  23  19606024  23 str
     1  47707  10 17784040  21  37390064  44 unicode
     2 121698  24 10851080  13  48241144  57 tuple
     3   1434   0  4742256   6  52983400  62 dict of module
     4   4721   1  4348952   5  57332352  67 dict (no owner)
     5  28842   6  3691776   4  61024128  72 types.CodeType
     6  28141   6  3376920   4  64401048  76 function
     7   2936   1  2654104   3  67055152  79 type
     8  17975   4  2576952   3  69632104  82 list
     9   2936   1  2459072   3  72091176  85 dict of type
<1026 more rows. Type e.g. '_.more' to view.>

The code that runs into this issue is:

import os

from guppy import hpy
from spectral_cube import VaryingResolutionSpectralCube

h = hpy()

if os.path.isfile(sourcename+'_CASA/'+sourcename+'.asic.lsb.multiscale.deep.pbcor.fits'):
    lsb_cube = VaryingResolutionSpectralCube.read(sourcename+'_CASA/'+sourcename+'.asic.lsb.multiscale.deep.pbcor.fits')
    lsb_freq = lsb_cube.spectral_axis
    lsb_beam = lsb_cube.beams.sr
    lsb_cube.allow_huge_operations = True
    use_memmap = True
    # drop channels whose beams differ too much from the rest
    masked_lsb_cube = lsb_cube.mask_out_bad_beams(0.3, criteria=['sr', 'major', 'minor'])
    del lsb_cube
    lsb_common_beam = masked_lsb_cube.beams[masked_lsb_cube._goodbeams_mask].common_beam(tolerance=1e-5)
    masked_lsb_cube.allow_huge_operations = True
    # convolve all remaining channels to the common beam
    convolved_lsb_cube = masked_lsb_cube.convolve_to(lsb_common_beam)
    del masked_lsb_cube
    lsb_convolve = convolved_lsb_cube.beam.sr.value
    convolved_lsb_cube.allow_huge_operations = True
    print h.heap()
    # this write is where the memory allocation error is raised
    convolved_lsb_cube.write(sourcename+'_CASA/'+sourcename+'.asic.lsb.multiscale.deep.convolved.pbcor.fits', format='fits', overwrite=True)

The file is 13GB in size.

keflavich commented 5 years ago

@e-koch @astrofrog do either of you have any idea how this could be happening? There's a memory allocation error in mmap. I thought the whole point of memmap was to get around that?

keflavich commented 5 years ago

@D-Callanan could you do df -h just to verify that there is adequate hard drive space? I wonder also if this could be a problem with where tempfile is putting the temporary file; you can specify the target directory (and make sure it's on a HD with space) using memmap_dir: https://github.com/radio-astro-tools/spectral-cube/blob/master/spectral_cube/spectral_cube.py#L2639
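
As a quick check independent of spectral-cube, you can also point Python's tempfile module at a directory you know has space before reading the cube (a minimal sketch; the path here is just a placeholder):

import tempfile

# Send all tempfile-created scratch files (including memmap backing files)
# to a directory on a filesystem with plenty of free space.
tempfile.tempdir = '/big/disk/tmp'  # placeholder path

# Equivalently, set TMPDIR in the shell before starting Python:
#   export TMPDIR=/big/disk/tmp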

keflavich commented 5 years ago

@D-Callanan it would also be useful to figure out what the limitations are on your machine. Something like this:

import numpy as np
import tempfile

# try progressively larger disk-backed arrays (10^3 to 10^13 elements)
# and see where memmap allocation starts to fail
for size in np.logspace(3, 13, 11):
    ntf = tempfile.NamedTemporaryFile()
    mm = np.memmap(ntf.name, mode='w+', shape=(int(size),), dtype=float)
    print("Size 10^{0} succeeded".format(np.log10(size)))

and see when it fails. When I run this test, I get:

Size 10^3.0 succeeded
Size 10^4.0 succeeded
Size 10^5.0 succeeded
Size 10^6.0 succeeded
Size 10^7.0 succeeded
Size 10^8.0 succeeded
Size 10^9.0 succeeded
Size 10^10.0 succeeded
Size 10^11.0 succeeded
Size 10^12.0 succeeded
Traceback (most recent call last):
  File "<ipython-input-14-17800c68f2c7>", line 5, in <module>
    mm = np.memmap(ntf.name, mode='w+', shape=(int(size),), dtype=float)
  File "/users/aginsbur/anaconda/envs/python3.6/lib/python3.6/site-packages/numpy/core/memmap.py", line 250, in __new__
    fid.seek(bytes - 1, 0)
OSError: [Errno 22] Invalid argument

e-koch commented 5 years ago

@D-Callanan - You could also run ulimit and see what it is set to. A similar problem is mentioned here: https://github.com/xgcm/xgcm/issues/40.
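
The same limits can also be queried from inside Python with the standard resource module (a rough sketch; Unix only):

import resource

# A soft/hard value of -1 (RLIM_INFINITY) means no limit is set.
for name in ('RLIMIT_AS', 'RLIMIT_DATA', 'RLIMIT_RSS'):
    soft, hard = resource.getrlimit(getattr(resource, name))
    print("{0}: soft={1}, hard={2}".format(name, soft, hard))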

keflavich commented 5 years ago

These issues might be related: https://github.com/astropy/astropy/issues/1380 https://github.com/astropy/astropy/pull/7926

keflavich commented 5 years ago

Also, @D-Callanan, could you check whether you're on a 32-bit or 64-bit system?

$ python -c "import sys; print(sys.maxsize)"
9223372036854775807

D-Callanan commented 5 years ago

Thanks for the quick responses!

Running df -h says there are 8.0TB available, so I don't think that's the issue. I've also manually set the memmap_dir to that directory with no luck.

The output of the memory test @keflavich suggested is:

Size 10^3.0 succeeded
Size 10^4.0 succeeded
Size 10^5.0 succeeded
Size 10^6.0 succeeded
Size 10^7.0 succeeded
Size 10^8.0 succeeded
Size 10^9.0 succeeded
Traceback (most recent call last):
  File "mem_test.py", line 6, in <module>
    mm = np.memmap(ntf.name, mode='w+', shape=(int(size),), dtype=float)
  File "/home/dcallana/.local/lib/python2.7/site-packages/numpy/core/memmap.py", line 264, in __new__
    mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
mmap.error: [Errno 12] Cannot allocate memory

ulimit is apparently not a command on the machine I'm running my code on, so I'd assume a limit hasn't been set?

And finally, the result of python -c "import sys; print(sys.maxsize)" is 9223372036854775807.

keflavich commented 5 years ago

@D-Callanan are you certain it's 8.0 TB on the drive you're trying to allocate the space on? It might be, I just want to be certain.

D-Callanan commented 5 years ago

I've double-checked: of the 52 TB filesystem I'm using, 8 TB is free.

keflavich commented 5 years ago

Could you print the output of cat /proc/meminfo? That's another test from here.

keflavich commented 5 years ago

AFAICT, we're using the correct memmap mode: np.memmap mode 'w+' maps to mmap.ACCESS_WRITE, which means that no memory beyond the actual hard drive space should be allocated. In other words, spectral-cube is doing the right thing.
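
For context, the pattern in question boils down to something like this (a simplified sketch, not the actual spectral-cube code; the shape is made up):

import tempfile
import numpy as np

# Create a disk-backed scratch array the size of the full cube;
# with mode='w+' only file-system space should be consumed, not RAM.
ntf = tempfile.NamedTemporaryFile()
data = np.memmap(ntf.name, mode='w+', shape=(480, 1024, 1024), dtype=float)

# Fill it one plane at a time so only a single plane is in memory at once.
for i in range(data.shape[0]):
    data[i] = 0.0
data.flush()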

Why can't @D-Callanan allocate more than 1 GB? I think this must be some sort of weird 32-bit OS limitation. @D-Callanan, I think it's time to contact the IT dept and ask them for help. There must be something 32-bit on their system that's blocking us, even though Python is 64-bit.

D-Callanan commented 5 years ago

This is the output of cat /proc/meminfo:

MemTotal:       132038992 kB
MemFree:        21350768 kB
Buffers:           49992 kB
Cached:         34165592 kB
SwapCached:       234900 kB
Active:         90256516 kB
Inactive:       17838248 kB
Active(anon):   68363604 kB
Inactive(anon):  5547564 kB
Active(file):   21892912 kB
Inactive(file): 12290684 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       4194300 kB
SwapFree:              0 kB
Dirty:                80 kB
Writeback:             0 kB
AnonPages:      73644288 kB
Mapped:            63312 kB
Shmem:             31980 kB
Slab:            1303124 kB
SReclaimable:    1157060 kB
SUnreclaim:       146064 kB
KernelStack:       17984 kB
PageTables:       225040 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    70213796 kB
Committed_AS:   85599096 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      815116 kB
VmallocChunk:   34290372180 kB
HardwareCorrupted:     0 kB
AnonHugePages:  46608384 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       54272 kB
DirectMap2M:     1972224 kB
DirectMap1G:    132120576 kB

keflavich commented 5 years ago

ok... you have plenty of vm.... wtf.

keflavich commented 5 years ago

This issue has been partly resolved offline: there was a restriction imposed by the sysadmins limiting user-allocated memory to <100 GB, including swap. It's unclear why that was blocking the writing of ~8-20 GB files, but removing the restriction has apparently ameliorated the problem.

keflavich commented 5 years ago

@D-Callanan can we close this now that the sysadmins found the issue?

D-Callanan commented 5 years ago

@keflavich Yes, I believe so. Thank you for the help with this.