Closed demianw closed 7 years ago
Could you say more about why you think this is post-merge of #249 ? For example, do you get the same problem when you run from commit 6f58e02 ?
I know it's post-merge cause when I run it from the corresponding merge commit everything works. I will try and run from that commit later. Is there a way to establish tests for these cases?
How are you detecting the memory usage at the moment? Maybe we could use something similar for a test?
Because I get the exception OSError: [Errno 12] Cannot allocate memory
upon trying the nifti.cifti2.load(f)
.
I think that the key problem for the test is to have a large enough file on the test battery.
OK, it does work from https://github.com/nipy/nibabel/commit/6f58e028b6b6c43d00b5c2e94a351896a7d164f1 too. I tried tracing back the error but it seems to be deep into the memorymap and lazy loading data mechanisms
Just to confirm - you mean there is no memory error from 6f58e02 ?
I'm guessing you are on Python 2.7?
Would you mind checking whether you get the memory error on Python 3, for this commit?
@effigies - I wonder whether we should just revert the BytesIO
refactor, assuming that the cStringIO
implementation from six
is gentler with memory than io.BytesIO
?
There's no memory error from commit https://github.com/nipy/nibabel/commit/6f58e028b6b6c43d00b5c2e94a351896a7d164f1
I'm working on python 3.5 not on 2.7.
Sure. Let's revert and just fix the import from externals.
Feel free to do it, or I can when I get to a computer.
Should we generally switch back to six.BytesIO? The main impulse I was indulging was a desire for consistency.
@demianw - that's really odd, that you get this change in behavior on Python 3.5 - Chris' recent refactoring should only have an effect for Python 2.
What do you get when you checkout the master branch again, and you apply this patch:
diff --git a/nibabel/cifti2/parse_cifti2.py b/nibabel/cifti2/parse_cifti2.py
index 63fc0dd..774544f 100644
--- a/nibabel/cifti2/parse_cifti2.py
+++ b/nibabel/cifti2/parse_cifti2.py
@@ -11,7 +11,7 @@ from __future__ import division, print_function, absolute_import
from distutils.version import LooseVersion
import numpy as np
-from io import BytesIO
+from six import BytesIO
from .cifti2 import (Cifti2MetaData, Cifti2Header, Cifti2Label,
Cifti2LabelTable, Cifti2VertexIndices,
?
@matthew-brett Mmmmmm. OK, I cannot reproduce the error now. The computer where I'm running everything is under much less memory stress (external conditions). Yet I dealt with this issue thorough the day of yesterday and today's morning (on a script that had worked flawlessly for weeks).
Thanks for such quick replies and I'll work on developing a test where we can reproduce this on other computers before crying wolf again.
@demianw If you could test pre and post refactor on each 3.5 and 2.7, that would be great. Thanks for the report.
Well, the best thing will be to try and add a memory stress test to the test battery and let travis do the work ;)
@demianw - are you sure you weren't using Python 2 when you got the error?
@matthew-brett @effigies I'm absolutely sure I'm on python 3. Also the error happens again but only when the machine is under memory stress. Here you have the traceback which should help
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-117-599208c50a5e> in <module>()
----> 1 client.futures_of(full_cortex_admm_c2)[0].result()
/home/dwasserm/software/anaconda2/envs/root3/lib/python3.5/site-packages/distributed/client.py in result(self)
111 result = sync(self.client.loop, self._result, raiseit=False)
112 if self.status == 'error':
--> 113 six.reraise(*result)
114 elif self.status == 'cancelled':
115 raise result
/home/dwasserm/software/anaconda2/envs/root3/lib/python3.5/site-packages/six.py in reraise(tp, value, tb)
683 value = tp()
684 if value.__traceback__ is not tb:
--> 685 raise value.with_traceback(tb)
686 raise value
687
<ipython-input-11-758a42c38a1f> in get_nibabel_slices()
2
3 def get_nibabel_slices(filename, slice_):
----> 4 nib_dataobj = nib.load(filename).dataobj
5 return nib_dataobj[slice_]
6
/home/dwasserm/software/nibabel/nibabel/loadsave.py in load()
43 is_valid, sniff = image_klass.path_maybe_image(filename, sniff)
44 if is_valid:
---> 45 img = image_klass.from_filename(filename, **kwargs)
46 return img
47
/home/dwasserm/software/nibabel/nibabel/filebasedimages.py in from_filename()
258 def from_filename(klass, filename):
259 file_map = klass.filespec_to_file_map(filename)
--> 260 return klass.from_file_map(file_map)
261
262 @classmethod
/home/dwasserm/software/nibabel/nibabel/cifti2/cifti2.py in from_file_map()
1330 """
1331 from .parse_cifti2 import _Cifti2AsNiftiImage, Cifti2Extension
-> 1332 nifti_img = _Cifti2AsNiftiImage.from_file_map(file_map)
1333
1334 # Get cifti2 header
/home/dwasserm/software/nibabel/nibabel/keywordonly.py in wrapper()
15 '{0} takes at most {1} positional argument{2}'.format(
16 func.__name__, n, 's' if n > 1 else ''))
---> 17 return func(*args, **kwargs)
18 return wrapper
19 return decorator
/home/dwasserm/software/nibabel/nibabel/analyze.py in from_file_map()
967 data = klass.ImageArrayProxy(imgf, hdr_copy, mmap=mmap)
968 # Initialize without affine to allow header to pass through unmodified
--> 969 img = klass(data, None, header, file_map=file_map)
970 # set affine from header though
971 img._affine = header.get_best_affine()
/home/dwasserm/software/nibabel/nibabel/cifti2/parse_cifti2.py in __init__()
148 raise ValueError('Nifti2 header does not contain a CIFTI2 '
149 'extension')
--> 150 self.cifti_img.data = self.get_data()
151
152
/home/dwasserm/software/nibabel/nibabel/dataobj_images.py in get_data()
187 if self._data_cache is not None:
188 return self._data_cache
--> 189 data = np.asanyarray(self._dataobj)
190 if caching == 'fill':
191 self._data_cache = data
/home/dwasserm/software/anaconda2/envs/root3/lib/python3.5/site-packages/numpy/core/numeric.py in asanyarray()
581
582 """
--> 583 return array(a, dtype, copy=False, order=order, subok=True)
584
585
/home/dwasserm/software/nibabel/nibabel/arrayproxy.py in __array__()
149 def __array__(self):
150 # Read array and scale
--> 151 raw_data = self.get_unscaled()
152 return apply_read_scaling(raw_data, self._slope, self._inter)
153
/home/dwasserm/software/nibabel/nibabel/arrayproxy.py in get_unscaled()
144 offset=self._offset,
145 order=self.order,
--> 146 mmap=self._mmap)
147 return raw_data
148
/home/dwasserm/software/nibabel/nibabel/volumeutils.py in array_from_file()
507 shape=shape,
508 order=order,
--> 509 offset=offset)
510 # The error raised by memmap, for different file types, has
511 # changed in different incarnations of the numpy routine
/home/dwasserm/software/anaconda2/envs/root3/lib/python3.5/site-packages/numpy/core/memmap.py in __new__()
262 bytes -= start
263 offset -= start
--> 264 mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
265
266 self = ndarray.__new__(subtype, shape, dtype=descr, buffer=mm,
OSError: [Errno 12] Cannot allocate memory
Can you also confirm you don't see this error with 6f58e02 ?
@matthew-brett That error does not look likely to be caused by any uses of BytesIO
changed in #519. I think this may just be an OOM, and I would not expect it to work on 19fd3f8 under the same conditions.
@effigies is right it actually happens intermittently in all commits. Is there a way I can modify
_Cifti2AsNiftiImage
https://github.com/nipy/nibabel/blob/ca977abeb77f95ed3a40b7b89c310b286b9885b7/nibabel/cifti2/parse_cifti2.py#L123 such that it loads like a Nifti2Image
and performs lazy loading of the data?
Maybe it's line 150 that needs to be changed? https://github.com/nipy/nibabel/blob/ca977abeb77f95ed3a40b7b89c310b286b9885b7/nibabel/cifti2/parse_cifti2.py#L150
Line 150 looks like it can be removed. _Cifti2AsNiftiImage.cifti_img.data
is never read. Unless it was specifically added to load everything into memory in one shot.
@demianw Do you want to try removing it and seeing if you're still able to load your files at all? And then under low-memory conditions?
Thanks! Done that, things work for small pieces. But I still get a problem which is not there when loading the files with nifti.load
/home/dwasserm/software/nibabel/nibabel/loadsave.py in load()
43 is_valid, sniff = image_klass.path_maybe_image(filename, sniff)
44 if is_valid:
---> 45 img = image_klass.from_filename(filename, **kwargs)
46 return img
47
/home/dwasserm/software/nibabel/nibabel/filebasedimages.py in from_filename()
258 def from_filename(klass, filename):
259 file_map = klass.filespec_to_file_map(filename)
--> 260 return klass.from_file_map(file_map)
261
262 @classmethod
/home/dwasserm/software/nibabel/nibabel/cifti2/cifti2.py in from_file_map()
1344 # User array proxy object where possible
1345 dataobj = nifti_img.dataobj
-> 1346 return Cifti2Image(reshape_dataobj(dataobj, dataobj.shape[4:]),
1347 header=cifti_header,
1348 nifti_header=nifti_img.header,
/home/dwasserm/software/nibabel/nibabel/arrayproxy.py in reshape_dataobj()
177 """
178 return (obj.reshape(shape) if hasattr(obj, 'reshape')
--> 179 else np.reshape(obj, shape))
/home/dwasserm/software/anaconda2/envs/root3/lib/python3.5/site-packages/numpy/core/fromnumeric.py in reshape()
230 [5, 6]])
231 """
--> 232 return _wrapfunc(a, 'reshape', newshape, order=order)
233
234
/home/dwasserm/software/anaconda2/envs/root3/lib/python3.5/site-packages/numpy/core/fromnumeric.py in _wrapfunc()
65 # a downstream library like 'pandas'.
66 except (AttributeError, TypeError):
---> 67 return _wrapit(obj, method, *args, **kwds)
68
69
/home/dwasserm/software/anaconda2/envs/root3/lib/python3.5/site-packages/numpy/core/fromnumeric.py in _wrapit()
45 except AttributeError:
46 wrap = None
---> 47 result = getattr(asarray(obj), method)(*args, **kwds)
48 if wrap:
49 if not isinstance(result, mu.ndarray):
/home/dwasserm/software/anaconda2/envs/root3/lib/python3.5/site-packages/numpy/core/numeric.py in asarray()
529
530 """
--> 531 return array(a, dtype, copy=False, order=order)
532
533
/home/dwasserm/software/nibabel/nibabel/arrayproxy.py in __array__()
149 def __array__(self):
150 # Read array and scale
--> 151 raw_data = self.get_unscaled()
152 return apply_read_scaling(raw_data, self._slope, self._inter)
153
/home/dwasserm/software/nibabel/nibabel/arrayproxy.py in get_unscaled()
144 offset=self._offset,
145 order=self.order,
--> 146 mmap=self._mmap)
147 return raw_data
148
/home/dwasserm/software/nibabel/nibabel/volumeutils.py in array_from_file()
507 shape=shape,
508 order=order,
--> 509 offset=offset)
510 # The error raised by memmap, for different file types, has
511 # changed in different incarnations of the numpy routine
/home/dwasserm/software/anaconda2/envs/root3/lib/python3.5/site-packages/numpy/core/memmap.py in __new__()
262 bytes -= start
263 offset -= start
--> 264 mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
265
266 self = ndarray.__new__(subtype, shape, dtype=descr, buffer=mm,
OSError: [Errno 12] Cannot allocate memory
I'm going to take a guess, and say the issue is that we can't partially parse the XML portion of the CIFTI file. By loading it as NIfTI-2, you can leave the data as a block on disk and access it when needed. In order to actually construct the CIFTI structure, you have to fully load it.
So I think what would be needed to lazily load:
ArrayProxy
s.In the absence of the former, one approach could be to maintain an offset table, and when contents are returned, search forward from the last offset for the current contents.
This doesn't sound that logical cause I can load the image as a NIfTI-2 and get the XML from the NIfTI-2 header without a problem and parse it. (CIFTI-2 is just a particular case of NIFTI-2 with the XML in the header and the 4 first dimensions set to 1)
Ah, I misunderstood. I thought the XML was in the data block, like Gifti. As it's not, it may still be the case we need to switch to lazy loading of sub-blocks, but I can't speak with any confidence here.
On Mar 24, 2017 11:37 AM, "Demian Wassermann" notifications@github.com wrote:
This doesn't sound that logical cause I can load the image as a NIfTI-2 and get the XML from the NIfTI-2 header without a problem and parse it. (CIFTI-2 is just a particular case of NIFTI-2 with the XML in the header and the 4 first dimensions set to 1)
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/nipy/nibabel/issues/520#issuecomment-289058044, or mute the thread https://github.com/notifications/unsubscribe-auth/AAFF8hCLXBFLM-FZC8Z_kJowYPpY4--rks5ro-NLgaJpZM4Mn96p .
It's weird because the CIFTI-2 class is just a wrapper around the NIFTI-2 one discarding the first for dimensions of the data. Loading the CIFTI-2 images as NIFTI-2 (through nibabel.nifti2.load
) and discarding the first 4 dimensions by hand works perfectly.
I think the bug should be in that section but the lazy loading system is a bit too hard for me to get in the amount of time I have available.
So, it looks like you should be able to reproduce the error with:
from nibabel import Nifti2Image, reshape_dataobj
img = Nifti2Image.load(fname)
assert img.shape[:4] == (1, 1, 1, 1)
dataobj = reshape_dataobj(img.dataobj, img.shape[4:])
So it's not a Cifti2Image
issue but ArrayProxy
.
And my reading is that because there is no ArrayProxy.reshape
, numpy is forcing a full read through ArrayProxy.__array__
. So what we need is an ArrayProxy.reshape
that doesn't load the data.
(I hope I'm not as wrong as last time...)
Yup! I think that you've hit the right nail this time!
Can you try 4c4405b? (May need to fetch from my repo.)
If you don't throw out the original ArrayProxy
, there may be conditions where you're screwing up shared state, but we can worry about engineering issues after we learn if this resolves your issue.
Great! I'm having trouble locating the branch / repo for that commit in the github interface
git remote add effigies https://github.com/effigies/nibabel.git
git fetch effigies
git checkout 4c4405b
If you want the branch instead of the commit, it's ap_reshape
.
Doesn't work
/home/dwasserm/software/nibabel/nibabel/loadsave.py in load(filename, **kwargs)
43 is_valid, sniff = image_klass.path_maybe_image(filename, sniff)
44 if is_valid:
---> 45 img = image_klass.from_filename(filename, **kwargs)
46 return img
47
/home/dwasserm/software/nibabel/nibabel/filebasedimages.py in from_filename(klass, filename)
258 def from_filename(klass, filename):
259 file_map = klass.filespec_to_file_map(filename)
--> 260 return klass.from_file_map(file_map)
261
262 @classmethod
/home/dwasserm/software/nibabel/nibabel/cifti2/cifti2.py in from_file_map(klass, file_map)
1344 # User array proxy object where possible
1345 dataobj = nifti_img.dataobj
-> 1346 return Cifti2Image(reshape_dataobj(dataobj, dataobj.shape[4:]),
1347 header=cifti_header,
1348 nifti_header=nifti_img.header,
/home/dwasserm/software/nibabel/nibabel/arrayproxy.py in reshape_dataobj(obj, shape)
187 """ Use `obj` reshape method if possible, else numpy reshape function
188 """
--> 189 return (obj.reshape(shape) if hasattr(obj, 'reshape')
190 else np.reshape(obj, shape))
/home/dwasserm/software/nibabel/nibabel/arrayproxy.py in reshape(self, shape)
171 header=self._header,
172 mmap=self._mmap)
--> 173 new_ap.shape = shape
174 return new_ap
175
Right. Sorry about that. Re-fetch and try: 1eb6fbe
Need to jump to something else will try tomorrow
OK, this works now. We just need to add this patch
diff --git a/nibabel/cifti2/parse_cifti2.py b/nibabel/cifti2/parse_cifti2.py
index 63fc0dd..ee32f7a 100644
--- a/nibabel/cifti2/parse_cifti2.py
+++ b/nibabel/cifti2/parse_cifti2.py
@@ -147,7 +147,6 @@ class _Cifti2AsNiftiImage(Nifti2Image):
if self.cifti_img is None:
raise ValueError('Nifti2 header does not contain a CIFTI2 '
'extension')
- self.cifti_img.data = self.get_data()
class Cifti2Parser(xml.XmlParser):
Hi!
it seems that one of the post-"Merge pull request #249" updates to the codebase is forcing the
load
function of cifti2 to bring into memory the whole loaded matrix. For large files (I'm working on files of ~10Gb) this forces a "cannot allocate memory" exception.I don't know how is it possible to add such large files to the test-base such that we can have a control for this. In any case it will be great if the error could be fixed.