Closed effigies closed 4 years ago
I've noticed that nilearn and nistats are working around the deprecation of img.get_data() with a public get_data() function.
Indeed, nibabel's API is a moving target, and it's frequent modifications break users' script and documentation available across internet.
It would be nice to hear what the specific use cases are that make this preferable to moving to get_fdata()
Number one problem is moving target. We are unfortunately progressively abstracting out nibabel's API because changes of behavior are difficult to follow. In particular, they end up corrupting example scripts with switches to select functions depending on the nibabel version. This goes against our work to keep those scripts as simple as possible.
It would be good to really try very hard to not deprecate things. The Internet has a "memory" of multiple years: outdated answers on stack-overflow like sites, old tutorials, slides, code for articles. Each deprecation sends ripples that take several years to get over and our code is full of shims due to supporting multiple versions of nibabel.
Beyond the problem of moving target, one thing that I move worry (but maybe this has been discussed) is the output dtype. We are more and more moving towards default to float32. Indeed, with the increasing size of data, memory usage is more and more a problem. In addition, we have found that there is seldom enough SNR in neuroimaging data to require float64.
Gael - I know if get_data
, get_fdata
in terms of API change. Were you thinking of something else?
Indeed, we do try very hard not to deprecate. You probably remember the discussions from 2015 to mid 2017 where we hashed this through. At the time the idea was to:
Beyond the problem of moving target, one thing that I move worry (but maybe this has been discussed) is the output dtype. We are more and more moving towards default to float32. Indeed, with the increasing size of data, memory usage is more and more a problem. In addition, we have found that there is seldom enough SNR in neuroimaging data to require float64.
I think that even more compelling is the case of atlases or masks, where we want the data to retain its integer type.
Another change is getting a copy rather than a reference to the image's cache, although we should always create a new image rather than modify this data array in-place.
Yes, there will surely be times when you want to control the output dtype for memory purposes, or because you really want integers. We can go back through it, if it's helpful, but the discussion up til 2017 was about whether output data type should depend on the presence of scaling in the header, which is often arbitrary; most image libraries don't intend for scaling to be interpreted that way. We decided it should not, hence the change.
I am not sure what you're referring to with the cache / copy change. get_fdata
continues to return a cached version, like get_data
.
I think there are at least three possibilities:
1) Continue as-is, and nilearn will have a
get_data()
helper function 2) Un-deprecateimg.get_data()
3) An alternative method that covers the non-get_fdata()
parts ofget_data()
What makes sense to people?
I think that solution 2) would have been best, but now that we have
added get_data
we may hold on to it, because for example it works on
filenames or lists of filenames or glob patterns.
And just process-wise, it would be nice to know how we can get feedback before pushing people to work around changes to the API. I'm mostly aware of this because I'm watching the nistats repository, but the planned
get_data()
replacement and deprecation has been on the mailing list and in the documentation for a couple years.
I cannot speak for others, but for my part I simply missed this
discussion. I was less involved in nilearn at the time, and I must admit
I do not often read the nibabel documentation as I almost exclusively
interact with nibabel through nilearn. thus I only became aware of the
deprecation when this PR was open in nilearn:
https://github.com/nilearn/nilearn/pull/2155
Then I did not realize that un-deprecating get_data
was an option,
though in retrospect reviving this discussion on the nibabel repo should
have been the first thing to do.
np.asarray(img.dataobj)
would have worked (we just would have needed
to complete a bit the interface of some nilearn mocks that mimick an
Nifti1Image), but either way I think we would have wrapped this in a
function.
In any case, the problem was not having to work around the interface,
but having to change many lines of code in the nilearn modules and
examples (and users will have to do the same). No matter how long the
deprecation stays in the documentation, if it is a breaking change at
some point users will pay the cost of editing (in this case hudreds of
lines of) code. Again, I am sorry for not participating in the nibabel
discussion when it happened, nor trying to revive it when the
deprecation cycle started before fixing it in nilearn. Solution "(2)
un-deprecate get_data
" would have worked perfectly for nilearn.
I am not sure what you're referring to with the cache / copy change.
get_fdata
continues to return a cached version, likeget_data
.
sorry, scratch that part then
Unfortunately, as I'm sure you know, the low-level details often mean that the option that seems easiest for the higher-level libraries is a bad option for the long term - and I think get_data
is one such instance. For the reasons, I can point you to the discussion, if you're interested.
I personally think that undeprecating get_data
would be a terrible idea, especially at this stage. It's results were extremely confusing to beginners, and some non-beginners, and it is only possible to understand why it generates what it does, when you know some nasty low-level details of the Nifti format, that vary more or less randomly across producers of Nifti images.
I think this is the only instance we have of a 'moving-target' API, at least, of any significance, but I'm happy to be corrected.
I am not sure what you're referring to with the cache / copy change.
get_fdata
continues to return a cached version, likeget_data
.sorry, scratch that part then
although that point remains for the np.asarray(img.dataobj, dtype=...)
Having said that, I could imagine adding get_data
back with a more scary name, like get_data_hdr_scaling
or something.
Unfortunately, as I'm sure you know, the low-level details often mean that the option that seems easiest for the higher-level libraries is a bad option for the long term - and I think
get_data
is one such instance. For the reasons, I can point you to the discussion, if you're interested.
I saw this explanation: https://github.com/nipy/nibabel/wiki/BIAP8 , if this not what you mean I am interested. I understand the motivation for preferring get_fdata, and I guess we should add a dtype argument to nilearn.image.get_data for the same reasons. Still, we will retain the possibility to avoid casting to float for atlases, masks and ROIs.
I personally think that undeprecating
get_data
would be a terrible idea, especially at this stage.
Nilearn and Nistats do not rely on get_data anymore; the deprecation has been handled for these two libraries.
Beyond the problem of moving target, one thing that I move worry (but maybe this has been discussed) is the output dtype. We are more and more moving towards default to float32. Indeed, with the increasing size of data, memory usage is more and more a problem. In addition, we have found that there is seldom enough SNR in neuroimaging data to require float64.
note that we can use img.get_fdata(dtype='float32')
The BIAP is the main source, but there's a lot of discussion on the linked mailing list thread, and a little more at https://mail.python.org/pipermail/neuroimaging/2017-April/001383.html
I think this is the only instance we have of a 'moving-target' API, at least, of any significance, but I'm happy to be corrected.
I think that you are correct in the sense that the moving target that has been bothering me over the years is how to retrieve the array data of a NiftiImage. However, for our uses this is the main API of nibabel that we use.
My memory is very blurry, as this problem has not been my main focus over the year. However, if I remember right, we started by img.data being the recommended way of accessing data, then moved to img.get_data and now img.get_fdata. I also remember _data, and dataobj, but I do not remember how they relate to the other APIs. The point being that over the years, we have had difficulties giving a consistent and simple story to the users on how to retrieve the data.
Yes, that would be rather moving-targetey - but I don't think that's right - it was always .get_data()
- see relevant line from first release in 2010. There was a private attribute _data
, that was obviously not recommended, and later we added the dataobj
attribute as a lower-level public API, for them as wanted efficient access of various sorts - see https://nipy.org/nibabel/images_and_memory.html - but we always recommended .get_data
, up until the change to get_fdata
.
Hi Gael, Jerome, thanks for responding. I really appreciate it.
I appreciate that such a major deprecation is painful, and we do (and will) work hard to avoid this as much as possible. I'm not convinced that there's a compelling case to undeprecate get_data()
, and will proceed for now with the deprecation as planned. Given that you've already separated out get_data()
, reverting that seems gratuitous, and keeping it in nilearn will allow you to improve on the old nibabel API when it comes to issues like nilearn/nilearn#2209.
To try to communicate better going forward, I wonder if you have any suggestions as to how we can make the documentation clearer? If nilearn developers are having issues finding what you need, I'm sure there are many users who are having greater difficulties.
And if there are parameters we can tweak in our deprecation process that would make life easier when we do have deprecations, please let me know.
As one of the dominant layers between users and directly using the library, I'd like to invite you to open issues here or on the neuroimaging@python.org mailing list if you have suggestions for general API.
Yes, that would be rather moving-targetey - but I don't think that's right - it was always .get_data()
You must be right. I am sorry that I was confused. We were probably using the wrong API in the beginning.
I don't have any real opinions on the deprecation/removal, but I am interested in what you all think should be "best practice" for loading an image from disk, modifying it, and saving it back to disk.
I've always relied on the internal cache managed by DataobjImage
objects, and returned by get_data
. I've happily written code in the knowledge that both myself and nibabel
have access to a single numpy
array - this made modifying/saving easy - I could just modify the array directly, and call img.to_filename
.
But now that this internal cache is being deprecated/removed, I can no longer do things in this way.
My workflow is probably now going to be (as already implied by @jeromedockes):
nibabel
np.asanyarray(img.dataobj)
to get the data, and maintain my own reference to itnibabel
image using the old header and the modified data, and save it to disknibabel
imageSo I'm just curious - is this what you would recommend?
(as a side note, my use case is to always want the data unscaled and in its native type, so the fact that get_fdata
and its associated cache is not being removed is irrelevant).
- Load an image file with
nibabel
- Use
np.asanyarray(img.dataobj)
to get the data, and maintain my own reference to it- Modify this data as needed
- Create a new
nibabel
image using the old header and the modified data, and save it to disk- Discard the reference to the original
nibabel
imageSo I'm just curious - is this what you would recommend?
I think that this is what I would do
If you always want unscaled data, then you should use img.dataobj.get_unscaled()
, not np.array(img.dataobj)
or img.get_data()
, which will apply scale factors if present in the image.
You might want to wrap this with something like:
def get_unscaled(dataobj):
if nb.arrayproxy.isproxy(dataobj):
return dataobj.get_unscaled()
return dataobj
But otherwise, your sequence is what I do. I'm actually somewhat surprised that modifying the cached result of get_data()
and then calling to_filename
on the same image object worked for you, since writing Analyze-like images (including NIfTI) reads the dataobj
, not a cache:
Cool, thanks for the clafiication. It looks like my previous approach (modifying the cached array from get_data
) was fine with nibabel < 3
, as to_file_map
was using get_data
:
https://github.com/nipy/nibabel/commit/c8d93e6067ec793c294d46fb1d4aee805b188a59
But it looks like I've been wrong about the scaling, although I don't often come across images with slope/inter parameters that aren't equal to 1/0...
I've noticed that nilearn and nistats are working around the deprecation of
img.get_data()
with a publicget_data()
function.It would be nice to hear what the specific use cases are that make this preferable to moving to
get_fdata()
ornp.asarray(img.dataobj)
. Is it just that the latter is a pain to type? The loss of caching?I think there are at least three possibilities:
1) Continue as-is, and nilearn will have a
get_data()
helper function 2) Un-deprecateimg.get_data()
3) An alternative method that covers the non-get_fdata()
parts ofget_data()
What makes sense to people?
And just process-wise, it would be nice to know how we can get feedback before pushing people to work around changes to the API. I'm mostly aware of this because I'm watching the nistats repository, but the planned
get_data()
replacement and deprecation has been on the mailing list and in the documentation for a couple years.cc @jeromedockes @GaelVaroquaux @bthirion @kchawla-pi @nipy/team-nibabel