Discussion: deprecating (most of) skimage.io in favour of imageio

jni commented 4 years ago

Description

This is meant to supersede the discussion in #1605, which contains a lot of outdated information.

In 2020, here are the pros for replacing skimage.io with imageio:

distributing wheels is easy now
imageio has feature parity with skimage.io. In fact, I would be surprised if 99% of our io.imread calls didn't go straight through to imageio — at least if we exclude tifffile.
this post by @danielballan, for those that haven't seen it, gives a rough outline for implementing a single protocol for lazy image reading in Python. imageio is a natural place to implement such work, and skimage.io is at best redundant, at worst, another place for things to fall out of sync.

Things that I think need to change in imageio for us to completely drop skimage.io:

imageio.imread does return a fancy array wrapper rather than a straight numpy array, which breaks the scikit-image API. However, this is somewhat necessary for imageio to provide metadata. Not sure where to meet on this one.
imageio.imread only gets a single plane out of a tiff file, preferring to leave volume loading to imageio.volread. imho we should aim to be dimension-agnostic. @almarklein is this something you foresee could change? See e.g. #4138.

In my opinion, the existence of both skimage.io and imageio is redundant, and it makes sense to separate the two functionalities (IO vs processing/analysis) into two separate packages.

@emmanuelle argues for keeping imread. I'm ok with that but I also would contemplate removing it for 1.0.

sciunto commented 4 years ago

+1 with @emmanuelle. I believe it is important to keep wrappers of our io functions because they help beginners a lot.

rfezzani commented 4 years ago

Thank you @jni for opening this, you can't imagine how enthusiastic I am about dropping skimage.io! :smiley:

I already shared my feelings about this, and I am convinced that skimage can gain maturity and stability if it focuses on its core functionality: image processing!

There are plenty of excellent Python packages dedicated to IO and viewing. We should promote them instead of offering yet another implementation, with all the maintenance that entails...

@sciunto, the skimage deprecation procedure takes 2 releases before completely removing the deprecated part from the package. I think that this is enough time to teach beginners how to properly read/write images without our skimage.io module :wink:

almarklein commented 4 years ago

this post by @danielballan, for those that haven't seen it, gives a rough outline for implementing a single protocol [...]

I hadn't seen that one yet, but I find the idea of standards (over implementations) very interesting. Also the idea to use entrypoints for packages to advertise image/data reading capabilities without importing them is very interesting to scale supported formats without making a lib like imageio "heavy". This needs some more thought ...

imageio.imread does return a fancy array wrapper rather than a straight numpy array, which breaks the scikit-image API. However, this is somewhat necessary for imageio to provide metadata.

Perhaps it's time to finally fix this long-standing issue ;)

imageio.imread only gets a single plane out of a tiff file, preferring to leave volume loading to imageio.volread. imho we should aim to be dimension-agnostic. @almarklein is this something you foresee could change? See e.g. #4138.

There are some formats that are somewhat ambiguous about whether their data is a set of 2D images or one 3D image. Most notably Tiff and Dicom. In imageio I've made the choice to let the user decide what dimensionality is expected, and pass that hint to the reader. So there's imread() for single 2D images, mimread() for image series, volread() for volumes and mvolread() for series of volumes.

I believe this approach works well. I think that if you'd want to keep an imread() function in skimage, I'd recommend adding (at least) volread() as well.

jni commented 4 years ago

There are some formats that are somewhat ambiguous about whether their data is a set of 2D images or one 3D image. Most notably Tiff and Dicom.

Sure, but from a user perspective, we'd want a 3D array in either case, right? For users who work with anything from 2D to 3D+t+c images, having to remember which of the 4 functions to call today is annoying. (Not to mention that sometimes I get user-provided data or downloaded data from a paper and I actually don't know ahead of time the dimensionality.) I believe this is a problem for the metadata, but the data should be a NumPy (or Dask) array, in any case, and so in any case we should use the same function.

almarklein commented 4 years ago

Sure, but from a user perspective, we'd want a 3D array in either case, right?

I think it's not that simple. Both Tiff and Dicom can provide multiple images (or volumes) of different shapes, so they cannot be stacked into a single array.

having to remember which of the 4 functions to call today is annoying.

Well, you're going to do something with the data, and for that you're assuming a certain shape. I understand that for data exploration it would be nice to see what's actually inside ... which brings us full circle to meta data ;)

jni commented 4 years ago

Well, you're going to do something with the data, and for that you're assuming a certain shape.

Nope — napari will happily display any shape array. ;)

Both Tiff and Dicom can provide multiple images (or volumes) of different shapes, so they cannot be stacked into a single array.

That is definitely a concern — but one that volread doesn't solve, does it? I would suggest imread (nD image) and mimread (iterator of nD images).

almarklein commented 4 years ago

napari will happily display any shape array. ;)

Nice! Ok, I see the use-case for an API to just "gimme the data". But I also think there is a need to be able to be more precise.

Somewhere at the plugin level, there is a function _get_data(index). In a lot of cases the file contains one image, index is 0, and all is well. In some cases (e.g. video) it's clear that index represents the nth image. But for e.g. Tiff, the format needs a hint to know whether the user wants the whole thing or is iterating over the slices. Right now, the user provides this hint by choosing between imread/mimread/volread.

Maybe we could do something along the lines of imread(hint="greedy"). By default this would try to simply read all data, but the user can set the hint (or mode?) to '2d' or '3d' to specify a more precise intention if needed.
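A minimal sketch of how such a hint could behave, assuming a toy in-memory "file" whose pages are 2D nested lists. `read_with_hint` and this `_get_data` are hypothetical stand-ins for illustration, not imageio's actual plugin API:

```python
from typing import List

Page = List[List[int]]  # a 2D image as nested lists (stand-in for an array)


def _shape(page: Page) -> tuple:
    """Return (rows, cols) of a 2D nested-list page."""
    return (len(page), len(page[0]) if page else 0)


def _get_data(pages: List[Page], index: int) -> Page:
    """Plugin-level accessor (hypothetical): return the page at `index`."""
    return pages[index]


def read_with_hint(pages: List[Page], hint: str = "greedy"):
    """Read a toy multi-page 'file' according to a dimensionality hint.

    "2d"     -> only the first page
    "3d"     -> all pages as one stacked volume (shapes must match)
    "greedy" -> the single page if there is only one, otherwise like "3d"
    """
    if hint not in ("greedy", "2d", "3d"):
        raise ValueError(f"unknown hint: {hint!r}")
    if not pages:
        raise ValueError("file contains no pages")
    if hint == "2d" or (hint == "greedy" and len(pages) == 1):
        return _get_data(pages, 0)
    shapes = {_shape(p) for p in pages}
    if len(shapes) != 1:
        raise ValueError(f"pages have different shapes {sorted(shapes)}; cannot stack")
    return [_get_data(pages, i) for i in range(len(pages))]
```

The default reads everything it can; "2d" and "3d" only narrow the intent, and stacking fails with an explicit error instead of silently returning one plane.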

edit: PS: this hint is also used in imageio to select the appropriate format. E.g. Pillow can read gif and Tiff, but not files with multiple frames.

emmanuelle commented 4 years ago

My argument in favor of keeping io.imread: even if we do the deprecation properly etc., it would be a communication/public relations disaster to break 90% of users' scripts. People are going to complain a lot, heavily, and for a long time. We know that there is a long queue of people who upgrade their version only when they really need to, who have a lot of legacy code, etc. If we can help people improve their coding skills, that is great, but this is not the right direction IMO. That said, we can have this discussion within the larger context of "what is going to break for users with 1.0" (but I hope the answer is not that 90% of scripts will have to be changed in some way or another).

Removing the most used function in scikit-image (according to a GitHub crawl done a while ago) does not sound like a good idea.

However, I'm very much in favour of making imread a wrapper as thin as possible around imageio, to discard the rest of the API, and to make any further developments in imageio.

sciunto commented 4 years ago

@emmanuelle I have similar feelings to yours. I would also add that some newcomers get lost partly because of how tools are fragmented across libraries, which is a natural effect in FLOSS. I sometimes see scripts mixing PIL/scikit-image/OpenCV merely out of ignorance, and I'm afraid that some users will search "how to load an image in python", use a library returning a format inappropriate for scikit-image, and get lost.

A few wrappers can help users until they fully understand the machinery. We could also adopt a strategy of regularly reconsidering (through web log analysis / forum inspection) whether maintaining the wrappers is still relevant.

rfezzani commented 4 years ago

I had multiple discussions with other core team members (cc @jni, @hmaarrfk :wink:) about breaking changes etc... But I am still not convinced... People who need their scripts to keep working simply don't have to update skimage! Python environments are made to manage this... Our deprecation strategy is gentle enough to give users time to update their scripts if needed... We can also rely on our docs to teach newcomers good practices... I tend to think that things evolve naturally, particularly in computer science, and that we don't have to worry much about breaking things... Users usually learn/adapt pretty fast to new behaviors :slightly_smiling_face:.

emmanuelle commented 4 years ago

In the dev meeting, Almar made an interesting suggestion:

rewrite our documentation to teach people how to use imageio (gallery examples in particular)
but keep io.imread as a thin wrapper to avoid breaking people's code.

I would be very happy about this trade-off. Thoughts?

rfezzani commented 4 years ago

Keeping skimage.io.imread is not a big deal. Defining it as

skimage.io.imread = imageio.imread

doesn't hurt. My main concern is to keep warning users that it is not the right way to go and to not maintain code in skimage.io!
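If we also want to keep warning users, a wrapper can delegate to imageio and emit a warning on each call instead of being a bare alias. A minimal sketch, assuming a hypothetical `fake_imread` that stands in for imageio.imread so it runs without imageio; choosing FutureWarning (shown to end users by default, unlike DeprecationWarning) is a design assumption:

```python
import functools
import warnings


def deprecated_alias(func, old_name, new_name):
    """Wrap `func` so old calls keep working but emit a FutureWarning."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated; use {new_name} instead.",
            FutureWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        return func(*args, **kwargs)
    return wrapper


def fake_imread(uri):
    # Hypothetical stand-in for imageio.imread.
    return f"pixels of {uri}"


imread = deprecated_alias(fake_imread, "skimage.io.imread", "imageio.imread")
```

Calling `imread("cat.png")` returns the backend's result unchanged; the only added behavior is the warning.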

rfezzani commented 4 years ago

In other words, we must tell scikit-image users that IO is no longer a target, and that we delegate it to imageio because it is dedicated to IO and manages it far better than we could.

jni commented 4 years ago

Yes, that would be the idea @rfezzani, and I think everyone in the call was more or less on board. There is a slight caveat which might be that skimage.io.imread(x) might mean np.asarray(iio.imread(x)) (throw out the metadata), depending on whether imageio keeps returning a subclass or settles on an API that allows a bare-NumPy-array return.

But yeah, the basic idea is: keep a tiny shim of io that is a razor-thin layer around imageio.
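A sketch of what "razor-thin, but always a bare array" could look like. In the real shim the conversion would be np.asarray, which returns a base-class ndarray even for ndarray subclasses (unlike np.asanyarray). To keep this runnable without NumPy or imageio, `MetaArray` (a list carrying metadata) and `fake_backend_imread` are hypothetical stand-ins, and `list` plays the role of np.asarray:

```python
from typing import Any, Callable


class MetaArray(list):
    """Hypothetical stand-in for an array subclass that carries metadata."""

    def __init__(self, data, meta=None):
        super().__init__(data)
        self.meta = meta or {}


def fake_backend_imread(uri: str) -> MetaArray:
    # Hypothetical backend: returns pixel data with attached metadata.
    return MetaArray([0, 128, 255], meta={"source": uri})


def make_thin_imread(backend: Callable[..., Any], to_plain: Callable[[Any], Any]):
    """Build an imread that delegates to `backend` and drops any metadata."""
    def imread(uri, **kwargs):
        return to_plain(backend(uri, **kwargs))
    return imread


imread = make_thin_imread(fake_backend_imread, to_plain=list)
```

The shim stays independent of whether imageio returns a subclass or a plain array: if it already returns a plain array, the conversion is a cheap no-op.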

FirefoxMetzger commented 4 years ago

Looks like consensus has already been reached.

I agree with outsourcing IO to imageio, which is a really great library. At the same time, I would be careful about getting rid of skimage.io, because it demotes skimage from a framework to work with images to a library to process images. This doesn't seem like a massive change at first glance, but when you think about it, it severely weakens the position of skimage. I'd rather buy a complete and working bike than go shopping for the parts and assemble it myself, especially if I don't feel confident building bikes, regardless of how well I can ride. In the same way, I think, skimage should remain a full framework to work with images instead of becoming just one part of the entire thing: a library that merely processes images.

Personally, I love the idea of depending on imageio. I'm not entirely convinced by the idea of a razor-thin layer around it though. There are three things that bother me:

(1) imageio returns the image AND metadata with its loading functions. Metadata is often not required and a plain numpy array would be much better; I'd really like the idea of a metadata(filename, ...) function though.

(2) I think the current API is a bit too fine-grained for loading images; having to choose between imread, mimread, volread, ... is really nice for fine grained control, but it steepens the learning curve. imageio chooses to make what's going on very explicit, because the focus is on IO. skimage.io, on the other hand, chooses a much more managed approach, because the focus is on image processing. I'd like to keep this convenience. I like the simplicity of Python's philosophy of "one function for one job" here; Python only has list, not single_link_list, double_link_list, etc., but offers third party packages if such functionality is needed.

(3) I think that the default IO function should return the biggest image possible (in terms of dimensionality) or fail with a clear error message if that doesn't work (say images are different sizes). imageio.imread strictly returning a 2D image confused the heck out of me in the beginning. Sure, it can be resolved by reading the docs and changing to imageio.volread. My first guess though was user-error and I spent quite a while chasing a non-existing bug in my pipeline.

jni commented 3 years ago

@FirefoxMetzger

it demotes skimage from a framework to work with images to a library to process images. [...] I'd rather buy a complete and working bike than go shopping for the parts and assemble it myself; especially, if I don't feel confident building bikes. Independent of how well I can ride. In the same way, I think, skimage should remain a full framework to work with images instead of becoming just one part of the entire thing: a library that merely processes images.

I disagree here. I think the bicycle analogy fails because assembling a bike is hard. But Python libraries are essentially self-assembling if we adhere to standards and make sure that the libraries we depend on work perfectly well together. It is no more difficult to type pip install imageio scikit-image than to type pip install scikit-image. And it's even closer if we do pip install scikit-image[all].

I think becoming a pure processing library is an advantage more than a disadvantage. We are not aiming to become a Python distribution focused on images, and anyway we are so far from that: you still need NumPy to work with skimage, and a working Python install.

Regarding your objections to the thin shim:

(1) see my last comment, we would use np.asarray() on the imageio output, as we do now.

(2) we have had the discussion above and you can see in this comment that @almarklein is on board to simplify the imageio API. (@almarklein is the lead maintainer of imageio in case you didn't know that yet.)

(3) As with (2).

In other words, imageio is not a static thing that we have to work around, it is another member of our community willing to work with us to get to a place where we are all happy. :blush: The power of open source at work! It's taken a while but we'll get there. :sweat_smile:

FirefoxMetzger commented 3 years ago

@jni

It is no more difficult to type pip install imageio scikit-image than to type pip install scikit-image

If one needs both libraries in all (most) cases anyway, wouldn't it make more sense to have skimage depend on imageio? Potentially with giving documentation on how to substitute imageio (if there exists such a thing)?

Regarding the other points, I have no objections. I'm actually very much in favor of your idea. Those three points are, in my opinion, the major things standing in the way of this change. Consequently, I would feel much more at ease knowing that there is not only a willingness to address these (which in itself is already very good) but also an explicit plan for how to address them. That would also make it much easier to contribute ;D

almarklein commented 3 years ago

I would feel much more at ease knowing that there is [...] an explicit plan

I agree. But we first need consensus :) It looks like there is agreement about the basics. Maybe it's time to talk about the API now.

I made a proposal in https://github.com/imageio/imageio/issues/563. The short version is this:

# Get object that can be used to read/write image and meta data.
# The mode arg can be used to specify read/write mode, as well as the expected dimensionality.
# Default is "read greedy"
r = imopen(file, mode='read/write greedy/image/volume') 

# The short way to read an image
im = imopen(file).read()

# Read image and meta data
with imopen(file) as r:
    meta = r.get_meta()
    im = r.read()

# Read a volume or fail
vol = imopen(file, mode='volume').read()

# Iterate over 2D images
for im, meta in imopen(file, mode='image').iter():
    pass

And then:

# in skimage.io
import imageio
def imread(file):
    return imageio.imopen(file).read()

Please comment on this in the corresponding issue, so we don't go off-topic in this issue.

edited to use full words for the mode arg instead of single chars
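To make the reader-object design above concrete, here is a toy sketch of how an imopen-style object could work as a context manager, with read()/get_meta()/iter() and an imread one-liner on top. `ToyImageFile` and the list-based "file" are assumptions for illustration (a real imopen takes a path or URI and decodes lazily), and this is not the signature that eventually landed in imageio:

```python
from typing import Dict, Iterator, List, Optional, Tuple


class ToyImageFile:
    """Hypothetical reader mimicking the proposed imopen() object."""

    def __init__(self, pages: List[list], metas: List[Dict], mode: str = "greedy"):
        self._pages = pages    # pre-decoded images (a real plugin decodes lazily)
        self._metas = metas    # per-page metadata
        self.mode = mode
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()
        return False  # never swallow exceptions

    def close(self) -> None:
        self.closed = True

    def _check_open(self) -> None:
        if self.closed:
            raise ValueError("I/O operation on closed image file")

    def get_meta(self, index: int = 0) -> Dict:
        self._check_open()
        return self._metas[index]

    def read(self, index: Optional[int] = None):
        self._check_open()
        if index is not None:
            return self._pages[index]
        if self.mode == "image":
            return self._pages[0]
        return list(self._pages)  # any other mode: return everything

    def iter(self) -> Iterator[Tuple[list, Dict]]:
        self._check_open()
        yield from zip(self._pages, self._metas)


def imopen(pages, metas, mode="greedy"):
    # Hypothetical: a real imopen would take a file path/URI.
    return ToyImageFile(pages, metas, mode)


def imread(pages, metas):
    """The skimage-style one-liner built on top of imopen()."""
    with imopen(pages, metas) as f:
        return f.read()
```

The context manager makes the file's lifetime explicit, while the imread wrapper hides it entirely for the common case.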

emmanuelle commented 3 years ago

One advantage of squeezing io into just a thin-wrapper imread would be to drop some dependencies, for example SimpleITK (just noticed that there are some problems with SimpleITK and py3.9 in #5052).

FirefoxMetzger commented 3 years ago

I spent some time over at imageio to discuss a new API design with @almarklein and the latest draft for the API is here https://github.com/imageio/imageio/issues/569 .

I built a demo of the new imageio API (https://github.com/imageio/imageio/pull/574). Of course, more work is needed to tightly integrate it into their codebase, but it works for demonstration purposes (+/- some bugs (: ).

If/once this is merged we could wrap it as

from imageio.new_api import (
    imread,
    imiter,
)

which would work pretty much as you'd expect:

image = imread("some/path.jpg") # 2D color image
image = imread("some/other.tiff") # 3D image
image = imread("some.tiff", plugin="tifffile", index=3) # whatever tifffile deems sensible to return,
                                                      # probably the ndimage at index 3 in the file
images = [img for img in imiter("some.tiff")] # read all images from a file and store them in a list

What will happen to the other functions in skimage.io? I'm not too sure what they do, and whether they need a backport or can simply be deprecated. @stefanv

stefanv commented 3 years ago

@FirefoxMetzger We currently only utilize imread and imsave. Is there somewhere I can get an overview of the newly proposed API, without reading through the entire PR?

FirefoxMetzger commented 3 years ago

@stefanv The gist is in the top post of https://github.com/imageio/imageio/issues/569 .

rfezzani commented 3 years ago

@grlee77, did you close this issue intentionally?

jni commented 3 years ago

I'm gonna guess no, maybe @grlee77 can close with a rationale if my guess is wrong. 😂

grlee77 commented 3 years ago

Yeah, I don't remember closing this. Thanks for reopening

FirefoxMetzger commented 3 years ago

Allow me to necromance this thread since we (finally) managed to merge the new API into imageio. 🚀

Now it's time for writing docs beyond the functional API and to see what is needed to bring skimage.io and imageio closer together.

@jni (jni: please let me know if the image actually does trigger a @mention - it should)

@rfezzani @emmanuelle Does any one of you remember if there was a concrete plan for this already? It's been a while for me, so I don't remember 😆.

rfezzani commented 3 years ago

Thank you @FirefoxMetzger and @almarklein for your hard work, and congratulations on the new imageio API :tada:

The closest thing to a plan, I think, is @emmanuelle's https://github.com/scikit-image/scikit-image/issues/5036#issuecomment-718667035 :smile:, concretely:

deprecate skimage.io,
build skimage.io.imread as a wrapper around imageio.imread,
modify the documentation.

Did I forget something @scikit-image/core ?

FirefoxMetzger commented 3 years ago

@rfezzani Awesome. Any existing examples that I should be aware of that need modification?

I was also thinking that it may not be effective to duplicate documentation here and on imageio, so maybe an effective approach would be to immediately direct people to the imageio docs? There could be a cool application of intersphinx here :)

sciunto commented 3 years ago

Did I forget something @scikit-image/core ?

+1 and I guess we would like to read collections as well.

sciunto commented 3 years ago

@rfezzani Awesome. Any existing examples that I should be aware of that need modification?

I guess that the only place we discussed IO is in the user guide: https://scikit-image.org/docs/stable/user_guide.html

rfezzani commented 3 years ago

The deprecation message may also be displayed in skimage.io API documentation.

FirefoxMetzger commented 3 years ago

I guess we would like to read collections as well.

@sciunto What do you mean by collection? a sequence of images, a folder, pages of a tiff file? video?

sciunto commented 3 years ago

https://scikit-image.org/docs/stable/api/skimage.io.html#skimage.io.imread_collection

FirefoxMetzger commented 3 years ago

@sciunto I looked through the docs and the source code, but I still don't think I got the full picture.

To me, this sounds like a collection is a variant of

imread_collection = [iio.imread(file) for file in glob.glob(pattern) if Path(file).is_file()]

Is that correct?

I was asking because in the new imageio API the default imread call will try to create an ndarray (not limited to ndim in [2, 3]). Hence, something like this works:

iio.imread('some.gif').shape
#  [layers, height, width, channel]

# probably memory hungry
iio.imread('some_video.mp4').shape
#  [frames, height, width, channel]

# likely better
iio.imread('some_video.mp4', index=3).shape
#  [height, width, channel]

# if you need all frames but they won't fit into memory
for img in iio.imiter('some_video.mp4'):
    img.shape
    #  [height, width, channel]

# formats with multiple different-size images
# (theoretically possible, practically unobserved)

iio.imread('uncommon.tiff').shape
# raises Exception

for img in iio.imiter('uncommon.tiff'):
    img.shape
    # probably what you want here

And I was wondering how many of the collection's use-cases are covered by this, and whether the entire object can be deprecated or has unique features that need to live on.

stefanv commented 3 years ago

Along with @emmanuelle, I am -1 on deprecating skimage.io. The plan to have it a thin shim to imageio seemed like a good one. With the new imageio API, it is no longer a simple one-to-one match (if I read it correctly, they first have to instantiate a reader, then call a method on that reader to load the file; we don't currently have that and it seems unnecessary for our use-case).

@FirefoxMetzger The main difference between the list comprehension you show above and ImageCollection is that it lazily loads arrays as needed. I.e., you can easily do ImageCollection('*.jpg') on a folder with 2000 files.
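A minimal sketch of the lazy behavior described here, assuming only a generic `load_func` (for example imageio.imread). `LazyCollection` is a hypothetical name, not scikit-image's ImageCollection; keeping only the most recently loaded image loosely mirrors the idea behind ImageCollection's `conserve_memory` option:

```python
import glob
from typing import Any, Callable, List, Optional


class LazyCollection:
    """Hypothetical lazy, random-access image collection.

    File names are resolved once from a glob pattern; an image is loaded
    with `load_func` only when it is indexed. Only the most recently
    loaded image is kept, so memory stays flat even for large folders.
    """

    def __init__(self, pattern: str, load_func: Callable[[str], Any]):
        self.files: List[str] = sorted(glob.glob(pattern))
        self._load = load_func
        self._cached_index: Optional[int] = None
        self._cached_data: Any = None

    def __len__(self) -> int:
        return len(self.files)

    def __getitem__(self, index: int) -> Any:
        if index < 0:
            index += len(self.files)
        if not 0 <= index < len(self.files):
            raise IndexError("collection index out of range")
        if index != self._cached_index:
            self._cached_data = self._load(self.files[index])
            self._cached_index = index
        return self._cached_data
```

Constructing the collection only globs file names; nothing is decoded until `coll[i]` is requested, and because __getitem__ raises IndexError at the end, `for img in coll` also works.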

sciunto commented 3 years ago

imread in imageio clearly covers more use cases than our imread (which is just awesome!)

In imread_collection, we have the possibility to use wildcards as well like imread_collection('folder/startname_00*.png')

rfezzani commented 3 years ago

Please correct me @FirefoxMetzger if I am wrong:

Along with @emmanuelle, I am -1 on deprecating skimage.io. The plan to have it a thin shim to imageio seemed like a good one. With the new imageio API, it is no longer a simple one-to-one match (if I read it correctly, they first have to instantiate a reader, then call a method on that reader to load the file; we don't currently have that and it seems unnecessary for our use-case).

imageio still has iio.imread according to the previous example; the reader is only instantiated internally, I think.

@FirefoxMetzger The main difference between the list comprehension you show above and ImageCollection is that it lazily loads arrays as needed. I.e., you can easily do ImageCollection('*.jpg') on a folder with 2000 files.

Isn't iio.imiter doing the trick here?

FirefoxMetzger commented 3 years ago

it is no longer a simple one-to-one match (if I read it correctly, they first have to instantiate a reader, then call a method on that reader to load the file; we don't currently have that and it seems unnecessary for our use-case)

@stefanv I'm not sure I follow. Could you explain which conflict you see? io.imread and iio.imread should do the same thing for everything that io.imread does, but extend it by doing meaningful things where io.imread will throw an exception.

There is, of course, iio.imopen, which gets called under the hood, but I think that one is quite useful for resource management since it provides a context manager that clearly communicates whether a file is open or not. It might not be so useful in a notebook script (we have a functional API for that), but in a library that level of control should be useful, no?

Essentially

with iio.imopen("image.tiff") as file:
    metadata = file.get_meta()
    # do more processing
    if logic:
        img = file.read(index=3)

    img = file.read(index=0)

# file is closed here

Isn't iio.imiter doing the trick here?

@rfezzani Yes, you will get lazy loading of one image at a time. There may be a difference, because iio.imiter is a pure iterator, so you don't get random access. For that, you'd use the iio.imopen route I hinted at above, or if you only need a single index iio.imread(..., index=X) should do the trick (it closes the file afterward).
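To illustrate the random-access difference: with a pure iterator, reaching index n means decoding every frame before it. A sketch using the standard itertools `nth` recipe; `decoded_frames` is a hypothetical stand-in for iio.imiter that logs each decode so the cost is visible:

```python
from itertools import islice
from typing import Iterable, Iterator


def nth(iterable: Iterable, n: int):
    """Return item `n` of an iterator by consuming everything before it."""
    try:
        return next(islice(iterable, n, None))
    except StopIteration:
        raise IndexError(f"iterator has no item at index {n}") from None


def decoded_frames(num_frames: int, log: list) -> Iterator[str]:
    """Hypothetical stand-in for iio.imiter: yields frames one at a time."""
    for i in range(num_frames):
        log.append(i)  # record each decode
        yield f"frame {i}"
```

Getting frame 3 this way decodes frames 0 through 3, whereas iio.imread(..., index=3) or an open iio.imopen handle can ask the backend for that frame directly.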

We don't do any glob-ing though. I keep reading that it would be very cool to read from a folder or a glob, and I think it would be cool indeed; however, there are too many edge-cases to create a clean API, and that is what keeps preventing me from proposing one. E.g., how should we handle it if files have different formats; should this trigger a plugin search for each file or raise an exception? For folders, should it be recursive, what about symlinks, what about recursive symlinks? How should we handle kwargs that allow configuring each backend that may or may not be involved in reading a file? What are sensible defaults for all of the above?

In many cases taming the resulting API as a user will quickly become just as complex as writing a script for it yourself, so we don't really do globbing or folders. If you do have an idea though, I'd be all ears :)

That said, there is apparently a legacy feature for folders and DICOM, where it is possible to read folders if one uses the DICOM backend. This might disappear in the future, though, because it's unique to that backend, and I don't know if it can be generalized into our API. It breaks for most of the plugins and keeps adding to the confusion about why iio.imread("./training_set_images/") on a folder of JPGs doesn't work.

stefanv commented 3 years ago

@stefanv I'm not sure I follow. Could you explain which conflict you see? io.imread and iio.imread should do the same thing for everything that io.imread does, but extend it by doing meaningful things where io.imread will throw an exception.

OK, it wasn't clear to me that iio.imread is a recommended route any longer. It looked like the docs suggested you use imopen().imread (or something like that, I can't remember the exact function names now).

Also, there was discussion around it returning a proxy object instead of an array.

I feel it is important that, in scikit-image, we should keep skimage.io.imread, imread_collection, and imread for backward compatibility. We can get rid of our plugin infrastructure and solely rely on imageio to simplify things, especially if we can pass through our plugin argument to imageio.
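A sketch of passing our plugin argument through, assuming a generic `backend(uri, **kwargs)` callable in place of imageio's reader. `make_imread` and `fake_backend` are hypothetical; the real skimage.io.imread also takes `as_gray`, omitted here, and any mapping between skimage and imageio plugin names is left out:

```python
from typing import Any, Callable, Optional


def make_imread(backend: Callable[..., Any]):
    """Build a skimage-style imread that forwards `plugin` to `backend`.

    plugin=None means "let the backend choose", so it is not forwarded.
    """
    def imread(fname: str, plugin: Optional[str] = None, **plugin_args: Any) -> Any:
        if plugin is not None:
            plugin_args["plugin"] = plugin
        return backend(fname, **plugin_args)
    return imread


calls = []


def fake_backend(uri, **kwargs):
    # Hypothetical stand-in for imageio: records what it was asked to do.
    calls.append((uri, kwargs))
    return uri


imread = make_imread(fake_backend)
```

Not forwarding plugin=None keeps the backend's own format detection as the default, so existing calls without a plugin argument behave as before.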

We can also advocate for users to use imageio with its new API directly (it'd be fine to add docs to that extent to the user guide as well).

But, deprecating skimage.io.imread or modifying its API will require the modification of almost every single skimage script out there, so I don't think that's a feasible option for now.

We don't do any glob-ing though. I keep reading that it would be very cool to read from a folder or a glob, and I think it would be cool indeed; however, there are too many edge-cases to create a clean API, and that is what keeps preventing me from proposing one. E.g., how should we handle it if files have different formats; should this trigger a plugin search for each file or raise an exception? For folders, should it be recursive, what about symlinks, what about recursive symlinks? How should we handle kwargs that allow configuring each backend that may or may not be involved in reading a file? What are sensible defaults for all of the above?

These are tricky imageio design questions. But as long as we have access to an imread (which is all that imread_collection relies on), that will load files based on their extensions, then we're good.

mkcor commented 3 years ago

https://scikit-image.org/docs/stable/api/skimage.io.html#skimage.io.imread_collection

@sciunto personally I've adapted my code along these lines:

import imageio as io

collection = []
reader = io.get_reader('my_stack_of_images.tif')
for img in reader:
    collection.append(img)

Do you see cases where skimage.io.imread_collection would be irreplaceable?

FirefoxMetzger commented 3 years ago

OK, it wasn't clear to me that iio.imread is a recommended route any longer. It looked like the docs suggested you use imopen().imread (or something like that, I can't remember the exact function names now).

@stefanv I take that 😆 Better Documentation of the API is my next PR for imageio. My thinking was to make a draft PR here once we are on the same page what should happen to skimage.io and merge it after imageio v2.10 is released (which distributes the new API alongside the old #backward-compatibility 🥇).

Also, there was discussion around it returning a proxy object instead of an array.

If you mean that imageio used to return an object that wasn't an array (added image metadata) that thingy didn't survive. You will get a good old-fashioned np.ndarray. (metadata has a different function now)

I feel it is important that, in scikit-image, we should keep skimage.io.imread, imread_collection, and imread for backward compatibility. We can get rid of our plugin infrastructure and solely rely on imageio to simplify things, especially if we can pass through our plugin argument to imageio.

+1 for backwards compatibility :) I need to figure out what plugin does here, but from what it sounds like it should be similar to our plugin, where you can select the backend, e.g. iio.imopen("some.tiff", plugin="pillow", **kwargs) to enforce using pillow and not tifffile.

stefanv commented 3 years ago

That all sounds good; thanks @FirefoxMetzger, also for educating me!

sciunto commented 3 years ago

Do you see cases where skimage.io.imread_collection would be irreplaceable?

Yes, the random access that @FirefoxMetzger said is not feasible. Ex: I want to check that a data processing step works on a large stack of images (with a slider, for example) by randomly accessing images. Of course, I can always write code to replace that, but it is never a pleasant user experience.

FirefoxMetzger commented 3 years ago

@sciunto We also discussed a sliceable API as a contender to the one that I ended up implementing. I liked the idea, but it is again a rabbit hole of edge-cases. In any case, there is always the option to keep skimage.io.ImageCollection, give it the sliceable interface, and consume imageio through it. That's where the above mentioned iio.imopen would come in handy.

I want to check that a data processing works on a large stack of images (with a slider for example), by randomly accessing images.

Since you say stack, I'll point out that ImageCollection will only be advantageous if said stack consists of images ~in different sizes~ and/or formats. That way, you can't stack them into a numpy array, have to read each image individually, and can't fit all into memory.

If you are thinking about a stack of microscopy images or a video, then a multi-image format like .tiff or .mp4 will be much better, both in terms of storage and reading speed. The format can compress your data (likely there are exploitable patterns), and you only need to talk to the OS once to open a file; OS calls are sloooow. In this case, iio.imread can give you the full image tensor as discussed above. If you need lazy-loaded random access then you have a more niche use-case and for such specialist use you can go one level deeper and use iio.imopen directly.

with iio.imopen("dangerously_large.mov", "r") as im_file:
    # some processing
    metadata = im_file.get_meta(index=42)
    if metadata["field"] == some_value:
        img = im_file.read(index=42)
    else:
        img = im_file.read(index=21)

    im_file.get_meta(index=0)
    im_file.read(index=0)

# file will be closed and cleaned up here
final_computations()

The advantage of this setup is that you can dodge the bullet of having to re-open the file between multiple random access requests. We just keep the file around and seek() it until you are satisfied. That is - of course - under the assumption that the format is efficiently seekable and that the backend supports this. Otherwise, you will still get the benefits of not talking to the OS twice, but we will have to seek(0) and re-read the file up until the index you requested.
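A toy sketch of the two cases, assuming a hypothetical raw format of fixed-size, uncompressed 2x3 uint8 frames stored back to back. There the offset of frame n is computable, so one open handle can seek() straight to it; `read_sequential` shows the fallback of rewinding and walking forward that a format without computable offsets would need. `RawFrameFile` and `write_frames` are illustrative names only:

```python
import io
import struct

FRAME_SHAPE = (2, 3)  # hypothetical: every frame is 2x3 uint8 pixels
FRAME_BYTES = FRAME_SHAPE[0] * FRAME_SHAPE[1]
FRAME_FMT = f"{FRAME_BYTES}B"  # struct format: FRAME_BYTES unsigned bytes


def write_frames(frames) -> bytes:
    """Pack frames (flat lists of ints in 0..255) back-to-back into raw bytes."""
    return b"".join(struct.pack(FRAME_FMT, *frame) for frame in frames)


class RawFrameFile:
    """Keep one handle open and serve random-access reads from it."""

    def __init__(self, fileobj):
        self._f = fileobj

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self._f.close()
        return False

    def _decode(self, data: bytes, index: int):
        if len(data) != FRAME_BYTES:
            raise IndexError(f"no frame at index {index}")
        return list(struct.unpack(FRAME_FMT, data))

    def read(self, index: int):
        """Efficiently seekable case: jump straight to frame `index`."""
        if index < 0:
            raise IndexError("negative frame index")
        self._f.seek(index * FRAME_BYTES)
        return self._decode(self._f.read(FRAME_BYTES), index)

    def read_sequential(self, index: int):
        """Fallback when offsets can't be computed: rewind and walk forward."""
        if index < 0:
            raise IndexError("negative frame index")
        self._f.seek(0)
        for _ in range(index):
            self._f.read(FRAME_BYTES)  # skip (a real codec would decode here)
        return self._decode(self._f.read(FRAME_BYTES), index)
```

Both methods reuse the same open handle; only the seekable path avoids touching the frames before the requested one.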

Edit: On second thought you can still use the above snippet if your images have different sizes since formats like .tiff can support storing differently sized images in the same file. I remember discussing this possibility during the design of our API, but nobody had seen it in the real world. It is still supported though :)

mkcor commented 2 years ago

I will definitely let the https://github.com/datacarpentry/image-processing folks know about this.

cf. https://datacarpentry.org/image-processing/03-skimage-images/index.html

/cc @tobyhodges

lagru commented 2 years ago

With SKIP 4 this seems like a good fit for the skimage2 milestone.

stefanv commented 1 year ago

skimage.io is currently a thin wrapper around imageio that ensures that a NumPy array is returned. We don't plan to change this for the forthcoming release, so I'm bumping the milestone to 0.21.

Discussion: deprecating (most of) skimage.io in favour of imageio #5036
