thunder-project / thunder

scalable analysis of images and time series
http://thunder-project.org
Apache License 2.0

Add API for concatenating images #332

Open mheppner opened 8 years ago

mheppner commented 8 years ago

With the current thunder.images.fromtif() call, either a single image or a glob of images in a directory can be loaded. To load specific images, you have to do something like this:

rdds = [
    thunder.images.fromtif('path1'),
    thunder.images.fromtif('path2'),
]
bigRdd = sc.union(rdds)
data = thunder.images.fromrdd(bigRdd)

In addition to the other methods mentioned in #331, an API could be added to concatenate image objects, like this:

data1 = thunder.images.fromtif('path1')
data2 = thunder.images.fromtif('path2')
data = data1.concatenate(data2)

d-v-b commented 7 years ago

@mheppner I think this use case is handled by thunder.images.fromlist(), which takes a list of files and a function for loading each file.

So in your example, you would do something like this:

# list of paths to images
im_paths = ['path_to_image_1', 'path_to_image_2']

# a function that takes a path and returns image data
def tif_loader(path):
    from skimage.io import imread
    return imread(path)

data = thunder.images.fromlist(im_paths, accessor=tif_loader)

Does this work for you? (Personally I like this a lot more than the thunder.images.fromtif() approach...)

mheppner commented 7 years ago

That could work too, but it takes away from some of the magic of using .fromtif(). I could supply my own accessor, but I would really just be copying the one already in .fromtif(), which feels a bit odd to me. I guess this would ultimately come down to changing .frompath(). Regardless, I can either give it a single file, an entire directory of files, or a glob pattern, but there's no way to load just a specific set of files.

The use case I have is to search a database to get paths of tifs to load into a thunder set. The only way of doing this is either to copy all the files into a temporary directory, or the method I mentioned above of joining all the RDDs. #331 is going to be more useful than this issue, but I figured I'd add it anyways. I think it still might be useful to concatenate images together though.
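A minimal sketch of the database-driven workflow described above, using only the standard library. The table name `images`, its columns, the example paths, and the helper `find_tif_paths` are all hypothetical, chosen only for illustration. The sketch stops at producing a plain Python list of paths.

```python
import sqlite3


def find_tif_paths(conn, experiment):
    """Return the tif paths recorded for one experiment, sorted by path.

    Hypothetical helper: assumes a table images(experiment TEXT, path TEXT).
    """
    rows = conn.execute(
        "SELECT path FROM images WHERE experiment = ? ORDER BY path",
        (experiment,),
    )
    return [row[0] for row in rows]


# Throwaway in-memory database so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (experiment TEXT, path TEXT)")
conn.executemany(
    "INSERT INTO images VALUES (?, ?)",
    [
        ("exp1", "/data/a/img_002.tif"),
        ("exp2", "/data/b/img_001.tif"),
        ("exp1", "/data/a/img_001.tif"),
    ],
)

# Only the files belonging to exp1, with no temporary directory needed.
im_paths = find_tif_paths(conn, "exp1")
```

A list like `im_paths` is the kind of input that the `thunder.images.fromlist(im_paths, accessor=...)` call discussed in this thread expects.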

d-v-b commented 7 years ago

What magic is there in using .fromtif()? Maybe I'm coming from a different perspective because I work with a variety of image formats, but the .fromlist() constructor seems about as simple and direct as you can get -- feed it a list of images, populated however you like (e.g., by searching a database) and then specify how to load the files with your own accessor. You could copy the accessor in .fromtif(), but you could more simply use any other function for loading .tif files. Doesn't this satisfy your use case?

mheppner commented 7 years ago

Yes, that does fit the use case, but why copy something that already exists? Why can't .fromtif() simply accept a list as well as a string?
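One way a loader could accept either a string or a list is to normalize the argument into a flat list of file paths before reading anything. The sketch below is not thunder's implementation; `expand_paths` and its `ext` parameter are hypothetical names, and it only handles local files.

```python
import glob
import os


def expand_paths(path, ext="tif"):
    """Normalize a path argument into a flat list of file paths.

    Hypothetical helper. Accepts:
      - a list or tuple of paths (each item is expanded in order),
      - a directory (all files with the given extension, sorted),
      - a glob pattern (all matches, sorted),
      - a single file path (returned as a one-element list).
    """
    if isinstance(path, (list, tuple)):
        files = []
        for item in path:
            files.extend(expand_paths(item, ext=ext))
        return files
    if os.path.isdir(path):
        return sorted(glob.glob(os.path.join(path, "*." + ext)))
    if any(ch in path for ch in "*?["):
        return sorted(glob.glob(path))
    return [path]
```

With a helper like this, the existing single-file, directory, and glob behaviors stay the same, and a list is handled by expanding each entry in order.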

boazmohar commented 7 years ago

@d-v-b @mheppner I think the difference is that .fromtif() takes as a parameter nplanes (and now also discard_extra), so it knows what to do with multi-page tifs.
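To illustrate what that bookkeeping involves, here is a minimal NumPy sketch of grouping the pages of one multi-page tif into volumes of `nplanes` planes each. It is not thunder's code: `split_planes` is a hypothetical helper, the axis order of the result is an assumption, and the error-versus-discard behavior is only modeled on the parameter names mentioned above.

```python
import numpy as np


def split_planes(pages, nplanes, discard_extra=False):
    """Group pages of shape (npages, height, width) into volumes.

    Returns an array of shape (nvolumes, nplanes, height, width).
    If npages is not a multiple of nplanes, the trailing pages are
    dropped when discard_extra is True; otherwise a ValueError is raised.
    """
    pages = np.asarray(pages)
    extra = pages.shape[0] % nplanes
    if extra:
        if not discard_extra:
            raise ValueError(
                "%d pages cannot be divided into volumes of %d planes"
                % (pages.shape[0], nplanes)
            )
        # Drop the incomplete trailing volume.
        pages = pages[: pages.shape[0] - extra]
    return pages.reshape((-1, nplanes) + pages.shape[1:])
```

A generic accessor passed to `.fromlist()` would need to do something along these lines itself for multi-page files.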

d-v-b commented 7 years ago

@mheppner I agree that .fromtif() should be able to take a list of files. I'm sure if you put together a PR that implemented this someone would have a look at it.