idiap / attention-sampling

This Python package enables training and inference of deep learning models on very large data, such as megapixel images, using attention sampling.

Allow use of a patch generator #12

Open DanielRobertNicoud opened 4 years ago

DanielRobertNicoud commented 4 years ago

When working with very large images, passing a function that generates patches, instead of the whole high-resolution image, can save a lot of memory. One example is reading from .tiff files: the openslide package has a convenient read_region function that returns a patch from an image.

It would be a nice feature to have the option of passing a patch-generating function instead of x_high as an input. I am not sure whether the current structure of the code would allow this easily; I am still wrapping my head around it.
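
To make the idea concrete, here is a minimal sketch of what such a patch generator could look like. It only assumes a callable with the same argument order as openslide's read_region (location, level, size). The names iter_patches and FakeSlide are hypothetical and are not part of this package; FakeSlide is a stand-in so the sketch runs without a real slide file.

```python
def iter_patches(read_region, locations, patch_size, level=0):
    """Yield one patch per (x, y) location without loading the full image.

    read_region is any callable taking (location, level, size), which is the
    argument order of openslide's OpenSlide.read_region.
    """
    for x, y in locations:
        yield read_region((x, y), level, patch_size)


class FakeSlide:
    """Hypothetical stand-in for an opened slide; it records each request."""

    def __init__(self):
        self.calls = []

    def read_region(self, location, level, size):
        self.calls.append((location, level, size))
        # A real slide would return image data here.
        return {"location": location, "level": level, "size": size}


slide = FakeSlide()
patches = list(
    iter_patches(slide.read_region, [(0, 0), (512, 1024)], (256, 256))
)
```

With a real file, slide.read_region would come from an openslide.OpenSlide object, and only the requested regions would be read from disk.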

angeloskath commented 4 years ago

Hi Daniel,

The current API supports this, but it is not implemented. The API is defined in https://github.com/idiap/attention-sampling/blob/master/ats/data/base.py, and FromTensors is the existing implementation of it.

So basically, if you want to implement openslide support (which would be awesome), the to-do list would be something like this:

  1. Implement a MultiResolutionBatch that uses an openslide dataset
  2. Adapt the modules in ats.core.ats_layer such that they support multiple MultiResolutionBatch implementations
  3. Use the rest of the code as is.
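
The shape of these steps can be sketched roughly as below. This is only an illustration of the pattern (one interface, several backends, consumer code unchanged), not the actual ats API: PatchSource, InMemorySource, LazySource and sample_patches are hypothetical names, and the real MultiResolutionBatch has its own method signatures in ats/data/base.py.

```python
from abc import ABC, abstractmethod


class PatchSource(ABC):
    """Hypothetical stand-in for the patch-extraction part of a batch API."""

    @abstractmethod
    def patches(self, offsets, patch_size):
        """Return one patch per (row, col) offset."""


class InMemorySource(PatchSource):
    """Slices patches out of an image already held in memory (nested lists)."""

    def __init__(self, image):
        self.image = image

    def patches(self, offsets, patch_size):
        h, w = patch_size
        return [
            [row[c:c + w] for row in self.image[r:r + h]]
            for r, c in offsets
        ]


class LazySource(PatchSource):
    """Delegates to a reader callable, e.g. a wrapper around read_region."""

    def __init__(self, read_patch):
        self.read_patch = read_patch

    def patches(self, offsets, patch_size):
        return [self.read_patch(offset, patch_size) for offset in offsets]


def sample_patches(source, offsets, patch_size):
    """Consumer code depends only on the interface, not on the backend."""
    return source.patches(offsets, patch_size)


image = [[10 * r + c for c in range(4)] for r in range(4)]
in_memory = InMemorySource(image)
# The lazy backend reads through a callable instead of holding the image.
lazy = LazySource(lambda offset, size: in_memory.patches([offset], size)[0])

a = sample_patches(in_memory, [(1, 1)], (2, 2))
b = sample_patches(lazy, [(1, 1)], (2, 2))
```

Both backends return the same patch here, which is the property the layer code would rely on when switching between implementations.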

Let me know if any of this makes sense, and if you have any more questions I'd be glad to answer them.

Angelos

DanielRobertNicoud commented 4 years ago

Hi Angelos,

Sorry I didn't answer earlier; I have a bit of time now that it's the weekend.

If I understand the structure correctly, I would need to modify the patches function in from_tensors so that it calls the patches function of MultiResolutionBatch. Then, depending on the type of dataset, MultiResolutionBatch would call either the function that you have implemented, or openslide, or whatever else.

Do you agree or do you see a better way to do it?

angeloskath commented 4 years ago

Hmm I am not sure I understand what you mean.

If everything goes well, you would not have to modify any existing file in the ats.data module. You would just implement the MultiResolutionBatch interface for an openslide-backed dataset.

The FromTensors implementation uses predefined tensors to extract the patches. I would create an openslide implementation that would use paths to openslide images or possibly handles to openslide files.
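
As a rough sketch of the "paths or handles" idea, assuming nothing about the real interface: the class below stores only file paths and opens each slide the first time a patch is requested from it. PathBackedBatch and FakeSlide are hypothetical names; in practice the open_slide argument could be openslide.OpenSlide, which takes a filename and provides read_region and close.

```python
class PathBackedBatch:
    """Holds file paths and opens each slide on first use (hypothetical)."""

    def __init__(self, paths, open_slide):
        self.paths = list(paths)
        self._open_slide = open_slide  # assumption: e.g. openslide.OpenSlide
        self._handles = {}

    def _handle(self, index):
        # Open lazily and cache, so untouched slides are never opened.
        if index not in self._handles:
            self._handles[index] = self._open_slide(self.paths[index])
        return self._handles[index]

    def patch(self, index, location, size, level=0):
        return self._handle(index).read_region(location, level, size)

    def close(self):
        for handle in self._handles.values():
            handle.close()
        self._handles.clear()


class FakeSlide:
    """Stand-in for a slide handle so the sketch runs without image files."""

    opened = []

    def __init__(self, path):
        self.path = path
        FakeSlide.opened.append(path)

    def read_region(self, location, level, size):
        return (self.path, location, level, size)

    def close(self):
        pass


batch = PathBackedBatch(["a.tiff", "b.tiff"], FakeSlide)
first = batch.patch(1, (0, 0), (64, 64))
second = batch.patch(1, (64, 0), (64, 64))
batch.close()
```

Only the second file is ever opened in this example, and it is opened once even though two patches are read from it.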

Let me know if this makes more sense.

DanielRobertNicoud commented 4 years ago

Maybe I'm misunderstanding something, but my understanding of the current structure is as follows:

Now, in an implementation where we want to do the same thing, I would just have FromTensors call MultiResolutionBatch.patches, and have MultiResolutionBatch.patches call extract_patches. But this requires x_high to be passed in as a tensor. My goal would be to pass x_high only as an openslide object and use the read_region function to extract just the patches, without having to read the whole high-resolution image. I don't really understand how you propose to implement this above. If you want, we can have a Skype call to discuss it; it might be easier by voice.

jbschiratti commented 3 years ago

I would also be interested in integrating OpenSlide with this method (the MultiResolutionBatch seems like a nice place to start). @DanielRobertNicoud did you implement it already?