Closed TibbersHao closed 9 months ago
To sum up our discussion today:
TiledDataset should be updated to crop each input frame using qlty for both training and inference. To note: since TiledDataset retrieves one frame at a time, we would not be shuffling patches across frames; this would need to be refactored after the experiment at Diamond.

In parallel, we would like to start benchmarking some of the models within this implementation. For that, we agreed to proceed as follows:
Please feel free to add comments as needed @TibbersHao @Wiebke @xiaoyachong @zhuowenzhao @dylanmcreynolds @phzwart
Thanks for the summary @taxe10 !
Working on this as my highest priority for now.
Let me know if you need help.
The qlty task is essentially building a simple wrapper - most of it can be abstracted from the notebook I sent. Make sure you provide access to parameters like window size and step size.
P
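To make the wrapper idea concrete, here is a minimal sliding-window sketch exposing window size and step size. The function name and shapes are hypothetical, not qlty's actual API; the real wrapper should delegate to qlty with the same parameters.

```python
import numpy as np

def unstitch(frame, window, step):
    """Crop a 2-D frame into overlapping square patches.

    Hypothetical sketch of the wrapper idea; the production version
    should call into qlty with the same window/step parameters.
    """
    h, w = frame.shape
    patches = []
    for y in range(0, h - window + 1, step):
        for x in range(0, w - window + 1, step):
            patches.append(frame[y:y + window, x:x + window])
    return np.stack(patches)

# Example: a 64x64 frame with a 32-pixel window and 16-pixel step
# yields a 3x3 grid of overlapping patches.
patches = unstitch(np.zeros((64, 64)), window=32, step=16)
```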
The unstitched patches from qlty are currently 4-D for a single slice, which causes a dimension out-of-bound issue with PyTorch's default_collate function when building the DataLoader.
Cause of the problem: the default collate function uses np.stack, which introduces an additional batch-size axis; this is the intended behavior per PyTorch's documentation. Reference
Solution: this can be resolved by building a customized collate function and passing it when constructing the DataLoader, which appears to be the recommended approach in the documentation. Specifically: use np.concat instead of np.stack, so that the patches from each frame are concatenated along the existing first axis rather than nested under a new one.
This will be reflected in an upcoming PR.
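A minimal sketch of such a collate function, assuming each dataset item is a 4-D patch stack of shape (patches, channels, height, width); the function name is illustrative, not from the PR:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader

def concat_collate(batch):
    # Concatenate the per-frame patch stacks along the existing first
    # axis instead of stacking, so no extra batch dimension is added
    # the way default_collate would.
    return torch.as_tensor(np.concatenate(batch, axis=0))

# Usage (dataset assumed to yield 4-D numpy patch stacks):
# loader = DataLoader(dataset, batch_size=2, collate_fn=concat_collate)
```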
Peter mentioned to me that one of the key performance boosters he has seen is chopping the training data into smaller overlapping patches, while taking care not to use non-annotated images.
It also ensures that the compute is used efficiently - you don't want to convolute images or parts of images without any labels.
A similar thing is true for inference: by having overlapping segments, you reduce edge effects and perform an additional averaging of results.
All of this could be done using the qlty package outside of dlsia. It is a good feature upgrade, but I want to make sure whether it falls into the scope of the Diamond trip given the time constraint we have.
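The inference-side averaging described above can be sketched as follows; this is an illustrative numpy version, not qlty's stitching implementation, and the function name and positions format are assumptions:

```python
import numpy as np

def stitch_mean(patches, positions, out_shape, window):
    # Accumulate overlapping patch predictions and divide by the
    # per-pixel overlap count, averaging away edge effects.
    acc = np.zeros(out_shape, dtype=float)
    cnt = np.zeros(out_shape, dtype=float)
    for patch, (y, x) in zip(patches, positions):
        acc[y:y + window, x:x + window] += patch
        cnt[y:y + window, x:x + window] += 1.0
    # Guard against division by zero in uncovered regions.
    return acc / np.maximum(cnt, 1.0)
```

Pixels covered by several patches receive the mean of all predictions, which is what reduces the per-patch edge artifacts.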