mlexchange / mlex_dlsia_segmentation_prototype


Add data chopping capabilities to boost performance #6

Closed TibbersHao closed 6 months ago

TibbersHao commented 6 months ago

Peter mentioned to me that one of the key performance boosters he has seen is chopping the training data into smaller overlapping patches, while taking care not to use non-annotated images.

It also ensures that compute is used efficiently - you don't want to convolve images, or parts of images, that carry no labels at all.

A similar thing is true for inference: by having overlapping segments, you reduce edge effects and perform an additional averaging of results.

All of this could be done with the qlty package outside of dlsia. It would be a good feature upgrade, but I want to confirm whether it falls within the scope of the Diamond trip, given the time constraints we have.
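To illustrate the chopping idea (this is plain PyTorch, not the qlty API itself - the window/step values and the use of `-1` to mark unlabeled pixels are assumptions for the sketch):

```python
# Sketch: cut an annotated image stack into overlapping patches and drop
# patches that contain no annotations, so no compute is spent on them.
import torch

def chop_with_labels(images, masks, window=64, step=32, unlabeled=-1):
    # images: (N, C, H, W); masks: (N, H, W), with `unlabeled` marking missing annotations
    patches_img = images.unfold(2, window, step).unfold(3, window, step)
    patches_msk = masks.unfold(1, window, step).unfold(2, window, step)
    # flatten the (rows, cols) patch grid into a single patch axis
    channels = images.shape[1]
    patches_img = patches_img.permute(0, 2, 3, 1, 4, 5).reshape(-1, channels, window, window)
    patches_msk = patches_msk.reshape(-1, window, window)
    # keep only patches that contain at least one annotated pixel
    keep = (patches_msk != unlabeled).flatten(1).any(dim=1)
    return patches_img[keep], patches_msk[keep]
```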

taxe10 commented 6 months ago

To sum up our discussion today:

In parallel, we would like to start benchmarking some of the models within this implementation. For that, we agreed to proceed as follows:

Please feel free to add comments as needed @TibbersHao @Wiebke @xiaoyachong @zhuowenzhao @dylanmcreynolds @phzwart

TibbersHao commented 6 months ago

Thanks for the summary @taxe10 !

Working on this as my highest priority for now.

phzwart commented 6 months ago

Let me know if you need help.

The qlty task is essentially building a simple wrapper - most of it can be abstracted from the notebook I sent. Make sure you provide access to parameters like window size and step size.
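A rough sketch of the kind of wrapper interface meant here, with window size and step size exposed as parameters, plus the overlap-averaged stitching for inference. This is not the qlty API; the class and function names are hypothetical stand-ins for what the notebook abstracts:

```python
from dataclasses import dataclass
import torch

@dataclass
class PatchConfig:
    window: int = 64  # patch edge length
    step: int = 32    # stride between patch origins; step < window gives overlap

def stitch_mean(patches, positions, out_shape):
    """Average overlapping patch predictions back onto the full image grid."""
    channels, w = patches.shape[1], patches.shape[-1]
    acc = torch.zeros((channels, *out_shape))
    weight = torch.zeros((1, *out_shape))
    for patch, (y, x) in zip(patches, positions):
        acc[:, y:y + w, x:x + w] += patch
        weight[:, y:y + w, x:x + w] += 1.0
    return acc / weight.clamp(min=1.0)
```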


TibbersHao commented 6 months ago

The unstitched patches from qlty are currently 4-D for a single slice, which causes a dimension out-of-bound issue with PyTorch's default_collate function when building the DataLoader.

Cause of the problem: the default collate function stacks samples (torch.stack), which introduces an extra batch axis; this is the intended behavior per PyTorch's documentation. Reference

Solution: this can be resolved by building a customized collate function and passing it when constructing the DataLoader, which appears to be the recommended approach in the documentation. Specifically: concatenate along the existing patch axis (torch.cat) instead of stacking, so no additional axis is introduced and the patches from all slices simply merge into the batch dimension. See the sketch below.
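A minimal sketch of such a collate function, assuming each dataset item already returns an unstitched (patches, C, H, W) image stack and matching mask stack (`patched_dataset` is a placeholder name):

```python
import torch
from torch.utils.data import DataLoader

def patch_collate(batch):
    # batch: list of (patch_stack, mask_stack) pairs, one pair per slice.
    # torch.cat keeps the result 4-D by merging patches into the batch axis,
    # whereas the default torch.stack would add a 5th axis.
    images = torch.cat([item[0] for item in batch], dim=0)
    masks = torch.cat([item[1] for item in batch], dim=0)
    return images, masks

# loader = DataLoader(patched_dataset, batch_size=2, collate_fn=patch_collate)
```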

This will be reflected in an upcoming PR.