jiyuuchc / lacss

A deep learning model for single cell segmentation from microscopy images.
https://jiyuuchc.github.io/lacss/
MIT License

Setting up training data #3

Closed pakiessling closed 1 year ago

pakiessling commented 1 year ago

Thank you for the tool. I am excited to try it out with my own data in a weakly supervised fashion.

If I understand correctly, I will need the centroids of the nuclei and a binary segmentation of my images.

The nuclei positions I can get with https://scikit-image.org/docs/stable/auto_examples/features_detection/plot_blob.html (Is Laplacian of Gaussian the best algorithm?)

Do you also have a suggestion for binary foreground / background segmentation?

Also, do I need to resize to 512 x 512?

jiyuuchc commented 1 year ago

In our experience, the difference between LoG and DoG is very minor. DoG is a bit faster.
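For illustration, a minimal sketch of centroid extraction with scikit-image's DoG detector (the sigma and threshold values are placeholders to tune for your data):

```python
from skimage.feature import blob_dog
from skimage.io import imread

# Grayscale nuclei image; blob_dog returns one (row, col, sigma) per blob.
dapi = imread("dapi.tif")  # placeholder filename
blobs = blob_dog(dapi, max_sigma=30, threshold=0.1)
centroids = blobs[:, :2]  # (y, x) coordinates to use as point labels
```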

You can also try a pre-trained nuclei segmentation CNN (e.g. stardist).
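For example, a sketch using a pretrained StarDist model (assuming a single-channel nuclei image `dapi`; the centroids come from the `details` dict returned by predict_instances):

```python
from csbdeep.utils import normalize
from stardist.models import StarDist2D

# Pretrained fluorescence-nuclei model from the StarDist model zoo.
model = StarDist2D.from_pretrained("2D_versatile_fluo")
labels, details = model.predict_instances(normalize(dapi, 1, 99.8))
centroids = details["points"]  # detected nuclei centers, usable as point labels
```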

As to the foreground/background label, our current recommendation is to train without it. For most cell types, our results showed no difference from adding this label. You can always fine-tune your model with this label afterwards if the accuracy turns out not to be good enough. See https://arxiv.org/abs/2304.10671

Follow the point-supervised demo notebook for the code setup.

Hope that helps.

Ji



pakiessling commented 1 year ago

Perfect, thank you!

pakiessling commented 1 year ago

I hope it is alright if I ask another question @jiyuuchc. I see that you use grayscale 512 x 512 images for training in train_with_point_label, and your code crops / pads all images to that size.

In my case I have 2048 x 2048 images, with the first channel being the membrane I want to segment and the second channel being the DAPI stain. Do I need to adjust my images prior to training / inference?

jiyuuchc commented 1 year ago

In theory, you can train the model with images of any size, so long as they are in HxWxC format. However, depending on your GPU, you may run out of memory if your input image size is too large. In that case, you should incorporate a resize and/or crop operation into your data augmentation pipeline. You are free to choose a final crop size significantly smaller than the original image size.
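A minimal sketch of such a pipeline step, assuming a tf.data-style dataset as in the demo notebook (only the lacss.data.resize call below is taken from this thread; the surrounding wiring is illustrative):

```python
import lacss.data

def downscale(data):
    # Resize the image and its point labels together so each
    # training example fits in GPU memory.
    return lacss.data.resize(data, target_size=(512, 512))

ds_train = ds_train.map(downscale)
```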

Similarly, a very large image may also cause OOM during inference. In this case, the image needs to be divided into smaller patches and the predictions stitched together. The Predictor class has a method, predict_on_large_image(), to automate this process; see the API documentation (https://jiyuuchc.github.io/lacss/api/deploy/#lacss.deploy.Predictor.predict_on_large_image) for details.
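Roughly like this (the constructor argument is a placeholder; check the linked API documentation for the exact signature and tiling options):

```python
from lacss.deploy import Predictor

predictor = Predictor("path/to/model_checkpoint")  # placeholder path
# Divides the input into patches internally and stitches the predictions.
label = predictor.predict_on_large_image(big_image)
```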

The only constraint on channel configuration is consistency between training and inference data. For example, you can train with only the membrane channel and no DAPI channel, but then the inference images should also be one-channel. Conversely, if you train with both the membrane and DAPI channels, then the inference data should also have both channels.
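For example, with an HxWx2 array (membrane first, DAPI second), slice channels identically in both phases:

```python
# Train on membrane only -> infer on membrane only (keep the HxWxC layout).
membrane_only = image[..., 0:1]  # shape (H, W, 1)
# Train on both channels -> inference input must also be (H, W, 2).
```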

Ji



pakiessling commented 1 year ago

Thank you so much @jiyuuchc !

If it isn't too much of a bother, could you take a quick look at: https://github.com/pakiessling/lacss-test/blob/main/try_tissuenet.ipynb

I am trying your tissuenet model on my image and it doesn't really work that well. Do you think this is because I formatted the input wrong or because I didn't train the model on my data yet? I plan to train on 1000+ weakly annotated images.

Also, I see that you are loading masks even in the train_with_point_label.ipynb example: https://jiyuuchc.github.io/lacss/api/data/#lacss.data.generator.simple_generator

Should I just leave mask_file empty in my train.json?

jiyuuchc commented 1 year ago

I'm not sure I understand the question here.

Your linked notebook shows the inference results using a stock/pretrained model (tissuenet). Because of the domain shift between the original training data and your own data, the results are not very accurate. But that's expected, no?

I assume your goal is to train/re-train a new model that works well for your data. Have you set up a notebook to do that yet?

As to the train.json format, you are right. The original code was not written with point labels in mind. Currently, you will have to supply a fake mask image if you use only point labels. This will be fixed in the next release.
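For example, a placeholder mask could be generated like this (the filename and shape are arbitrary):

```python
import numpy as np
from skimage.io import imsave

# All-zero mask, present only to satisfy the current train.json schema.
imsave("fake_mask.png", np.zeros((2048, 2048), dtype=np.uint8))
```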

A final note of importance: from your testing notebook, I see that your images were acquired at a very fine pixel resolution. I would strongly recommend adding a downscaling operation to your training data pipeline:

```python
data = lacss.data.resize(data, target_size=(512, 512))
```

This allows much better transfer of knowledge from the original model, because it was trained on images with much coarser pixel sizes.

Hope this helps.