nasa / pretrained-microscopy-models


Can this repo directly run inference on unlabelled raw images? #14

Closed: linjiangya closed this issue 9 months ago

linjiangya commented 9 months ago

Dear authors,

I have a set of images but do not have any annotated masks yet. I want to use your pre-trained segmentation model to run inference on these images.

  1. However, it seems that this repo only contains weights for the encoder, not a complete segmentation model. Am I missing something?

  2. I tried to follow multiclass_segmentation_example.ipynb, but it seems that we need a dataset in DATA_DIR = 'Super1/'. Could you also tell me where we can get this data? (You mentioned that 1/3 of the data is available in https://github.com/nasa/pretrained-microscopy-models/issues/11#issuecomment-1663883540.)

Best,

JStuckner commented 9 months ago

I got your email as well and I'm reposting your questions in case others have a similar issue.

Questions

  1. Does this GitHub repo support segmentation of unlabelled raw images? It seems that the pre-trained checkpoints for the segmentation models have not been released yet, only a checkpoint for the encoder (classification). So I guess at least one high-quality GT mask is required to train my own model and then perform segmentation with the trained model. Is this correct? (According to issue #3: https://github.com/nasa/pretrained-microscopy-models/issues/3.) But wouldn't "several images" be too few for training?

  2. You mentioned that we can access 1/3 of the images in your dataset (https://github.com/nasa/pretrained-microscopy-models/issues/11#issuecomment-1663883540). Could you tell me where I can access them, and do they have ground-truth segmentation masks for training the segmentation model? :)

Answers

  1. The repo can segment unlabelled raw images as long as you first train a segmentation model on similar ground truth (GT) masks. The pre-trained checkpoints are for classification models; through transfer learning, that classification pre-training makes the segmentation models better. I have not released pre-trained checkpoints for the segmentation models, but they can easily be recreated using the example data (see answer 2) or your own data. Unless the benchmark example data is extremely similar to your images (highly unlikely), you'll need to create several high-quality GT masks. You don't have to label full images: I usually use 512x512 or 384x384 crops, and the size just has to be divisible by 32 (2^5 for the 5 downsampling layers). It's better to take each crop from a different image rather than labeling several crops of the same image, so you capture the range of imaging and sample conditions. I would say you need a minimum of 4 labeled images (3 for training and 1 for validation) for decent results. Sometimes I need as many as 12 for very complex cases. And if the microscopist uses a different microscope or sample, I'll need to add a couple more and retrain the model. (See the crop-preparation sketch after this list.)

  2. You can find the example data here: https://github.com/nasa/pretrained-microscopy-models/tree/main/benchmark_segmentation_data. If the superalloy ("Super") images are quite similar to your images, you can use them for training your model in addition to any images you label yourself. Whether you use my data, your own, or both, you'll need to put the images into 4 folders (train, train_annot, val, val_annot) as in the linked folder; a loader sketch for that layout follows below. If you use my Super images, you can move all the data from test and test_annot into the train folders. You can also combine the data from Super 4 into Super 1 if you wish (Super 2 and Super 3 are subsets of Super 1).
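
For reference, here is a minimal sketch of how you might cut 512x512 crops out of a few labeled images to build the training set described in answer 1. The file names, paths, and crop positions are hypothetical placeholders, not part of this repo; any standard image library works.

```python
# Minimal sketch: cut 512x512 crops from a handful of labeled images.
# File names and paths here are hypothetical examples, not part of the repo.
import os
from PIL import Image

CROP = 512  # must be divisible by 32 (2^5 for the 5 downsampling levels)
assert CROP % 32 == 0

pairs = [
    ("raw/image_01.tif", "masks/image_01.png"),
    ("raw/image_02.tif", "masks/image_02.png"),
]

os.makedirs("train", exist_ok=True)
os.makedirs("train_annot", exist_ok=True)

for i, (img_path, mask_path) in enumerate(pairs):
    img = Image.open(img_path)
    mask = Image.open(mask_path)
    # Top-left crop for simplicity; in practice pick varied regions from
    # different images to capture the range of imaging and sample conditions.
    box = (0, 0, CROP, CROP)
    img.crop(box).save(f"train/crop_{i:03d}.png")
    mask.crop(box).save(f"train_annot/crop_{i:03d}.png")
```

In practice you would take several crop locations spread across different micrographs, and set one or more aside for the validation folders.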
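
And here is a rough sketch of the folder layout from answer 2, together with an illustrative PyTorch Dataset that pairs each image with its same-named mask. This stand-in class is for illustration only; multiclass_segmentation_example.ipynb has its own data-loading code, which you should follow for actual training.

```python
# Sketch of the expected directory layout (mirroring benchmark_segmentation_data):
#   Super1/
#     train/  train_annot/  val/  val_annot/
# The Dataset below is an illustrative stand-in, not the repo's own class.
import os
import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

class SegmentationFolder(Dataset):
    """Pairs each image in images_dir with the same-named mask in masks_dir."""
    def __init__(self, images_dir, masks_dir):
        self.images_dir = images_dir
        self.masks_dir = masks_dir
        self.names = sorted(os.listdir(images_dir))

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        image = np.array(Image.open(os.path.join(self.images_dir, name)).convert("RGB"))
        mask = np.array(Image.open(os.path.join(self.masks_dir, name)))
        image = torch.from_numpy(image).permute(2, 0, 1).float() / 255.0  # CHW, scaled to [0, 1]
        mask = torch.from_numpy(mask).long()  # integer class label per pixel
        return image, mask

train_ds = SegmentationFolder("Super1/train", "Super1/train_annot")
val_ds = SegmentationFolder("Super1/val", "Super1/val_annot")
```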