oist / Usiigaci

Usiigaci: stain-free cell tracking in phase contrast microscopy enabled by supervised machine learning

Creating Training Data #12

Closed delano-j closed 4 years ago

delano-j commented 4 years ago

Hello, this looks like a great tool. I am hoping to use it with DIC images as opposed to PCM images. Based on preliminary results, I will have to either preprocess the images or create new training data. In your README you mentioned using the ROI Manager and LOCI with a Wacom tablet or iPad. How were you able to use an iPad/Apple Pencil to create ROI maps? That seems like a good technique.

Alternatively, do you know if DIC images have been used before, and if there were any techniques for using them?

Thanks!

hftsai commented 4 years ago

Hi Delano, sorry for the late reply.

Yes, we have used an iPad/Apple Pencil by remote-logging into a Windows desktop via TeamViewer. It does seem to be a bit too much work, but if you prefer it, I think it's doable. (Of course, a Wacom tablet is cheaper than an iPad.)

I don't know of any native iPad app that can do this, but if you find one, please let us know.

I know DIC images have been used by other groups, and after retraining it works. We are also testing some DIC images provided by collaborating teams, and they seem to work. We have some annotated DIC images of our own as well, but we just haven't gotten to them.

I do know there is other software that segments DIC images. But to my knowledge, while good segmentation is a prerequisite, good segmentation doesn't necessarily mean tracking will work, so you'll have to try the pipeline to find out if it works for you.

For information on the cell segmentation/tracking software we've surveyed, you can take a look at a webpage on my personal site.

delano-j commented 4 years ago

Thanks for getting back to me! Your website looks very helpful, I'll be sure to explore it.

I went ahead and created a 50-image training set from our DIC images. I ended up using ImageJ with the LOCI plugin on my iPad through Apple's Sidecar, and it worked well.

I trained the network on an Nvidia P100 GPU starting from the MS-COCO weights and actually saw good results. I was wondering, though, if you could help with some of the training configuration so I can get as good a model as possible. My input images look like this (converted from .TIF to .JPG for GitHub): sample_DIC_image

My current training configuration is below. Do you have any suggestions? Thank you so much in advance.

import numpy as np
from mrcnn.config import Config  # matterport Mask R-CNN base config


class cellConfig(Config):
    NAME = "cell"

    # Adjust to GPU memory
    IMAGES_PER_GPU = 1

    NUM_CLASSES = 2  # Background + cell

    STEPS_PER_EPOCH = 45
    VALIDATION_STEPS = 5

    # Don't exclude detections based on confidence. Since we have two
    # classes, 0.5 is the minimum anyway, as it picks between cell and BG
    DETECTION_MIN_CONFIDENCE = 0.5

    # Backbone network architecture
    # Supported values are: resnet50, resnet101
    #BACKBONE = "resnet50"
    BACKBONE = "resnet101"

    # Input image resizing
    # Random crops of size 512x512
    IMAGE_RESIZE_MODE = "crop"
    IMAGE_MIN_DIM = 512
    IMAGE_MAX_DIM = 512
    IMAGE_MIN_SCALE = 2.0

    # Length of square anchor side in pixels
    RPN_ANCHOR_SCALES = (8, 16, 32, 64, 128)

    # ROIs kept after non-maximum suppression (training and inference)
    POST_NMS_ROIS_TRAINING = 1000
    POST_NMS_ROIS_INFERENCE = 2000

    # Non-max suppression threshold to filter RPN proposals.
    # You can increase this during training to generate more proposals.
    #RPN_NMS_THRESHOLD = 0.9
    RPN_NMS_THRESHOLD = 0.99

    # How many anchors per image to use for RPN training
    #RPN_TRAIN_ANCHORS_PER_IMAGE = 64
    RPN_TRAIN_ANCHORS_PER_IMAGE = 128

    # Image mean (RGB)
    #MEAN_PIXEL = np.array([43.53, 39.56, 48.22])
    MEAN_PIXEL = np.array([5, 5, 5])

    # If enabled, resizes instance masks to a smaller size to reduce
    # memory load. Recommended when using high-resolution images.
    USE_MINI_MASK = True
    #MINI_MASK_SHAPE = (56, 56)  # (height, width) of the mini-mask
    MINI_MASK_SHAPE = (100, 100)

    # Number of ROIs per image to feed to classifier/mask heads
    # The Mask RCNN paper uses 512 but often the RPN doesn't generate
    # enough positive proposals to fill this and keep a positive:negative
    # ratio of 1:3. You can increase the number of proposals by adjusting
    # the RPN NMS threshold.
    #TRAIN_ROIS_PER_IMAGE = 128
    TRAIN_ROIS_PER_IMAGE = 150

    # Maximum number of ground truth instances to use in one image
    #MAX_GT_INSTANCES = 200
    MAX_GT_INSTANCES = 128

    # Max number of final detections per image
    #DETECTION_MAX_INSTANCES = 400
    DETECTION_MAX_INSTANCES = 150
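
For reference, here is a minimal sketch of how a config like this plugs into the matterport Mask R-CNN training API; the weights path, log directory, epoch count, and dataset objects are placeholders, not exact values from my setup:

# Training-loop sketch (matterport Mask R-CNN API); paths, epochs, and
# dataset objects below are placeholders.
import mrcnn.model as modellib

config = cellConfig()
model = modellib.MaskRCNN(mode="training", config=config, model_dir="logs/")

# Start from MS-COCO weights; the head layers are excluded because
# NUM_CLASSES differs from COCO's 81 classes.
model.load_weights("mask_rcnn_coco.h5", by_name=True,
                   exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                            "mrcnn_bbox", "mrcnn_mask"])

# dataset_train / dataset_val: prepared mrcnn.utils.Dataset subclasses
# built from the annotated DIC images.
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=40, layers="heads")
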
hftsai commented 4 years ago

Great to hear that Sidecar works.

The image looks like 20X or more? I think it's definitely doable. Do you have the result images? I think you can work on changing the anchors and the mean pixel. For actual inference, the mIoU parameters can also be tweaked a bit to improve the results. Small bits here and there to slightly increase the quality of the results.
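
Roughly speaking, the mIoU in question is the mask-overlap score used when linking an instance in one frame to the next; a quick sketch of that quantity (illustrative only, not the actual tracker code):

# Illustrative helper, not from the Usiigaci codebase: IoU between two
# binary instance masks, the overlap score an IoU-based tracker thresholds
# when linking detections across consecutive frames.
import numpy as np

def mask_iou(a, b):
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0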

Good luck.

Let me know if everything from segmentation to tracking works for you. Thanks!

delano-j commented 4 years ago

The image is 20X, yes. Here are the original and resulting images side by side: sample_DIC_image sample_DIC_result As you can see, it does pretty well, but there are still some issues. I'm also not sure whether the problem is the training data we created or the training parameters. To set the mean pixel parameter, I took the average pixel value across all images in the training set, but I don't doubt there's a better way to do it. Also, how should I go about changing the anchors?
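
In case it helps, that averaging could look something like this (a minimal sketch, not my exact script; the folder path and the imageio dependency are assumptions):

# Sketch: per-channel MEAN_PIXEL computed over a folder of training images.
# Folder path is a placeholder; assumes images readable by imageio.
import glob
import numpy as np
import imageio.v2 as imageio

means = []
for path in glob.glob("training_set/*.tif"):
    img = imageio.imread(path).astype(np.float64)
    if img.ndim == 2:                 # grayscale: replicate to 3 channels
        img = np.stack([img] * 3, axis=-1)
    means.append(img.reshape(-1, img.shape[-1]).mean(axis=0))

print(np.mean(means, axis=0))         # paste the result into MEAN_PIXEL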

Here are some other pictures for examples: sample2_DIC_image sample2_DIC_result

Thanks so much!

hftsai commented 4 years ago

Hi, this looks quite good, but I do understand your concern that it's not good enough.

When we increased the training samples from 50 to more, it helped. I would suspect DIC will need more, because the membrane boundary is less obvious than in phase contrast.

The currently published code has some problems:

  1. With Mask R-CNN, the segmented masks always seem to come out slightly smaller than the actual object. I personally think it's an artifact of ROIAlign and upsampling, but it's a limitation of the original matterport repo. The same limitation also shows up when an object gets recognized in one frame but not in the next. We have a new implementation that seems to resolve this partially, but it's not ready to be published yet.

  2. The current segmentation is limited to 8 bit, so only up to 255 cells can be recognized; any cell beyond that will just bear the ID 255 (this will cause tracking to fail). The new implementation resolves this as well.

  3. We also found that Mask R-CNN instance segmentation makes cell tracking a much easier task, but it is not too accurate for cell surface area calculation, as you may have already observed. For that purpose, a simple FCN or U-Net is better.

In short, I think you can try tweaking the training or inference parameters to see if you get better results (unfortunately, most of what we learned was also by trial and error). A more straightforward approach is to include more training data from your imaging setting; that will definitely help the neural network work better and more robustly on your data. Crude workarounds for points 1 and 2 above are also sketched below.
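
As a rough illustration only (scipy is an assumption here, and neither helper is part of the published Usiigaci code):

# Crude workarounds for points 1 and 2 above, for illustration only.
import numpy as np
from scipy.ndimage import binary_dilation

def grow_mask(mask, iterations=1):
    # Dilate an instance mask by a pixel or two to offset the systematic
    # undersizing from ROIAlign/upsampling; tune the iteration count visually.
    return binary_dilation(mask.astype(bool), iterations=iterations)

def to_label_image(masks):
    # Collapse an (H, W, N) boolean mask stack into a uint16 label image
    # so instance IDs above 255 stay distinct for the tracker.
    h, w, n = masks.shape
    labels = np.zeros((h, w), dtype=np.uint16)
    for i in range(n):
        labels[masks[:, :, i]] = i + 1
    return labels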

hftsai commented 4 years ago

Thank you again for sharing your results!

delano-j commented 4 years ago

Thanks for your thoughts, this has been helpful. I understand how more training data would be useful. Hopefully we will not need more than 255 cell IDs, but it's good to be aware of the limit.

I plan to track these cells next, specifically to keep track of lineages and cell division/death events. I will still take a look at Usiigaci's tracker, even though I think I'll need to use Lineage Mapper in the long run.

Finally, do you have any recommendations for changing the anchor parameters? I'm inexperienced with the effect they have on training. Thanks!

hftsai commented 4 years ago

You can give the tracker a try, but if your cells move fast, you might need to tweak the parameters a bit.

We've also tried Lineage Mapper, but we had some problems validating and verifying the results, so we went forward and developed our own tracker. If I remember correctly, though, the segmented results are ready to be analyzed by Lineage Mapper. If you would also like to extract single-cell migration statistics, you can look into the data analysis script; it also has a data interface for Lineage Mapper results.

If I remember correctly, increasing the anchor parameters a bit helps with identifying smaller objects, but not by much. A sketch of where those knobs live is below.
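
For illustration, these are the anchor-related settings in a matterport-style config; the values here are untuned guesses, not recommendations:

# Illustrative only: anchor-related settings, subclassing the config above.
# Match RPN_ANCHOR_SCALES to typical cell diameters in pixels, measured
# after IMAGE_MIN_SCALE is applied.
class cellConfigAnchors(cellConfig):
    # Anchor side lengths in pixels; smaller entries target smaller objects.
    RPN_ANCHOR_SCALES = (4, 8, 16, 32, 64)
    # Anchor width/height ratios; elongated cells may benefit from 0.5 and 2
    # (these are already the matterport defaults).
    RPN_ANCHOR_RATIOS = [0.5, 1, 2]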

Good luck!

delano-j commented 4 years ago

Great, thanks again! This has been very helpful!