oist / Usiigaci

Usiigaci: stain-free cell tracking in phase contrast microscopy enabled by supervised machine learning

Some questions about making my own training data and training #7

Closed ghost closed 5 years ago

ghost commented 5 years ago

Hi, hftsai

Thank you for your previous reply. I have some questions about making my own training data and training.

  1. I have tried to make my own dataset using ImageJ. First, I used the LOCI plugin to create a cell ROI map from the raw cell image. Then I used preprocess_data.py to convert the colored map into an 8-bit grayscale image. In the end I had raw.tif, labeled.png, and instances_ids.png for training and testing, but I noticed that your training folders also contain feature_0.png and feature_1.png. What are these two images for? Are they helpful for training? You don't seem to mention them anywhere.

  2. Do you use preprocess_data.py to generate the image labels? Looking at lines 35 and 36, you seem to gradually increase the grayscale level for each individual cell. Because my cell count is very small at the beginning, the masks in my cell images are quite dark, almost the same color as the background. Would changing id_counter in line 31 help the training?

  3. My model accuracy was not good. Could it be that using mask_rcnn_coco.h5 as the starting weights is a poor choice? Or, I guessed, it may be because my training set has only 10 images, my validation set has 5 images, and I halved all the epochs. I just wanted to try training before the cells grow; images from the later stages of cell growth will be added to the training later. Do you have any good suggestions?

(attached image: my raw image)

(attached image: the predicted mask image)

I'm sorry I have so many questions.

Best regards, et0704a

hftsai commented 5 years ago

Hi et0704

  1. Sorry, the feature_0 and feature_1 files were intended for the old version of the well-known DeepCell from Stanford (now at Caltech), but to my knowledge they are currently reworking the entire package. You can ignore feature_0 and feature_1: feature_0 is the boundary of the cells (the membrane) and feature_1 is the cytoplasm. I should delete these files, as they are not directly relevant.

The Usiigaci segmentation with Mask R-CNN uses only raw.tif and labeled.png. (instances_ids.png can be generated by the preprocess script if labeled.png is not an 8-bit indexed image.)

  2. The grayscale values are for instance-aware detection, i.e., each cell has a particular ID. This is absolutely necessary for tracking (the tracker recognizes and links each ID by its grayscale value); a sketch at the end of this comment illustrates the idea. If you want to see the cells clearly, you can load the image into ImageJ and adjust the lookup table, or apply a threshold; after thresholding the segmentation results are much easier to see in black and white.

  3. I highly recommend against training from scratch. In my experience with a desktop GPU, it never works with such a small dataset, given the large network size of ResNet. It will converge if you use a pretrained network (I started with one already trained on the Microsoft COCO dataset). We started with training datasets of 5, 10, and 25 images, and then got to 50; everything we tried with fewer than 50 training images gave bad results. So it definitely follows the rule of thumb that more training data gives better accuracy. I recommend you spend a couple of days increasing the training data if you want acceptable results.

Ideally, we would like to grow our training data into the thousands, spanning different magnifications, optical configurations, cell types, and substrates. Unfortunately, I'm spread a bit thin right now, so that will have to wait.

I think your results aren't really bad (note that Mask R-CNN segmentation is not perfect, and the boundaries do tend to be smaller than the real cells). Are these stem cells? If they are cell lines, you may prefer to coat the substrate further so they stick better. In any case, these cells are very different from the training data I used (my cells attach well to the substrate and protrude lamellipodia), so you do need to increase the training dataset for this kind of cell shape.
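To illustrate point 2, here is a minimal sketch (my own illustration, not the actual preprocess_data.py) of building an 8-bit instance-ID image by giving each cell an incrementing grayscale value. With only a few cells the IDs are 1, 2, 3, ..., which is exactly why your masks look almost black until you adjust the lookup table:

```python
# Minimal sketch (illustration only, not the actual preprocess_data.py):
# assign every uniquely colored cell in labeled.png an incrementing 8-bit ID.
import numpy as np
from PIL import Image

labeled = np.array(Image.open("labeled.png").convert("RGB"))
ids = np.zeros(labeled.shape[:2], dtype=np.uint8)

id_counter = 1  # background stays 0; each cell gets the next gray level
for color in np.unique(labeled.reshape(-1, 3), axis=0):
    if not color.any():
        continue  # skip the black background
    ids[(labeled == color).all(axis=-1)] = id_counter
    id_counter += 1

# With e.g. 3 cells the IDs are 1, 2, 3: nearly black to the eye,
# but perfectly distinct values for the tracker.
Image.fromarray(ids).save("instances_ids.png")
```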

hftsai commented 5 years ago

Hope this will help you. Please do let me know how it goes.

ghost commented 5 years ago

Thank you for providing the requested information. I will spend more time increasing the training data. In addition, in train.py line 217, which weight file do you use? Another question concerns the cell tracking. (Attached images: the result of tracking with your T98_sample.) How do you keep tracking cell No. 49? Even after it has split into No. 49 and No. 88, it doesn't become a new track, like No. 88 and No. 89. And how do you know that No. 49 is the cell on top and No. 88 the cell below? This is a bit difficult for me to understand.

Finally, I would like to ask: what do you think is the advantage of using Mask R-CNN for cell tracking? Some ImageJ plugins can also track cells. I don't have domain knowledge in biomedical engineering, but I am researching techniques for applying AI to cell detection. During my search I found very few projects for "continuous" tracking across image sequences; most of them use CNNs for detection in single images. I was wondering if you might be able to give me some advice from your experience.

Thanks, et0704a

hftsai commented 5 years ago

Hi,

  1. First, the weight file I started with is the one pretrained on the Microsoft COCO dataset, which is also released by Matterport: https://github.com/matterport/Mask_RCNN/releases. Alternatively, you can start from any of the three weights in our repository:

https://www.dropbox.com/sh/3eldgvytfchm9gr/AAB6vzPaEf8buk81IRVNClUEa?dl=0
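For reference, loading pretrained weights in the Matterport framework usually looks roughly like the sketch below. This is the standard Matterport pattern, not necessarily identical to Usiigaci's train.py; the config values and the two dataset objects are placeholder assumptions:

```python
# Sketch of the standard Matterport Mask_RCNN fine-tuning pattern.
# Assumptions: the CellConfig values are illustrative, and the two Dataset
# objects stand in for real subclasses that load your raw/label images.
import mrcnn.model as modellib
from mrcnn.config import Config
from mrcnn import utils

class CellConfig(Config):
    NAME = "cell"
    NUM_CLASSES = 1 + 1      # background + cell
    IMAGES_PER_GPU = 1
    STEPS_PER_EPOCH = 100

config = CellConfig()
model = modellib.MaskRCNN(mode="training", config=config, model_dir="./logs")

# Load the COCO-pretrained weights but skip the head layers, whose shapes
# depend on the number of classes (80 for COCO vs. 1 here).
model.load_weights("mask_rcnn_coco.h5", by_name=True,
                   exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                            "mrcnn_bbox", "mrcnn_mask"])

dataset_train = utils.Dataset()  # placeholder: use a subclass that loads
dataset_val = utils.Dataset()    # your raw.tif / instances_ids.png pairs
dataset_train.prepare()
dataset_val.prepare()

model.train(dataset_train, dataset_val, learning_rate=config.LEARNING_RATE,
            epochs=20, layers="heads")  # fine-tune only the heads at first
```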

  2. On the tracking: it is described in more detail in our paper, but it hasn't been accepted yet (hopefully soon).

In short, each instance (a cell) in each image has an 8-bit value (ID). However, in a time-lapse experiment each frame is segmented independently, so the ID of the same cell may not be the same in every image. The tracker handles this by starting from the first image and tracking and linking each ID by nearest-neighbor searching, using the k-d tree algorithm built into the Trackpy library (sketched below). Our tracker is designed for our data, so you should see each cell keeping a single ID throughout the time lapse in most cases. If it doesn't work on your dataset, some tweaking of the tracker parameters is necessary (in code only). Alternatively, if the frame rate is not high enough, the nearest-neighbor search may fail, and it will be necessary to shorten the time-lapse interval.

However, we currently haven't built a lineage function into our tracker. That means that for a cell that has undergone mitosis (splitting), you will have No. 49 splitting into No. 49 and No. 88. For now this has to be handled by the user in the manual verification step: after tracking, you check whether such mitosis events happened and exclude the tracks if necessary. I deliberately left it this way because we are still exploring solutions for these data, and the raw data may be useful for people who have their own lineage-analysis code.
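As a rough illustration of the linking step (a minimal sketch using Trackpy's public API, not our exact tracker code), the per-frame instance IDs can be reduced to centroids and linked like this:

```python
# Minimal sketch (not the actual Usiigaci tracker): link per-frame instance
# IDs into tracks by nearest-neighbor search with Trackpy's k-d-tree linker.
# Assumes `masks` is a list of 2D arrays in which each cell carries a unique
# nonzero grayscale ID within its own frame (e.g. from instances_ids.png).
import pandas as pd
import trackpy as tp
from skimage.measure import regionprops

def link_instance_masks(masks, search_range=30, memory=1):
    rows = []
    for frame, mask in enumerate(masks):
        for region in regionprops(mask.astype(int)):
            y, x = region.centroid               # centroid of one cell
            rows.append({"frame": frame, "x": x, "y": y,
                         "frame_id": region.label})  # per-frame gray ID
    features = pd.DataFrame(rows)
    # tp.link adds a persistent 'particle' column, so the same cell keeps
    # one track ID across frames; `memory` tolerates brief disappearances.
    return tp.link(features, search_range=search_range, memory=memory)
```

If cells move farther than search_range pixels between frames, the nearest-neighbor assignment starts to fail, which is exactly why shortening the time-lapse interval helps.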

Hope it helps. Let me find out from the publisher whether we can upload a preprint.

  3. Exactly! I have been working on cell tracking for many years.

If you're interested in the different software people have developed over the years, I have organized a resource page for this (incomplete, I'm sure, so if you spot something I missed, please let me know): https://sites.google.com/site/tsaihsiehfu/microfluidics-study-material/phase-contrast-cell-tracking

The most successful commercial solution may be MetaMorph (though it's kind of annoying to streamline the processing of multiple datasets with it). In MetaMorph you load the movie, and you still have to start the tracking manually by clicking roughly on the centroid of each cell in the first image; the software then tracks the cells automatically. The results are usually not too bad if you really restrict the parameters (at a computational cost). The major drawbacks that stopped us from using MetaMorph are, first, that it is very expensive (although we use a facility-owned license), and second, that it can only track the XY location of cells, so it is not suitable for cell-morphology tracking.

Some ImageJ libraries such as MTrackJ and TrackMate 2 are really good for particle tracking. Although I have had some communication with their developers about tracking cells, I haven't gotten them to work as painlessly as I'd like. All these libraries, to my knowledge, also just track points. Even when they do work easily, you only get the XY location (and they may require the user to identify a starting point, where the biggest question is whether human-eye registration is really close to the true centroid).

So even now, most of the deep-learning methods people have developed, CNN methods in particular, focus on segmentation. Don't get me wrong: segmentation is exactly the most important part of cell tracking, although not the only part. (If segmentation is bad, tracking will be bad; in particular, if you cannot tell touching cells apart, single-cell tracking is out of the question.) I know that groups like DeepCell, or people from Cambridge and U Dakota, have CNN methods to segment and seem to have some tracking too, but I'm kind of dumb, so it was hard for me to get them to work on my dataset; hence, we developed our own tracker.

I've strayed a bit, so here is the main answer to your question. The advantages of Mask R-CNN are the following:

  1. Speed and computation cost: compared with the old DeepCell (if you have used it), it is much faster and can be run on a laptop with a mid-range graphics card. In my actual experience running DeepCell, to get a reasonable workflow you will most likely need a powerful workstation with a Titan V GPU or better, or CPU clusters.

  2. Mask R-CNN is as fast as an FCN and, more importantly, it outputs the entire mask of the cell components (in our current case, only the cytoplasm). This is important for us: we want to track cells as well as how their morphology changes. We surveyed the available techniques and identified Mask R-CNN as the most easily adaptable for us (amateurs, not machine-learning specialists).

  3. Mask R-CNN is instance-aware, i.e., each object carries a particular value that we call an ID. This makes the subsequent tracking much easier to process, because segmentation is not everything. In fact, several earlier works achieved good segmentation of phase-contrast microscopy with very high Dice and Jaccard indices. While very useful for calculating confluence, such segmentation is very hard to use for single-cell tracking: any cells close to each other are indistinguishable. Segmentation is absolutely important, but tracking demands the more stringent requirement of single-cell segmentation.
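As a small example of why instance masks matter beyond XY positions (my illustration, not Usiigaci's code), per-cell morphology can be read straight off the ID image:

```python
# Minimal sketch: with instance-aware masks, per-cell morphology can be
# measured directly from the ID image using scikit-image.
import numpy as np
import pandas as pd
from skimage.measure import regionprops

def morphology_table(ids: np.ndarray) -> pd.DataFrame:
    rows = []
    for r in regionprops(ids.astype(int)):
        rows.append({"id": r.label,
                     "area": r.area,                    # size in pixels
                     "perimeter": r.perimeter,
                     "eccentricity": r.eccentricity})   # 0 = perfect circle
    return pd.DataFrame(rows)
```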

Hope it helps.

hftsai commented 5 years ago

The preprint is now available at https://www.biorxiv.org/content/early/2019/01/18/524041

ghost commented 5 years ago

Hi, hftsai

Thank you from the bottom of my heart. I'm a master's student in the Department of Mechanical Engineering, National United University, Taiwan. Many thanks for the favour you did me and the kind interest you took in me. Please accept my grateful appreciation.

Cheers,

et0704a

hftsai commented 5 years ago

Sure. I understand the problems you may face, because I've been there.

Regarding the software, please read the preprint for details, and if you have more questions, please don't hesitate to send me an email.