How to use your training data?

gatoniel commented 2 years ago

Hey, I wanted to use your training dataset (http://www.cellpose.org/dataset_omnipose) for benchmarking purposes. However, I was somewhat surprised to see that some of your images are composites of several smaller images stitched together, e.g. 037_img of the training data. I think it is problematic to feed these images directly into an algorithm, because the stitching introduces artifacts. For example, the objects in the 3rd image of the 1st row and the 2nd image of the 2nd row are cut off arbitrarily at the edges of each sub-image. Do you also provide a script with which these images can be separated again? Best, Niklas

kevinjohncutler commented 2 years ago

@gatoniel, sorry for the late reply. I worried about artifacts as well, but in fact we want our network to be robust against such artifacts. First, we want the network to be fine with seeing cells cut off at the image edge. Second, these flat edges introduce a sort of 'cell morphology' that is not seen in nature, but we still want an algorithm that doesn't care whether cell edges are round or flat, or whether cells have sharp corners. The results presented in my paper indicate that these artifacts pose no problem at all to overall segmentation performance.

The reason I made these 'ensemble images' for training was to increase cell density so that the network would see more cells in each batch rather than mostly training to ignore a bunch of empty media. Another reason I used these ensemble images is to speed up annotation. Having denser images just means fewer files to go through for making ground truth.

In some cases, I did use a script to make these ensemble images by isolating microcolonies with omnipose (foreground/background only), so it might be possible to reverse engineer that. In other cases, like the H. pylori image you are referring to in 037, I had to start by manually cropping the images to select only those cells that were suitable for 2D segmentation (specifically, removing all cell overlaps).
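For illustration only, here is a minimal sketch of how a script-based ensemble could be built from a foreground/background mask. It is not the script used for the dataset: the function names `crop_microcolonies` and `tile_crops`, the `pad` and `columns` parameters, and the regular zero-padded grid layout are all assumptions, and connected components from `scipy.ndimage` stand in for the omnipose foreground step.

```python
import numpy as np
from scipy import ndimage


def crop_microcolonies(image, foreground, pad=2):
    """Return one (image_crop, mask_crop) pair per connected foreground region.

    Hypothetical helper: `foreground` is a boolean array of the same shape as
    `image`. Other colonies that fall inside a padded bounding box are kept.
    """
    labels, _ = ndimage.label(foreground)
    crops = []
    for rows, cols in ndimage.find_objects(labels):
        # Grow the bounding box by `pad` pixels, clipped to the image bounds.
        r0 = max(rows.start - pad, 0)
        r1 = min(rows.stop + pad, image.shape[0])
        c0 = max(cols.start - pad, 0)
        c1 = min(cols.stop + pad, image.shape[1])
        crops.append((image[r0:r1, c0:c1], foreground[r0:r1, c0:c1]))
    return crops


def tile_crops(crops, columns=3):
    """Paste crops onto a regular grid whose cells fit the largest crop.

    Assumed layout: row-major order, each crop aligned to the top-left corner
    of its grid cell, unused area filled with zeros.
    """
    cell_h = max(img.shape[0] for img, _ in crops)
    cell_w = max(img.shape[1] for img, _ in crops)
    n_rows = -(-len(crops) // columns)  # ceiling division
    ensemble = np.zeros((n_rows * cell_h, columns * cell_w), dtype=crops[0][0].dtype)
    ensemble_mask = np.zeros(ensemble.shape, dtype=bool)
    for i, (img, mask) in enumerate(crops):
        r, c = divmod(i, columns)
        top, left = r * cell_h, c * cell_w
        ensemble[top:top + img.shape[0], left:left + img.shape[1]] = img
        ensemble_mask[top:top + mask.shape[0], left:left + mask.shape[1]] = mask
    return ensemble, ensemble_mask
```

With a fixed grid like this one, separating the tiles again amounts to slicing the ensemble at multiples of the cell height and width. That only works if the grid size is known, which is not the case for the manually cropped images.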

gatoniel commented 2 years ago

Hey, thank you for these explanations. I tried your dataset with other algorithms and, indeed, it looks like they become robust against the artifacts. Now I think this is actually a good idea!

kevinjohncutler / omnipose

How to use your training data? #1