rdinse / sea-lion-counter

37th place solution to the "NOAA Fisheries Steller Sea Lion Population Count" Kaggle Challenge

Training Images #1

Open willishf opened 7 years ago

willishf commented 7 years ago

Would like to test your implementation for cell counting. Can you provide example images used for training?

For the example input image you have a border that is a solid color and then an image buffer region for images on the edge. Is the solid border required? Can you provide dimensions in pixels of the various regions?

The requirements list tensor_flow_gpu=1.2. Is there any concern about it working with the latest TensorFlow release? That affects which CUDA version to use, etc., which complicates the install.

I did the initial install using Python 2.7 and ran main, and it said it requires Python 3.3.

rdinse commented 7 years ago

Hello willishf.

The solid color in the border is not necessary; it could also simply show more of the image instead. It is a small optimization because the network is not supposed to count in that region anyway.

The dimensions are a bit of a nightmare. The architecture was way too complicated for the limited time we had for the competition, so some of the numbers are quick back-of-the-envelope calculations partly using calc_projective_field from utilities.py to compute the perceptive fields, plus some trial-and-error. Sorry about that!

I would advise you to only work with inception_model.py (it should be the default model as defined in main.py). contextual_inception_model.py adds a lot of complexity, and some of that was tailored to capture relations between the animals in the sea lion dataset over large distances, which is probably irrelevant to counting cells.

You can see the definitions of the dimensions here: https://github.com/rdinse/sea-lion-counter/blob/master/models/inception_model.py#L24

IIRC, output_size was basically manually copied from the output tensor shape (preds) and projective_field_size from the computation of the projective fields (it is required earlier than line 101 for the construction of the input graph).
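For anyone who would rather derive such numbers than copy them from tensor shapes, the usual arithmetic for a stack of unpadded convolutions is sketched below. This is only an illustration, not code from the repository: `conv_stack_geometry` is a hypothetical helper, the layer list is made up and does not match inception_model.py, and it computes the receptive field of one output neuron rather than calling calc_projective_field.

```python
def conv_stack_geometry(input_size, layers):
    """Return (output_size, receptive_field) for a 1D stack of unpadded convolutions.

    layers is a list of (kernel, stride) pairs. Hypothetical helper for illustration.
    """
    size = input_size
    receptive = 1  # number of input pixels seen by one output neuron
    jump = 1       # distance in input pixels between neighbouring neurons of the current layer
    for kernel, stride in layers:
        size = (size - kernel) // stride + 1  # 'valid' convolution output length
        receptive += (kernel - 1) * jump
        jump *= stride
    return size, receptive


# Made-up example stack: 3x3 stride 1, 3x3 stride 2, 5x5 stride 1 on a 100 px input.
example = conv_stack_geometry(100, [(3, 1), (3, 2), (5, 1)])
print(example)  # (44, 13)
```

The same function applies per axis for 2D inputs with square kernels.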

Python 3 is required. You can try out TF 1.4. If there are errors, then you might need to change some things according to the migration info in the change log.

rdinse commented 7 years ago

For loading the cell data set, you basically need to write your own conversion script to the tfrecord format. You can see how this is done in this function.

If this is in the context of education/university, I can also send you my lab report for more details. Just send me a mail (robindinse at Gmail).

rdinse commented 7 years ago

Your project sounds really great.

Just to provide a bit more info about this code:

The key advantage of Countception is that it performs a form of bagging by redundantly counting the objects in a sliding window. That makes it suitable for small datasets and it also enhances the predictions. However, a problem is that it cannot deal well with strided convolutions, because striding can make the sizes and positions of the projective fields vary by 1 px from one input pixel to the next, while the final normalization step requires dividing by a constant surface area of the projective fields. For a visualization, try moving your mouse cursor across the input layer in this web app while pressing the shift key and observe the projective field in the output layer at the top: http://rdinse.github.io/convnet-visualizer/convnet-visualizer.html#82~0,8,4,1,2,0,0~0,19,3,1,2,0,0~0,39,4,1,2,0,0~0,80,3,1,1,0,0
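The 1 px variation can be reproduced in one dimension. The following is a minimal sketch, not code from this repository: `projective_field` and `field_sizes` are hypothetical helpers, the layers are unpadded 1D convolutions given as (kernel, stride) pairs, and clipping at the image border is ignored.

```python
def projective_field(x, layers):
    """Return (lo, hi): the inclusive range of final-layer outputs influenced by input pixel x."""
    lo = hi = x
    for kernel, stride in layers:
        # Output o reads inputs o*stride .. o*stride + kernel - 1, so input i reaches
        # outputs ceil((i - kernel + 1) / stride) .. floor(i / stride).
        lo = -((kernel - 1 - lo) // stride)  # integer ceil((lo - kernel + 1) / stride)
        hi = hi // stride
    return lo, hi


def field_sizes(layers, xs):
    """Projective field size (number of affected outputs) for each input pixel in xs."""
    sizes = []
    for x in xs:
        lo, hi = projective_field(x, layers)
        sizes.append(hi - lo + 1)
    return sizes


unstrided = field_sizes([(3, 1), (3, 1)], range(10, 16))
strided = field_sizes([(3, 2), (3, 1)], range(10, 16))
print(unstrided)  # [5, 5, 5, 5, 5, 5]
print(strided)    # [4, 3, 4, 3, 4, 3]
```

With stride 1 every input pixel is counted the same number of times, so dividing by that constant recovers the count. With a stride of 2 in the first layer the size alternates between 4 and 3, so no single divisor is exact.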

This problem can be circumvented by simply forcing all projective fields in the target images to have the same size. That introduces uncertainty because the mappings from pixel location to projective field are then inconsistently defined with respect to translation, but in practice that seems to average out pretty well if one adapts the projective fields in all four directions alternately (as done in our code).
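To illustrate why a constant field size makes the normalization exact, here is a minimal sketch under simplifying assumptions. It is not the repository's implementation: `build_target` and `count_from_map` are hypothetical names, each object stamps a fixed square whose top-left corner sits at the object's output coordinate, and the alternating adaptation in four directions is not modeled.

```python
def build_target(objects, out_h, out_w, field):
    """Build a redundant count map: every object adds 1 to a field x field square.

    objects holds (row, col) output coordinates; each square must fit inside the map.
    """
    target = [[0] * out_w for _ in range(out_h)]
    for oy, ox in objects:
        for y in range(oy, oy + field):
            for x in range(ox, ox + field):
                target[y][x] += 1
    return target


def count_from_map(target, field):
    """Recover the object count by dividing the map sum by the constant field area."""
    return sum(map(sum, target)) / (field * field)


# Four objects, two of them at the same location, on an 8x8 output map.
objects = [(0, 0), (2, 3), (2, 3), (4, 4)]
target = build_target(objects, out_h=8, out_w=8, field=4)
print(count_from_map(target, field=4))  # 4.0
```

Because every object contributes exactly `field * field` to the map, the division is exact for the target. A trained network's predicted map is normalized in the same way, and its redundant overlapping estimates are what give the bagging effect.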

We added a contextual region around the receptive field of each output neuron (by adding a couple of layers) such that each neuron can fully perceive objects that are near the boundaries of its receptive field. We did not have time to test how much this actually helps.

Anyway, for amazingly simple and maintainable code (and perhaps as a baseline), it might also be worthwhile to look into the winning solution to the Sea Lion Counting Challenge: https://www.kaggle.com/outrunner/use-keras-to-count-sea-lions/notebook

It is just a VGG-16 with the last layer replaced by a FC-1024 followed by a FC-5 (for the five animal classes) which is then fine-tuned to predict the counts in 300x300 image patches as a regression task.

Hope this helps.