neptune-ai / open-solution-mapping-challenge

Open solution to the Mapping Challenge :earth_americas:
https://www.crowdai.org/challenges/mapping-challenge
MIT License

Model overfitted or mapping challenge dataset not diverse? Poor performance with custom images #196

Open carbonox-infernox opened 6 years ago

carbonox-infernox commented 6 years ago

Excellent results can be achieved using the Developers' fully-trained model (found here: https://app.neptune.ml/neptune-ml/Mapping-Challenge/files) to predict on the test_images set provided as part of the mapping challenge. However, over the past few days I have experimented with custom data from Bing imagery and Hexagon and have gotten very poor results. Here are some examples:

[Prediction images: subdivision_10_18, subdivision_03_22, subdivision_04_02, subdivision_17_11, subdivision_03_03 (a few were good), subdivision_01_09]

In these images, cyan circles denote mask centroids. I have experimented with several factors, such as:

In every case I treated my custom data exactly as I treated the test data, using the same pipeline and settings for both. I'm forced to conclude that either the model has overfitted to the "type" of data provided by the open challenge, or the provided data does not draw from a diverse selection of image sources.

Has anyone else found good results with custom imagery? If so, what was the source?

jakubczakon commented 6 years ago

@carbonox-infernox good job analyzing the results on external data. I do think that the dataset provided by the organizers was not very diverse and was offered more as a proof of concept than anything else.

So in short, I think the model simply learned what it was fed: a simple building dataset. That said, I believe (having discussed it with other participants) that you can successfully fine-tune our model for just a few epochs on a new dataset.
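A minimal sketch of what "fine-tune for a few epochs" could look like, assuming the downloaded weights load into a PyTorch U-Net and you have a DataLoader of (image, mask) pairs from your new imagery. The names `unet` and `loader` are placeholders, not part of the repository's API:

```python
import torch

def finetune(unet, loader, epochs=3, lr=1e-4, device="cpu"):
    """Fine-tune a 2-class segmentation net on new imagery (sketch)."""
    unet.to(device).train()
    # Low learning rate so we adapt the pretrained weights rather than overwrite them
    optimizer = torch.optim.Adam(unet.parameters(), lr=lr)
    # Two classes: background (channel 0) and building (channel 1)
    criterion = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, masks in loader:
            optimizer.zero_grad()
            logits = unet(images.to(device))            # (N, 2, H, W)
            loss = criterion(logits, masks.to(device))  # masks: (N, H, W), dtype long
            loss.backward()
            optimizer.step()
    return unet
```

You would still need to prepare the new dataset in the format the pipeline expects; this only illustrates the training loop.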

I'd suggest you try that.

51616 commented 5 years ago

@carbonox-infernox Can you walk us through how you load the model weights and run prediction on your own images? Code examples would make this very clear. :)

Edit: P.S. 1: Did you apply any transformations to the inputs (e.g. normalization or resizing)? If so, can you explain them? P.S. 2: The model has 2 output channels. What should I do with them at inference time?

apyskir commented 5 years ago

Hi @51616,

Just download the model and place it in your experiment directory. In neptune.yaml you have to specify experiment_dir; inside that directory, create a transformers subdirectory and place your unet file there. During inference, the code will look for the model weights file in that directory. Hope it helps.

P.S. 2: The model is prepared for more than one class, so the first channel refers to the "background" class and the second to the "building" class (I hope I remember this correctly).
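Given that channel layout, a minimal sketch of turning the raw 2-channel output into a binary building mask (assuming channel 0 is "background" and channel 1 is "building", as described above; `logits` and the 0.5 threshold are illustrative, not the repository's exact post-processing):

```python
import numpy as np

def building_mask(logits):
    """Convert raw 2-channel logits of shape (2, H, W) into a binary building mask."""
    # Softmax over the channel axis (numerically stabilized)
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    probs = e / e.sum(axis=0, keepdims=True)
    # Keep the "building" probability (channel 1) and threshold it
    return (probs[1] > 0.5).astype(np.uint8)
```

Equivalently, for a hard decision you could take `argmax` over the channel axis; thresholding the building probability just makes the cutoff explicit.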

Angel0003 commented 4 years ago

Can you guide us through training on our own datasets, including how to prepare the training data? Code examples would make this very clear. :) Also, how do I generate the metadata.csv file?