microsoft / GlobalMLBuildingFootprints

Worldwide building footprints derived from satellite imagery

False positive buildings in the data set for Sweden - Gothenburg example with 6913 examples #94

Open stefankinell opened 9 months ago

stefankinell commented 9 months ago

The Swedish dataset has so many false positives that it is difficult to use without manual inspection. As a test, I compared the Microsoft footprints for the city of Gothenburg with the city's official building data. I added a 10-metre buffer around the official buildings and kept the MLBuildings footprints that did not touch any of these buffers. That left me with 6913 "buildings". Some of them are probably real buildings, but the vast majority are false positives. For reference, I have attached a zipped .gpkg file with my false positives to this post. Some examples are also shown with images below.
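The buffer-and-disjoint comparison described above can be sketched in plain Python. This is a minimal sketch, not the workflow actually used in the post. It assumes each footprint is a simple polygon given as a list of (x, y) vertices in a projected CRS measured in metres (for Sweden, e.g. SWEREF 99 TM / EPSG:3006), so data in longitude/latitude must be reprojected first. All function names are hypothetical. Instead of constructing buffer polygons, it uses the equivalent test "the distance between the two polygons is at most 10 m".

```python
import math

# Hypothetical helpers; polygons are lists of (x, y) vertices in metres.

def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1, -1 or 0 (collinear)."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)

def _on_segment(a, b, p):
    """For collinear p: True if p lies within the bounding box of segment ab."""
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))

def _segments_intersect(p1, p2, q1, q2):
    o1, o2 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    o3, o4 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True
    # Collinear / touching special cases.
    return ((o1 == 0 and _on_segment(p1, p2, q1))
            or (o2 == 0 and _on_segment(p1, p2, q2))
            or (o3 == 0 and _on_segment(q1, q2, p1))
            or (o4 == 0 and _on_segment(q1, q2, p2)))

def _point_segment_dist(p, a, b):
    dx, dy = b[0] - a[0], b[1] - a[1]
    if dx == 0 and dy == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Project p onto ab and clamp to the segment.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def _point_in_polygon(p, poly):
    """Ray casting: count edge crossings of a ray from p towards +x."""
    x, y = p
    inside = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside

def _edges(poly):
    return [(poly[i], poly[(i + 1) % len(poly)]) for i in range(len(poly))]

def polygon_distance(a, b):
    """Minimum distance between two filled simple polygons (0 if they overlap)."""
    ea, eb = _edges(a), _edges(b)
    if any(_segments_intersect(p1, p2, q1, q2) for p1, p2 in ea for q1, q2 in eb):
        return 0.0
    # No edges cross: either one polygon contains the other, or they are disjoint.
    if _point_in_polygon(a[0], b) or _point_in_polygon(b[0], a):
        return 0.0
    return min(min(_point_segment_dist(p, q1, q2) for p in a for q1, q2 in eb),
               min(_point_segment_dist(q, p1, p2) for q in b for p1, p2 in ea))

def _bbox(poly):
    xs, ys = [p[0] for p in poly], [p[1] for p in poly]
    return min(xs), min(ys), max(xs), max(ys)

def _bboxes_within(b1, b2, d):
    """Cheap pre-filter: False if the bounding boxes are more than d apart."""
    return not (b1[2] + d < b2[0] or b2[2] + d < b1[0]
                or b1[3] + d < b2[1] or b2[3] + d < b1[1])

def unmatched_footprints(candidates, reference, buffer_m=10.0):
    """Return candidates that touch no buffer_m buffer around any reference footprint."""
    ref_boxes = [(_bbox(r), r) for r in reference]
    kept = []
    for cand in candidates:
        cb = _bbox(cand)
        near = any(_bboxes_within(cb, rb, buffer_m)
                   and polygon_distance(cand, r) <= buffer_m
                   for rb, r in ref_boxes)
        if not near:
            kept.append(cand)
    return kept

if __name__ == "__main__":
    # Toy data in metres: one official building and three ML candidates.
    reference = [[(0, 0), (20, 0), (20, 10), (0, 10)]]
    candidates = [
        [(5, 2), (15, 2), (15, 8), (5, 8)],                # inside the building
        [(25, 0), (30, 0), (30, 5), (25, 5)],              # 5 m away: within buffer
        [(100, 100), (105, 100), (105, 104), (100, 104)],  # far away: kept
    ]
    print(len(unmatched_footprints(candidates, reference)))  # prints 1
```

The pairwise loop is only practical for small samples. For a whole city, the usual approach is a spatial index or a spatial join (for example GeoPandas `sjoin`). Note also that GIS buffers approximate round corners with segments, so results near the 10 m threshold can differ marginally from this exact distance test.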

So what are they then?

They are a mix of things one can understand, such as containers in the harbour, cars parked on farms and boats in the harbour.

Image at 57.692740,11.841055

Image at 57.6920724,11.8009287

But there are also quite strange things, like bare rock by the ocean.

Image at 57.7334580,11.7445517

Image at 57.7428883,11.7393247

Cars on the road? Image at 57.8011746,11.9566407

Forest: image at 57.8025395,11.9673369

Running track: image at 57.6783985,11.9391418

gbg_false-positives.zip

stefankinell commented 9 months ago

And these are just examples from where I live. They were easy to show because I have access to the official data here. I see the same problems in many other places in Sweden, as well as large square chunks of data that are simply missing.

I see that data for other regions has been reprocessed and now includes a confidence attribute. Would it be possible to do the same for Sweden soon?

andwoi commented 9 months ago

@stefankinell thanks for gathering these together. From this sample (and others) it appears our models systematically struggle in ports and other industrial areas. We've seen similar behavior for airports.

stefankinell commented 9 months ago

That is quite understandable. What puzzles me more are the many "huts" found in the forest and on the oceanside cliffs.

stefankinell commented 2 months ago

I checked back to see whether anything has been updated since I posted this. It seems the data for Sweden has not been touched since the first iteration. Is it correct to assume that it will not be processed again? I understand if our country is too small to be of interest, but it would be nice at least to know what the plans are going forward.

andwoi commented 2 months ago

Our plan at the moment is to continue updating data as our imagery sources are updated.