gravitystorm / openstreetmap-carto

A general-purpose OpenStreetMap mapnik style, in CartoCSS
Other
1.55k stars 822 forks source link

Very small administrative boundary borders should start at a later zoom level #3678

Open jeisenbe opened 5 years ago

jeisenbe commented 5 years ago

Expected behavior:

Boundary=administrative border lines, like the outlines for protected areas, nature reserves, zoos and theme parks, should not be shown when the area is too small to see clearly.

Actual behavior:

In some Asia cities, the admin_level=10 borders enclose single blocks. This leads to non-intuitive rendering at z13 to z14 in Japan, and as late as z16 in Jakarta - near the equator.

Links and screenshots illustrating the problem:

Jakarta z13 https://www.openstreetmap.org/#map=13/-6.1245/106.9193 z13-jakarta-busy-borders

z14 z14-jakarta-busy-borders

z15 z15-jakarta-busy-borders

z16 z16-jakarta-busy-borders

z16 Inspector view (from my PR, showing polygons and way_pixels from admin-text layer): z16-jakarta-borders-inspector-way_pixels

Suggested solution:

jeisenbe commented 5 years ago

I also considered trying to change the tagging standards in Indonesia by introducing admin_level 11 and 12.

However, this would still be a problem in some other countries, such as Japan, because admin_level 10 is almost always the highest level, and it can very greatly in size so it should still be rendered at z14 at least - admin_level=10 is a village-sized area in Northern Ireland, for example.

Also, there may be a plan to render admin level 11 and 12 in the future, which would reintroduce this issue even if these were not rendered until z15 (which might be too late for some regions).

jeisenbe commented 5 years ago

The admin_level=10 boundaries in Japan are also quite small in the center of towns:

https://www.openstreetmap.org/#map=13/34.2266/133.7812 z13 northern Shikoku, Japan z13-japan-border-labels-busy z14 z14-japan-border-labels-busy z15 z15-japan-border-labels-busy

kocio-pl commented 5 years ago

It seems like z13 and z14 are really bad in Jakarta and I like to get it fixed (Japan does not seem that bad for me, however this could be made better too). However since this problem is pretty limited, what about simply moving l10 down to z15+?

Currently we render admin levels from the zoom levels like this:

admin level initial zoom level
1 -
2 4
3 4
4 4
5 11
6 11
7 12
8 12
9 13
10 13
11 -
12 -

So the levels are rather hand crafted and it would not break a system. With level 10 rendered on z15+ instead of z13+ we make a small change and it still gives us a room for rendering level 11 and 12 (for example on z16+).

jeisenbe commented 5 years ago

I considered that option, but I do not know if it will work in all countries where admin_level=10 is used. There may be places where this level is actually quite large on zoom 13 and 14.

In Japan, while some on the admin_level=2 areas are only the size of a block, in town or city centers, those in rural areas can take up much of the screen at z14.

Is there a source where the size diatribution of admin boundaries has been calculated? I’m not really able to download the global admin_level 10 data for testing On Tue, Feb 12, 2019 at 10:20 AM kocio-pl notifications@github.com wrote:

It seems like z13 and z14 are really bad in Jakarta and I like to get it fixed (Japan does not seem that bad for me, however this could be made better too). However since this problem is pretty limited, what about simply moving l10 down to z15+?

Currently we render admin levels from the zoom levels like this: admin level initial zoom level 1 - 2 4 3 4 4 4 5 11 6 11 7 12 8 12 9 13 10 13 11 - 12 -

So the levels are rather hand crafted and it would not break a system. With level 10 rendered on z15+ instead of z13+ we make a small change and it still gives us a room for rendering level 11 and 12 (for example on z16+).

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/gravitystorm/openstreetmap-carto/issues/3678#issuecomment-462571275, or mute the thread https://github.com/notifications/unsubscribe-auth/AoxshBkc1a0s2RV8f9MqXdKi-DyoKlghks5vMhbggaJpZM4ayJN4 .

kocio-pl commented 5 years ago

I made a file recently including all the admin levels inside using this:

time osmium tags-filter -o planet-admin.osm.pbf planet-latest.osm.pbf admin_level --overwrite

This file is ~710 MB big, are you interested in getting it?

There is also very handy service for analyzing and downloading different admin boundaries:

https://wambachers-osm.website/boundaries/

kocio-pl commented 5 years ago

Filtering only level 10 gives me 166 MB file, which I can also try to share if you want.

jeisenbe commented 5 years ago

Thanks! But I’ll have to wait a week till I’m in a place with free broadband.

160mb is still bigger than anything I’ve been able to import with docker or open with JOSM

Is there a way to use Osmium or Osmosis to only extract admin_level=10 borders larger than a certain area? Or can we export a spreadsheet showing the size or length of each polygon?

That would be even better.

(We should also do this for admin_level 11 and 12 before considering rendering these.)

On Tue, Feb 12, 2019 at 10:16 PM kocio-pl notifications@github.com wrote:

I made a file recently including all the admin levels inside using this:

time osmium tags-filter -o planet-admin.osm.pbf planet-latest.osm.pbf admin_level --overwrite

This file is ~710 MB big, are you interested in getting it?

There is also very handy service for analyzing and downloading different admin boundaries:

https://wambachers-osm.website/boundaries/

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/gravitystorm/openstreetmap-carto/issues/3678#issuecomment-462754908, or mute the thread https://github.com/notifications/unsubscribe-auth/AoxshJp6mr3Mlm95zH9FRNEgqMlVUrdTks5vMr7JgaJpZM4ayJN4 .

kocio-pl commented 5 years ago

I have opened the question here:

https://help.openstreetmap.org/questions/67990/admin_level-data-analysis

wambacher commented 5 years ago

Hi,

Daniel (kocio-pl) contacted me and asked for help.

yes, i can deliver the data/stats you are looking for. 500 k admin boundaries with id,country,level,name, size in square meters.

it's just a simple SQL query. very easy for me.

a snapshot of 10 boundaries: id | country | level | name | qm
---------+---------+-------+---------------------+------------------ 365496 | USA | 8 | Barnstead | 114254560.69487 7634174 | POL | 8 | Kurkowo | 896381.218400478 6195196 | AUS | 10 | Lambina | 3766828199.34477 7495967 | BLR | 10 | Слобода | 252888.531055674 7470814 | BLR | 10 | Слобода-Сосновка | 105735.395707846 5684846 | BRA | 9 | Araçoiaba da Serra | 259492380.209241 182962 | USA | 8 | Batesville | 352245.959392071 5541461 | AUS | 10 | Lansdowne | 1407443.23803035 5683160 | BRA | 9 | Marabá Paulista | 915807416.882723 2738270 | DZA | 8 | Djemaa Ouled Cheikh | 135870755.92275 (10 Zeilen)

Full list is currently 498k

Creating a csv or json, sorting or other things as you like.

e.g list of used AL with count per country.

please tell me details.

walter aka wambacher/germany

wambacher commented 5 years ago

Nothing for you? can't believe that.

kocio-pl commented 5 years ago

Could you just export simple CSV file for now as in the example? If @jeisenbe would like something more specific, it can be done later.

kocio-pl commented 5 years ago

BTW: could you post an SQL query too, so it can be used for similar purposes by other people?

jeisenbe commented 5 years ago

yes, i can deliver the data/stats you are looking for. 500 k admin boundaries with id,country,level,name, size in square meters.

walter aka wambacher/germany

@wambacher - could you post this file as a zip archive on this comment thread? If not, can you send it to me directly? I would very much appreciate this data. I'm sorry that I didn't respond sooner; I was busy traveling.

matkoniecz commented 5 years ago

In some Asia cities, the admin_level=10 borders enclose single blocks

Just to confirm - this is a correct tagging, right?

jeisenbe commented 5 years ago

Just to confirm - this is a correct tagging, right?

Yes, based on the current wiki these administrative areas should be admin_level=10 in Indonesia and Japan at least.

(It would be helpful to change these to 11 or 12 in Indonesia, but it would be difficult to get community consensus, and in Japan it looks like admin_level=10 is the most reasonable choice for these areas.) On Sat, Feb 23, 2019 at 8:48 PM Mateusz Konieczny notifications@github.com wrote:

In some Asia cities, the admin_level=10 borders enclose single blocks

Just to confirm - this is a correct tagging, right?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/gravitystorm/openstreetmap-carto/issues/3678#issuecomment-466651147, or mute the thread https://github.com/notifications/unsubscribe-auth/AoxshFjIxBO5rfrhKEO2iOtP4sz9jscfks5vQUbCgaJpZM4ayJN4 .

matkoniecz commented 5 years ago

OK, now it would be useful to check distribution of sizes of admin level 10 areas.

Are almost all of them so small? Are there also larger ones?

Filtering by area will result with very bad results for places where some areas are just below edge and some just above the edge - with some borders mysteriously missing.

wambacher commented 5 years ago

@wambacher - could you post this file as a zip archive on this comment thread? If not, can you send it to me directly?

Zip is easy, but which contents format? csv? json? pg_dump? ...

walter

wambacher commented 5 years ago

download csv: https://wambachers-osm.website/images/osm/data/boundaries.csv

have fun

walter

matkoniecz commented 5 years ago

500 k admin boundaries with id,country,level,name, size in square meters.

Thank you very much! Is it including all admin boundaries areas or just ones mapped as relations?

matkoniecz commented 5 years ago

Here is result of my hacky (comments how my code may be improved are welcomed) R processing:

# for testing
# head boundaries.csv -n 100 > head.csv
setwd(".")
myvalues <- read.csv("boundaries.csv", sep=";")
print(myvalues)
print(myvalues$qm)
print(subset(myvalues, level==6)$qm)
data = subset(myvalues, level==6)$qm
print(sort(data))
print(head(sort(data)))
print(head(sort(data))[0])
hist(subset(myvalues, level==6)$qm, 10000, xlim=c(0, 10000000000))
png("myplot.png", width=1500, height=1500, units="px", res=300)
hist(subset(myvalues, level>=10)$qm, 200000, xlim=c(0, 20000000), xlab="admin area size [m^2]", ylab="count of admin areas (maybe only relations are counted?", main="")
dev.off()

myplot

Basic result: we have a power distribution, there are many very small areas but also some larger ones. Either we keep places with small areas unhappy or we will start displaying some large admin 10+ areas too late.

Filtering by area will result with very bad results for places where some areas are just below edge and some just above the edge - with some borders mysteriously missing.

matkoniecz commented 5 years ago

And as a sanity check: largest areas with admin level 10

            id country                                  name level           qm
8244   6386457     AUS Anangu Pitjantjatjara Yankunytjatjara    10 102185599221
8264   6206040     AUS                            Anna Creek    10  24018964033
8321   5737709     AUS                                Arumpo    10   5333492647
8468   5738770     AUS                             Balranald    10   5203889175
8759   3114253     AUS                            Big Desert    10   5490959281
9143   5731495     AUS                           Broken Hill    10  13009670142
9161   5731372     AUS                        Broughams Gate    10   6836669820
9746   6204050     AUS                 Clifton Hills Station    10  16741720392
9819   5762805     AUS                          Collarenebri    10   5044375099
9857   6195261     AUS                     Commonwealth Hill    10  10604799975
9974   6184804     AUS                        Cordillo Downs    10   8040946167
10597  5734234     AUS                              Enngonia    10   6039091938
10610  6194007     AUS                                Eringa    10   6977429324
10630  6205778     AUS                              Etadunna    10   6089028427
10868  6189024     AUS                           Frome Downs    10   6195537860
11628  6184789     AUS                            Innamincka    10  13741891019
11662  5734062     AUS                               Ivanhoe    10  15788073819
11755  6235175     AUS                            Kalamurina    10   6764228497
12133  6205886     AUS                             Lake Eyre    10  14141517855
12155  6243794     AUS                          Lake Torrens    10   5688100822
12303  6184275     AUS                                Lindon    10   6727438980
12387  5736009     AUS                                 Louth    10   6536132494
12476  6206783     AUS                               Macumba    10  11236360491
12729  5731715     AUS                              Menindee    10  13319140095
12833  5730807     AUS                            Milparinka    10   9154646864
13198  6195264     AUS                            Mulgathing    10   6047135024
13206  6205763     AUS                                 Mulka    10   5709538055
13243  6204069     AUS                            Mungeranie    10   5536481388
13263  6239271     AUS                            Murnpeowie    10   8683892330
13278  3114256     AUS                         Murray-Sunset    10   5914721924
13300  5731382     AUS                            Mutawintji    10   6299720482
13499  6208928     AUS                      Nilpinna Station    10   5669455674
13616  6195665     AUS                             Nullarbor    10  28665596979
13769  5730792     AUS                            Packsaddle    10   9708619197
13812  6184883     AUS                         Pandie Pandie    10   6507666143
14049  5733526     AUS                             Pooncarie    10   5227592631
14189  6184235     AUS                            Quinyambie    10  12129829396
14659  6193643     AUS                        Simpson Desert    10  36063466540
14914  6185088     AUS                     Strzelecki Desert    10   6622801716
14916  6239426     AUS                         Stuarts Creek    10   8552734880
15224  5729634     AUS                            Tibooburra    10  12208981559
15229  6194023     AUS                                Tieyon    10   6530686881
15236  5731021     AUS                                 Tilpa    10   6905762557
15259  6206824     AUS                             Todmorden    10   7251890947
15600  5760780     AUS                               Walgett    10   6566330110
15650  5730812     AUS                             Wanaaring    10  19838185396
15927  5730791     AUS                          White Cliffs    10  11676851478
15967  5731022     AUS                             Wilcannia    10  19840892359
15976  6209150     AUS                               Wilgena    10   8202031983
16089  6193918     AUS                               Witjira    10   7688947582
16364  6198921     AUS                            Yellabinna    10  26031351819
160186 6062901     FRA                              Fakarava    10   5285055119
393181 8366032     PRY                   Lagerenza 4 de Mayo    10  10439483587
393256 8365985     PRY                       Zona Agua Dulce    10  13085917127
393265 8368049     PRY                          Zona Florida    10   5082271387
393266 8372974     PRY                    Zona General Garay    10   5826538921
393270 8372975     PRY                        Zona La Patria    10  17811901546
393284 8368400     PRY                    Zona Teniente Pico    10   5045493959
393288 8368050     PRY                    Zona TTE. Martinez    10   5326095673
494693 6581994     ZAF                  Beaufort West Ward 2    10   8275211688
494698 6581999     ZAF                  Beaufort West Ward 7    10   8178713995
494850 3667848     ZAF                 Bushbuckridge Ward 34    10   7879316356
494862 3579893     ZAF                       Camdeboo Ward 7    10   6976102963
495027 3579027     ZAF                     Dikgatlong Ward 6    10   5725424103
495416 3578383     ZAF                     Emthanjeni Ward 6    10   6762669794
495417 3578384     ZAF                     Emthanjeni Ward 7    10   6566656355
495589 3593560     ZAF                         Gariep Ward 3    10   6742977114
495831 3578235     ZAF                         Hantam Ward 3    10  13973697998
495832 3578236     ZAF                         Hantam Ward 4    10  12018705366
495833 3578237     ZAF                         Hantam Ward 5    10  10096943826
495983 3592610     ZAF                Inxuba Yethemba Ward 6    10   7640743238
495996 3579129     ZAF                   Joe Morolong Ward 4    10  10684164784
496153 3668399     ZAF               Kagisano-Molopo Ward 12    10   5137644840
496157 3668403     ZAF                Kagisano-Molopo Ward 2    10   5498035555
496171 3578838     ZAF                     Kai !Garib Ward 7    10   5057490855
496173 3578840     ZAF                     Kai !Garib Ward 9    10  17497190269
496177 3578225     ZAF                     Kamiesberg Ward 4    10   5978293559
496184 3578691     ZAF                      Kareeberg Ward 3    10   8367738212
496185 3578692     ZAF                      Kareeberg Ward 4    10   9277388345
496188 3578254     ZAF                 Karoo Hoogland Ward 3    10  29837120612
496203 3578269     ZAF                        Khâi-Ma Ward 4    10  13455413548
496206 3578871     ZAF                   ǁKhara Hais Ward 11    10  19619264387
496218 3578886     ZAF                         !Kheis Ward 1    10   9747623732
496329 6581938     ZAF                     Laingsburg Ward 2    10   8550197598
496399 3663249     ZAF                      Lephalale Ward 3    10   8987923143
496890 6582390     ZAF                      Matzikama Ward 8    10   8282624657
497098 3578810     ZAF                           Mier Ward 2    10   5107786281
497100 3578812     ZAF                           Mier Ward 4    10  12543746825
497461 3657679     ZAF                         Musina Ward 2    10   6276051178
497504 3578171     ZAF                      Nama Khoi Ward 1    10   7870143724
498037 3578143     ZAF                   Richtersveld Ward 1    10   5410014822
498140 3578758     ZAF                      Siyancuma Ward 1    10   7489432353
498141 3578759     ZAF                      Siyancuma Ward 2    10   5259718201
498149 3578749     ZAF                     Siyathemba Ward 4    10   9743224609
498297 3578729     ZAF                    Thembelihle Ward 1    10   5416472134
498406 3578917     ZAF                    Tsantsabane Ward 6    10  13771086812
498554 3578333     ZAF                         Ubuntu Ward 3    10  16881737829
498860 6582292     ZAF                    Witzenberg Ward 12    10   6555416678

generated with

setwd(".")
myvalues <- read.csv("boundaries.csv", sep=";")
high_admin_level = subset(myvalues, level>=10)
large_and_high_admin_level = subset(high_admin_level, qm>=1000*1000*5000) # 5000+ km^2
print(large_and_high_admin_level)

what from spot checks like https://www.openstreetmap.org/relation/3578333#map=6/-30.798/27.927 or https://www.openstreetmap.org/relation/6386457#map=5/-28.246/142.822 looks more or less correct

matkoniecz commented 5 years ago

One thing that worries me

            id country                    name level          qm
343716 3722319     NGA             Darazo East     8  0.03246091
476827  142409     USA Helena Valley Northeast     8 50.93221744

generated by

setwd(".")
myvalues <- read.csv("boundaries.csv", sep=";")
extremely_small_areas = subset(myvalues, qm<=100) # <100 m^2
print(extremely_small_areas)

has some weird values not matched by OSM data. Either it is result of @wambacher data generated from old version or my processing breaks something.

EDIT: cat boundaries.csv | grep Darazo shows

3722238;"NGA";"Darazo";6;2730585774.46975
3722319;"NGA";"Darazo East";8;0.0324609111412428
3722320;"NGA";"Darazo East";8;138262316.647167
3722321;"NGA";"Darazo west";8;143655063.745397

@wambacher - do you have any idea what is source of entry with id 3722319 ? Is it just OSM data that was fixed after you imported your database?

wambacher commented 5 years ago

500 k admin boundaries with id,country,level,name, size in square meters.

Thank you very much! Is it including all admin boundaries areas or just ones mapped as relations?

relation only.

there are about 10000 Areas (closed linestrings) missing. ~ 2% of total. should not influence the result very much.

funny: https://www.openstreetmap.org/way/287224821 and ~ 1000 more in this area.

wambacher commented 5 years ago

@wambacher - do you have any idea what is source of entry with id 3722319 ? Is it just OSM data that was fixed after you imported your database?

my imports are usually starting midnight german time - every day. the timestamp of the current csv is 2019-02-23 00:59:02+01 (friday to saturday night)

please check https://www.openstreetmap.org/relation/3722319 IT IS SUCH SMALL, really

https://www.openstreetmap.org/relation/142409 is "funny" too. Damaged big outer and a very small second outer. result-> area of the small outer. This is a problem of my data collector. It does not detect invalid polygons like that right now. Not easy for me to fix that :(

wambacher commented 5 years ago

And as a sanity check: largest areas with admin level 10


            id country                                  name level           qm
8244   6386457     AUS Anangu Pitjantjatjara Yankunytjatjara    10 102185599221
8264   6206040     AUS                            Anna Creek    10  24018964033
8321   5737709     AUS                                Arumpo    10   5333492647
8468   5738770     AUS                             Balranald    10   5203889175
8759   3114253     AUS                            Big Desert    10   5490959281
.....
Australia is a BIG country, a very big country. And in this desert AL10 will be big too. 

walter

btw: all processing is done with PostGIS using qm= ST_Area(geography(geom)); definitely exact.

matkoniecz commented 5 years ago

I thought about useful statistics and my current plan is to find median area for each admin level and use this data for potential adjusting of initial zoom levels for display of admin boundaries.

jeisenbe commented 5 years ago

find median area for each admin level and use this data for potential adjusting of initial zoom levels for display of admin boundaries

+1 This is a good way to start. I also think we should also try to look at the middle 95% of features. The largest 1% or 2% can be too large, but the vast majority of admin areas should look ok. Similarly, if only 2% of the smallest areas are unintelligible that is acceptable, but if it is 5% we need to do something.

matkoniecz commented 5 years ago

median area of each admin level

1 NA 
2 122775 
3 28674 
4 10367 
5 3604 
6 749 
7 140
8 15.7
9 4.4 
10 1.05
11 0.67 
12 0.73 
13 NA 
14 NA 
15 0.0018 
16 NA 
17 NA 
18 NA 
19 NA 
20 NA

cleaned output of

setwd(".")
myvalues <- read.csv("boundaries.csv", sep=";")
for(i in 1:20) {
    data = subset(myvalues, level==i)$qm
    print(i)
    print(median(data) / 1000 / 1000) # in km^2
    print("")
}
matkoniecz commented 5 years ago

I also think we should also try to look at the middle 95% of features. The largest 1% or 2% can be too large, but the vast majority of admin areas should look ok. Similarly, if only 2% of the smallest areas are unintelligible that is acceptable, but if it is 5% we need to do something.

I worry that it may be impossible, but it is worth trying to do that.

matkoniecz commented 5 years ago

And again, area size - this time with unneeded detail removed

1 NA 2 122k 3 28k 4 10k 5 3.6k 6 750 7 140 8 15.7 9 4.4 10 1.05 11 0.67

Lets assume that administrative areas of each level on first appearance should have roughly similar area sizes. That is utterly impossible because each admin level has areas of wildly different sizes. So lets change it to "median sized admin level area of each admin level should have a similar area when it is initially displayed"

So if median admin level 2 area would have area of 100 units and median admin level 3 area would have area of 50 units I would expect admin level 3 area to appear one zoom level (after magnification by two times). And for case where median admin level 6 area would have area of 10 units and median admin level 7 area would have area of 2.5 units I would expect admin level 7 area to appear two zooms level after admin 6 (after magnification by four times).

So looking at this data my expectation are as follows

Currently we render admin levels from the zoom levels like this (data collected by kocio-pl, I fixed appearance of zoom level 2):

admin level initial zoom level
1 -
2 0
3 4
4 4
5 11
6 11
7 12
8 12
9 13
10 13
11 -
12 -

Given assumptions made collected data suggests that we could change it to

admin level initial zoom level
1 -
2 0
3 2 (changed from 4)
4 4
5 5 (changed from 11)
6 7 (changed from 11)
7 10 (changed from 12)
8 13 (changed from 12)
9 16 (changed from 13)
10 17 (changed from 13)

I am really dubious about just following it and it likely means that assumptions are bad.

Though impact of admin level 3 should be minor, according to the table at https://wiki.openstreetmap.org/wiki/Tag%3aboundary=administrative most regions are not using it anyway.

Admin level 5 would be a massive change though again most countries are simply not using it.

Displaying admin level 6 from z7: my reaction is NO NO NO, this is a terrible idea.

Displaying admin level 7 from z10: what about no?

Displaying admin level 8 from z13: this is interesting, it is the first one suggesting that we should start rendering earlier, not later.

Displaying admin level 9 from z16: this is a pretty big difference, but I happily ignored even stronger hint for admin level 6.

Admin level 10 from z17: again strong hint, but I just disregarded as strong one.


So what are conclusions? That it turned out that we are not following at all "median sized admin level area of each admin level should have a similar area when it is initially displayed" so this analysis is not as useful as I hoped.

But there are hints that admin level 9 and admin level 10 should be rendered later.

matkoniecz commented 5 years ago

OK, results are not as useless as I thought: we have confirmation that median sized admin level 9 or admin level 10 area is much, much smaller than admin level 8 area.

It means that it is desirable to change rendering and examples presented at start are not just unusual outliers and that admin level 9 is also affected in the same way.

Obviously, it is biased to currently mapped areas and completely ignores changing scale but seems to be good enough to raise minimum zoom level for rendering admin level 9 and admin level 10.

matkoniecz commented 5 years ago

@jeisenbe Are you maybe interested in making PR?

If that would be useful for anyone making PR I can find some representative locations (with different area sizes for admin levels z9 and z10) - to avoid testing just on places with extremely small or extremely large areas.

jeisenbe commented 5 years ago

Thanks! What are the units of the areas? Square kilometers?

Joseph

On Mon, Feb 25, 2019 at 4:23 PM Mateusz Konieczny notifications@github.com wrote:

OK, results are not as useless as I thought: we have confirmation that median sized admin level 9 or admin level 10 area is much, much smaller than admin level 8 areas

matkoniecz commented 5 years ago

Thanks! What are the units of the areas? Square kilometers?

Whoops, I copied in histogram code again. I updated and fixed https://github.com/gravitystorm/openstreetmap-carto/issues/3678#issuecomment-466919919

And yes, results should be in square kilometers - so median-sized country mapped in OSM has area of 122 000 square kilometers etc.

horrible R code again copied below:

setwd(".")
myvalues <- read.csv("boundaries.csv", sep=";")
for(i in 1:20) {
    data = subset(myvalues, level==i)$qm
    print(i)
    print(median(data) / 1000 / 1000) # in km^2
    print("")
}
wambacher commented 5 years ago

horrible R code again copied below:

setwd(".")
myvalues <- read.csv("boundaries.csv", sep=";")
for(i in 1:20) {
  data = subset(myvalues, level==i)$qm
  print(i)
  print(median(data) / 1000 / 1000) # in km^2
  print("")
}

still lovin SQL

select level,to_char(avg(qm/1000000),'9999999999.999') "km²"
  from boundaries
 group by level
 order by level::int;

 level |       km²       
-------+-----------------
 1     | (null)
 2     |      709951.939
 3     |      240591.793
 4     |       48694.136
 5     |       11286.183
 6     |        2782.130
 7     |        1123.832
 8     |         200.712
 9     |         104.929
 10    |          22.015
 11    |           4.964
 12    |           1.323
 15    |            .002
(13 Zeilen)

Zeit: 869,602 ms

data not filtered.

matkoniecz commented 5 years ago

Averages are usually worse statistical tool, but lets see:

a2: 0 a3: 2 log2(709951/240591) = 1.6... a4: 4 log2(709951/48694) = 3.9... a5: 6 log2(709951/11286) = 6.. a6: 8 log2(709951/2782) = 8... a7: 9 log2(709951/1123) = 9.3... a8: 12 log2(709951/200) = 11.8... a9: 13 log2(709951/105) = 12.7... a10: 15 log2(709951/22) = 15...

admin level initial zoom level
1 -
2 0
3 4
4 4
5 11
6 11
7 12
8 12
9 13
10 13
11 -
12 -

Given assumptions made collected data suggests that we could change it to

admin level initial zoom level
1 -
2 0
3 2 (changed from 4)
4 4
5 6 (changed from 11)
6 8 (changed from 11)
7 9 (changed from 12)
8 12 (changed from 12)
9 13
10 15 (changed from 13)

The most helpful information for purposes of this issue is that admin level 10 really should be render later. Admin level 9 apparently here ranks higher and changing it is not so obvious - I guess that there are some very large admin level 9 areas. But I would consider median as the more useful indicator.

imagico commented 5 years ago

Please always make clear if you are dealing with mercator square meters/kilometers here or with real ground units. ST_Area(geom::geography) will not necessarily give you a good approximation of the real ground area in case you have slightly larger polygons extending across more than six degrees longitude.

matkoniecz commented 5 years ago

I guess that there are some very large admin level 9 areas

From quick look larger difference seems to be that admin level 10 has far more really small areas, though admin level 10 has also larger tail of quite sizeable ones.

histogram_admin_level_9

Note that histogram below has one bar cutoff and going out of scale.

histogram_admin_level_10

setwd(".")

myvalues <- read.csv("boundaries.csv", sep=";")

png("histogram_admin_level_9.png", width=1500, height=1500, units="px", res=300)
hist(subset(myvalues, level==9)$qm, 20000, xlim=c(0, 100000000), ylim=c(0, 70000), xlab="admin area size [m^2]", ylab="count of admin areas mapped as relations", main="histogram of admin areas (admin_level=9)")
dev.off()

png("histogram_admin_level_10.png", width=1500, height=1500, units="px", res=300)
hist(subset(myvalues, level==10)$qm, 20000, xlim=c(0, 100000000), ylim=c(0, 70000), xlab="admin area size [m^2]", ylab="count of admin areas mapped as relations", main="histogram of admin areas (admin_level=10)")
dev.off()

Here version without vertical cutoff

histogram_admin_level_9 histogram_admin_level_10

png("histogram_admin_level_9.png", width=1500, height=1500, units="px", res=300)
hist(subset(myvalues, level==9)$qm, 20000, xlim=c(0, 100000000), ylim=c(0, 150000), xlab="admin area size [m^2]", ylab="count of admin areas mapped as relations", main="histogram of admin areas (admin_level=9)")
dev.off()

png("histogram_admin_level_10.png", width=1500, height=1500, units="px", res=300)
hist(subset(myvalues, level==10)$qm, 20000, xlim=c(0, 100000000), ylim=c(0, 150000), xlab="admin area size [m^2]", ylab="count of admin areas mapped as relations", main="histogram of admin areas (admin_level=10)")
dev.off()
matkoniecz commented 5 years ago

Please always make clear if you are dealing with mercator square meters/kilometers here or with real ground units.

I admit that I am not sure. I just took @wambacher data.

I mostly skipped that (except small disclaimer "it is biased to currently mapped areas and completely ignores changing scale") because quality of this analysis is so low that this difference can be mostly ignored.

I mostly wanted to check whatever area present in report is an outlier or not (there was one futile attempt to gather more useful data but it went nowhere).

jeisenbe commented 4 years ago

I've updated the title, since this could be mostly resolved by rendering admin_level=9 at z14 and admin_level=10 at z15 or z16.

jeisenbe commented 4 years ago

Re: log2 used above: "So looking at this data my expectation are as follows: admin level 3 to appear two zoom levels after z2 (as areas are approximately four times larger)..."

A tile at z2 is 4 times as much area as z3, not twice as much, so I think the math is wrong

The median admin_level=10 boundary is 1 square kilometer square according to the comment above. When drawn as a square, this renders on zoom level 15 at the equator as ~200x200 standard pixels (admin10, z15, 0 deg = ~40k way_pixels).

By comparison, the median admin_level=2 (country) is listed at 122,000 square kilometers. At z4 this renders as under 40x40 standard pixels, (admin2, z4, 0deg = ~1600 way_pixels). To get close to 200*200 pixels you have to zoom in to z6, where the average 122k sqkm admin_2 = 150x150 pixels (22k way_pixels), and z7 is 300x300.

So the difference in zoom level from admin_level=2 to admin_level=10 is 15 - 6.5 = 8.5 zoom levels, not 17 zoom levels. Actually, all the numbers are off by 2.

Average (mean) a2: 0 a3: 0.8 a4: 2.0 a5: 3.0 a6: 4.0 a7: 4.6. a8: 5.9 a9: 6.4 a10: 7.5 (= 7.5 zoom levels later than a2)

Median a2: 0 a3: 1.1 a4: 1.8 a5: 2.5 a6: 3.6 a7: 5 a8: 6.5 a9: 7.4 a10: 8.4 (= 8.4 zoom levels later)

jeisenbe commented 4 years ago

So we can't compare admin_level=2 or admin_level=4 directly to lower admin levels. The difference in size does not explain the differenc in initial zoom level.

This makes sense when we consider that border countries are much more significiant: most are marked and guarded. Also admin_level=4 features are often signed and significant for daily life in many countries. In contrast, the lower levels become less and less important as you go from 5 to 10.

So let's compare admin_level=5 to admin_level=8 - these are both mid-level local admin boundaries, so they are more reasonable to compare. (Usually they are types of county, municipality or district depending on the country).

We will look at an "mean" (average, which skews large) size area at 60 degrees, versus a "median" (50th percentile size) size area at the equator. As mentioned, it's not possible to pick an initial zoom level which will work for all regions, but I think it's reasonable to at consider both the average and mean area size from the equator to the edge of populated regions around 60 degrees N / S.

The mean (average) admin_level=5 area is listed as 11,300 km2 above, which renders as 360x360 standard pixels at 60 degrees and z8, or almost 130,000 way_pixels. We render the central text label of states up to 196,000 way_pixels only, so this suggests that admin_level=5 would need to be rendered at z8 or even z7 if we want to consider showing a central text label with the name, as requested in #4004. If only rendered from z9, an mean-sized area would already be >500,000 way_pixels at 60 degrees, almost the whole screen. Certainly z10 or z11 is too late.

The median admin_level=5 area is only 3600 km2, so this renders as 100x100 standard pixels at the equator, or 10,000 way_pixels, at z8. That seems big enough. If rendered at z7 the median area would be only 2500 way_pixels, which seems a little small to be worth showing - too small to show the central text label at any rate.

In comparison, the median admin_level=8 area is 16 km2, which renders as just over 100x100 way_pixels at z12 - so the median admin_level=8 area is 4.0 zoom levels smaller than the median admin_level=5 area, as shown in the table above.

An average (mean) size admin_level=8 area is 200 km2, which renders as 750x750 pixels, or 562,500 way_pixels, at z12. This is quite large - we certainly should not render admin_level=8 any later than z12. There is a huge difference between the median and the mean: a whole zoom level.

Compared to the mean admin_level=5 area, this is 3.0 zoom levels later.

OK, now let's compare to admin_level=10: The mean (average) admin_level=10 are is 22 km2, which renders as 1000x1000 pixels at z14, and 500x500 at z13. So this is about 4 zoom levels smaller than the mean admin_level=5 area, and 1.5 zoom levels below admin_level=8.

The median admin_level=10 is much smaller: 1 km2, which renders at zoom level 14 at the equator as a square of just over 100x100 standard pixels (10k way_pixels).

So based on size alone, admin_level=10 could be rendered first at z14: the median will not be too small at the equator, and the mean will not be too large at 60 degrees, though outliers on either side could be too large or small.

This does not consider the relative importance of admin_level=10 features, which might suggest rendering them later.

In fact we render admin_level=2 borders very early, because they are one of the only things that is worth showing at z1 to z3, not because they are sufficiently large to be a reasonable size on screen on average. In contrast, admin_level=5 to 8 features are moderately important, and admin_level=9 to 11 are less significant in most countries.

jeisenbe commented 4 years ago

Based on the average and median sizes above, we could make 2 different schemes from admin_level=5 to 10:

Average (mean) a2: 0 a3: 0.8 a4: 2.0 a5: 3.0 -> 0 a6: 4.0 -> 1 a7: 4.6 -> 1.6 a8: 5.9 -> 2.9 a9: 6.4 -> 3.4 a10: 7.5 -> 4.5 (= 7.5 zoom levels later than a2, 4.5 later than z5)

Median a2: 0 a3: 1.1 a4: 1.8 a5: 2.5 -> 0 a6: 3.6 -> 1 a7: 5 -> 2.5 a8: 6.5 -> 4 a9: 7.4 -> 5 a10: 8.4 -> 6 (= 8.4 zoom levels later than z2, 6 later than z5)

Starting at z9, based on averages: a5 - z9 a6 - z10 a7 - z10.5 -> z11 a8 - z12 a9 - z12.5 -> z13 z10 - z13.5 -> z14

Starting at z8, based on medians: a5 - z8 a6 - z9 a7 - z10.5 -> z11 a8 - z12 a9 - z13 z10 - z14

So with a little adjusting, we can set z5 at either z8/z9 or z9/z10, and then admin_level=7 should be rendered at z11, one zoom level sooner than currently, admin_level=8 is fine at z12, admin_level=9 is fine at z13, but admin_level=10 should be moved to z14, based on size alone.

I think the relatively low importance of admin_level=9 and =10 also could justify moving them one admin_level later than the others, to z14 and z15. This will certainly help in Indonesia and other tropical countries with small areas.

Similarly, the higher importance of z5 justifies rendering it at z8 and admin_level=6 at z10 (2 zoom levels later), even though the size is only 4 times bigger, because in countries which have this admin_level it is rather important, while admin_level=6 varies widely in importance: it’s useful in the UK and most States in the USA, but it’s been said these areas are not significant in Poland and some other European countries.

This leads to a suggested pattern: a2 - z1 a3 - z4 (or could be 1 earlier?) a4 - z4 (Ideally should start at z5, except in large countries where it could start at z13, if we had a high-performance way to select this) a5 - z8 a6 - z10 a7 - z11 a8 - z12 a9 - z13 or 14 z10 - z14 or 15

I've already made a branch to test this, along with other cartographic changes for the borders.

jeisenbe commented 4 years ago

This problem was partially addressed by #4100.

It may not be possible to solve it entirely, though in my country it could be improved by using z14 for admin_level=9 and z15 for admin_level=10 - after the next release we should see how it looks globally.