nspcc-dev / locode-db

Source of UN/LOCODE database generated by NeoFS CLI.
MIT License

Reuse public continents DB #19

Status: Closed (closed by roman-khimov 1 week ago)

roman-khimov commented 11 months ago

We've got continents.geojson.gz of not-completely-known origin. It would be better to use a known data source and not store the file in the repository.

For example, there is https://github.com/rapomon/geojson-places/tree/master/data/continents

End-rey commented 3 weeks ago

> For example, there is https://github.com/rapomon/geojson-places/tree/master/data/continents

In that repository, each continent file is defined as a set of countries. Using it would first require changing the program logic, and second, there is no guarantee that all countries are represented there. It is more reliable to determine the continent from points on the map, as is done now. Another option is to use the separate per-continent files, but they would have to be merged into one somehow. I suggest using this file: https://gist.github.com/hrbrmstr/91ea5cc9474286c72838
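To make the "determine the continent by points on the map" idea concrete, here is a minimal Python sketch of a point-in-polygon lookup over a GeoJSON FeatureCollection. It is not the generator's actual code. The property name `CONTINENT`, the function names, and the toy squares are assumptions chosen for illustration; a real continents file would have to be checked for its own property names.

```python
def point_in_ring(lon, lat, ring):
    """Ray casting: True if (lon, lat) lies inside a closed ring of [lon, lat] pairs."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i][0], ring[i][1]
        xj, yj = ring[j][0], ring[j][1]
        # Count only edges that straddle the horizontal line through the point.
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def point_in_polygon(lon, lat, polygon):
    """GeoJSON polygon: the first ring is the exterior, any further rings are holes."""
    if not point_in_ring(lon, lat, polygon[0]):
        return False
    return not any(point_in_ring(lon, lat, hole) for hole in polygon[1:])


def find_continent(lon, lat, collection, name_key="CONTINENT"):
    """Return the name of the first feature containing the point, or None."""
    for feature in collection["features"]:
        geometry = feature["geometry"]
        if geometry["type"] == "Polygon":
            polygons = [geometry["coordinates"]]
        elif geometry["type"] == "MultiPolygon":
            polygons = geometry["coordinates"]
        else:
            continue
        if any(point_in_polygon(lon, lat, p) for p in polygons):
            return feature["properties"].get(name_key)
    return None


# Toy data (hypothetical): two squares standing in for real continent polygons.
TOY_CONTINENTS = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"CONTINENT": "Europe"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[0, 40], [10, 40], [10, 50], [0, 50], [0, 40]]],
            },
        },
        {
            "type": "Feature",
            "properties": {"CONTINENT": "Africa"},
            "geometry": {
                "type": "MultiPolygon",
                "coordinates": [[[[0, 0], [10, 0], [10, 30], [0, 30], [0, 0]]]],
            },
        },
    ],
}

print(find_continent(5, 45, TOY_CONTINENTS))
print(find_continent(5, 10, TOY_CONTINENTS))
```

A point that falls in none of the polygons returns `None`, which is the case a generator would need to handle separately, for example by falling back to the nearest polygon.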

cthulhu-rider commented 3 weeks ago

there was no nice src of continental polygons (in GeoJSON particularly) few yrs ago. Maybe smth appeared since then, idk still

if nothing exists, and since continental DB is only needed at the generative stage, we can try to use a public service like OSM. This requires network connection of generator and brings lts

AliceInHunterland commented 3 weeks ago

we should look at https://www.naturalearthdata.com/ as well

End-rey commented 3 weeks ago

In this repo https://github.com/gbif/continents boundaries are drawn carefully across land, but roughly across the sea. I think this format is suitable for our task.

Also, I found this https://hub.arcgis.com/maps/CESJ::world-continents, but it seems like there is no license.

AliceInHunterland commented 3 weeks ago

can we regenerate continents.geojson.gz by ourselves using https://www.naturalearthdata.com/ (even google earth uses it)?

 ogr2ogr -f GeoJSON continents_dissolved.geojson ne_10m_geography_regions_polys.shp -dialect sqlite -sql "SELECT ST_Union(geometry), REGION FROM ne_10m_geography_regions_polys GROUP BY REGION"

(This can also be done with a Python dataframe.) It will still be 8 polygons, as we had. It can probably be improved. (And they have https://github.com/nvkelso/natural-earth-vector/blob/master/geojson/ne_10m_geography_regions_polys.geojson if you don't want to store it.)

Screenshot 2024-08-19 at 17 05 03

continents based on the country also can be an option:

Screenshot 2024-08-19 at 17 06 17

another option:

Screenshot 2024-08-19 at 17 11 07

ogr2ogr -f GeoJSON continents_base.json ne_10m_land.shp
ogr2ogr -f GeoJSON continents_regions.json ne_10m_geography_regions_polys.shp -where "FEATURECLA='Continent'"
ogr2ogr -f GeoJSON -update -append merged_continents.json continents_regions.json

and this can be improved by joining the second and third maps I think.

End-rey commented 2 weeks ago
roman-khimov commented 2 weeks ago

DJDCT is certainly Africa, not Asia, so it's 2. And Enez is certainly Europe, so it's 1. https://gist.github.com/hrbrmstr/91ea5cc9474286c72838 doesn't seem to be very reliable then.

https://github.com/gbif/continents: more details needed, maybe it fixes things, maybe it breaks something.
https://hub.arcgis.com/maps/CESJ::world-continents: perfect fit, how hard to integrate?
https://www.naturalearthdata.com: also needs to be reviewed, maybe it's for good, maybe not.

End-rey commented 2 weeks ago

During the check, I found many errors in the location data. For example:

If I take the old continents GeoJSON and mark all points that are not on land (distance to land > 0.2 in geometry units), I get 840 points and this map: image
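The check described above can be sketched roughly as follows, assuming planar distances measured directly in coordinate units (degrees) and land given as simple closed rings without holes. The function names, the sample ring, and the sample points are hypothetical; only the 0.2 threshold comes from the comment.

```python
import math


def segment_distance(px, py, ax, ay, bx, by):
    """Planar distance from point (px, py) to the segment (ax, ay)-(bx, by)."""
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project the point onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def inside_ring(px, py, ring):
    """Ray casting test against a closed ring of [x, y] pairs."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        if (yi > py) != (yj > py):
            if px < xi + (py - yi) * (xj - xi) / (yj - yi):
                inside = not inside
        j = i
    return inside


def distance_to_ring(px, py, ring):
    """Zero if the point is inside the ring, else distance to the nearest edge."""
    if inside_ring(px, py, ring):
        return 0.0
    return min(
        segment_distance(px, py, *ring[i], *ring[i + 1])
        for i in range(len(ring) - 1)
    )


def far_from_land(points, rings, threshold=0.2):
    """Names of points farther than `threshold` from every land ring."""
    return [
        name
        for name, (lon, lat) in points.items()
        if min(distance_to_ring(lon, lat, r) for r in rings) > threshold
    ]


# Hypothetical data: one square of "land" and three sample points.
LAND = [[[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]]]
POINTS = {"inland": (5, 5), "near_shore": (10.1, 5), "at_sea": (10.5, 5)}

print(far_from_land(POINTS, LAND))
```

Points slightly off the coast stay within the tolerance, so only records whose coordinates are clearly wrong get flagged.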

So the idea of getting continents from https://github.com/rapomon/geojson-places/tree/master/data/continents is not so bad. I checked and confirmed that all countries are represented in this JSON.

End-rey commented 2 weeks ago

I have looked into how we would work with this file, https://github.com/rapomon/geojson-places/blob/master/data/continents/continents.json:

roman-khimov commented 2 weeks ago

> Russia now all in Asia

Unacceptable.

End-rey commented 2 weeks ago

> > Russia now all in Asia
>
> Unacceptable.

Are there any other countries that need to be split across several continents? I think we can combine approaches.

roman-khimov commented 2 weeks ago

Turkey, obviously. Egypt. Panama. Others are more consistent, but Canary Islands (or Ceuta) are not exactly Europe geographically, for example. In general strict "country -> continent" mapping is flawed.
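One way the "combine approaches" idea could look is sketched below in Python: use a country table where the answer is unambiguous and fall back to a coordinate lookup for the countries named above. Everything here is hypothetical (the table contents, the function names, and especially `toy_locate`, which is a crude stand-in for a real polygon lookup and not a real boundary).

```python
# Countries named in the thread as spanning more than one continent
# (ISO 3166-1 alpha-2 codes: Russia, Turkey, Egypt, Panama).
TRANSCONTINENTAL = {"RU", "TR", "EG", "PA"}

# Hypothetical excerpt of a strict country -> continent table.
COUNTRY_TO_CONTINENT = {"FR": "Europe", "JP": "Asia", "KE": "Africa"}


def resolve_continent(country, lon, lat, locate_by_point):
    """Use the table for unambiguous countries, the point lookup otherwise."""
    if country not in TRANSCONTINENTAL:
        continent = COUNTRY_TO_CONTINENT.get(country)
        if continent is not None:
            return continent
    # Transcontinental or unknown country: decide from the coordinates.
    return locate_by_point(lon, lat)


def toy_locate(lon, lat):
    """Stand-in for a polygon lookup; the 60-degree split is for the demo only."""
    return "Europe" if lon < 60 else "Asia"


print(resolve_continent("FR", 2.35, 48.85, toy_locate))
print(resolve_continent("RU", 37.6, 55.75, toy_locate))
print(resolve_continent("RU", 131.9, 43.1, toy_locate))
```

This does not address cases such as the Canary Islands or Ceuta, where part of an otherwise single-continent country lies elsewhere; those would need either more entries in the exception set or the point lookup for every record.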

End-rey commented 2 weeks ago

https://github.com/gbif/continents gives a difference of 313:

roman-khimov commented 2 weeks ago

What a bloody mess... Some changes are good, some are not.

End-rey commented 2 weeks ago

https://www.naturalearthdata.com gives a difference of 313 lines:

End-rey commented 2 weeks ago

For update 2024-1

In general, several records fell into place by correcting the original coordinates. But otherwise the changes are the same as before.

roman-khimov commented 1 week ago

Let's take https://gist.github.com/hrbrmstr/91ea5cc9474286c72838, but copy it to our repository with a link to the original file. I fear it could disappear; it doesn't look like a reliable source. But the MIT license allows us to copy it, and at least we'd know where it comes from. It's also much smaller than the one we have now.