schmitt-muc / SEN12MS

Repository for SEN12MS related codes and utilities
Other
98 stars 19 forks source link

region locations in Sen12MSOverview.ipynb #5

Open CrohnEngineer opened 2 years ago

CrohnEngineer commented 2 years ago

Hi everybody,

First of all I would like to thank you for all the work you have done with this really detailed dataset.
I've been digging it for some weeks now and I really appreciated all the tools available in the toolbox.
Regarding this, I was using the Sen12MSOverview.ipynb notebook to perform some data analysis.
I've noticed that the region location plot in cell 8 is different when running the notebook locally.
In particular, if I leave the code unchanged as in the following block

gdf = gpd.GeoDataFrame(regions, geometry=gpd.points_from_xy(regions.y, regions.x), crs=4326).to_crs(epsg=3857)

fig, ax = plt.subplots(figsize=(15,16))
gdf.plot(ax=ax,column="season",legend=True)
#ax.scatter(gdf.geometry.x,gdf.geometry.y, c=gdf.season)
ctx.add_basemap(ax, url=ctx.providers.Stamen.TerrainBackground)

for x,y,region in sorted(zip(gdf.geometry.x, gdf.geometry.y, gdf.region)):
    ax.annotate(region,(x,y))

ax.set_xlim(-20026376.39,20026376.39)
ax.set_ylim(-1e7, 1.5e7)
ax.axis('off')

the plot does not represent the whole world map, but focuses instead on central meridians (picture below): image

I hypothized there was a typo in the GeoPandas DataFrame instantiation: calling the method gpd.points_from_xy, regions.y and regions.x seem swapped (but I'm not sure about it).
So, I changed that line to gdf = gpd.GeoDataFrame(regions, geometry=gpd.points_from_xy(regions.x, regions.y), crs=4326).to_crs(epsg=3857), and now the plot reports the whole world map as pictured below: image

I can see however that some regions are not placed correctly with respect to the original map in the GitHub notebook (just look for instance at region 21, which is now placed in Canada, and 41, which now is no more in that country).

Sorry if this question might sound silly, but since I'm a newbie working with raster data do you have any guess on what is happening?
I didn't change or modify your code in any way, and I created my environment starting from the classification requirements file and simply adding rasterio, geopandas, contextily, pyproj and matplotlib.

Thank you in advance! Bests,

Edoardo

MarcCoru commented 2 years ago

Hi,

thanks for your investigation and feedback!

My first guess would be that the behavior of contextily changed since I wrote the notebook. For instance, in newer versions, it should not be necessary anymore to provide the URL explicitly in ctx.add_basemap(ax, url=ctx.providers.Stamen.TerrainBackground)

Also, sometimes the alignment of map tiles with geodata is off. That may explain the different placement of the regions.

Can you try to simplify the add_basemap call to something like cx.add_basemap(ax, crs=4326) following the latest contextily documentation

Also, the shapefile regions.shp generated in the last cell gdf.to_file("regions.shp") should be unaffected from any visualization glitches. You could load it into a gis software, such as QGIS, and visualize it there.

Hope that helps!

I'll try to reproduce this issue later this week.

Cheers, Marc

CrohnEngineer commented 2 years ago

Hi @MarcCoru ,

Thanks for the reply!

My first guess would be that the behavior of contextily changed since I wrote the notebook.

I see! May I ask what is your python environment? If it might help, this is mine:

contextily==1.2.0
geopandas==0.10.2
geopy==2.2.0
pyproj==3.3.0
rasterio==1.1.5

For instance, in newer versions, it should not be necessary anymore to provide the URL explicitly in ctx.add_basemap(ax, url=ctx.providers.Stamen.TerrainBackground)

Can you try to simplify the add_basemap call to something like cx.add_basemap(ax, crs=4326) following the latest contextily documentation?

I tried your suggestion, just changing the crs paramter to 3857 since the code was giving me errors (correct me if I'm wrong, but we reprojected the coordinates to EPSG:3857 when creating the GeoPandasDataFrame otherwise we couldn't see the regions on a map, right?).
If I leave the code unchanged, the world map is still "restricted" as below image

If instead I swap the x, y coordinates when creating the GeoPandasDataFrame like this gdf = gpd.GeoDataFrame(regions, geometry=gpd.points_from_xy(regions.x, regions.y), crs=4326).to_crs(epsg=3857), I get a "whole" world map, but with regions placed differently from the notebook in the repository (see picture below). image

Also, sometimes the alignment of map tiles with geodata is off. That may explain the different placement of the regions.

Don't know if this can help, but I checked the locations of the tiles using geopy, and at a first glance they seem to be correctly placed in the second map.
These are for instance all the regions in Canada: image

Do you have any guess why swapping the x, y coordinates like the above produces a correct map?
Perhaps a mismatch between library versions as you were suggesting?
Again, thanks for your help and time!