wrf-model / WPS

The official repository for the WRF Preprocessing System (WPS)
203 stars 162 forks source link

Possible bug when ingesting static datasets in lambert projections ? #140

Open Enzoupi opened 4 years ago

Enzoupi commented 4 years ago

Hi everybody,

This issue is entitled as a question as I don't really believe it could be a bug. Indeed, if it was a bug, I imagine many people would have detected it before ! But, here comes my little story :

I have been struggling with WPS lately when trying to ingest two datasets. One in Lambert 93 (EPSG:2154) and one in EPSG:3035 (i.e. Lambert Azimuthal Equal Area used for european datasets). I observed a shift with respect to the topography in both cases.

However, for the European dataset (Corinne Land Cover, FYI), everything worked perfectly when I used gdal to project it to EPSG:4326 and then use WPS ! :)

So I have tried to design a toy example : The idea is to make a dummy categorical field with two lines crossing at a specific location, both in EPSG:4326 and EPSG:2154. Then, compare that value to the point in geo_em* files were it does cross to detect if a significant shifting occurs or not.

For convenience, the dummy static fields are generated with a few python lines as one can see below:

First, the static data for EPSG:4326

import numpy as np

 # Define deltalon and deltalats
 dx = 1e-3
 dy = 1e-3

 # Target non zero value
 tgtlon = 5.700
 tgtlat = 45.180

 # Number of points
 nx = 30000; nxtgt = 15000
 ny = 30000; nytgt = 15000

 # Make an empty field
 #field = np.reshape(np.random.randint(0,3,nx*ny,dtype=">u1"),(nx,ny))
 field = np.zeros((nx,ny), dtype="uint8")

 # Make the cross of non-zero values:
 field[nxtgt-2:nxtgt+2, nytgt-2:nytgt+2] = 4
 field[nxtgt-1:nxtgt+1, nytgt-1:nytgt+1] = 5
 field[nxtgt, :] = 9
 field[:, nytgt] = 11
 field[nxtgt, nytgt] = 15

 # Compute stratlon,startlat
 startlon = tgtlon - nxtgt * dx
 startlat = tgtlat - nytgt * dy
 print(f"Startlon : {startlon} | startlat = {startlat}")

 # write index file
 index_content=f"""type=categorical
 row_order=top_bottom
 category_min=0
 category_max=33
 projection=regular_ll
 dx={dx}
 dy={dy}
 known_x=1.0
 known_y=1.0
 known_lat={startlat}
 known_lon={startlon}
 wordsize=1
 missing_value=2
 tile_x={ny}
 tile_y={nx}
 tile_z=1
 units="category"
 description="Dummy"
 """
 textfile = open('index', 'w')
 textfile.write(index_content)
 textfile.close()

  # write field
 filename = f"{1:05d}-{ny:05d}.{1:05d}-{nx:05d}"
 print(f"write field to {filename}")
 field.tofile(filename)

Then, the static data for EPSG:2154

 import numpy as np
 from pyproj import Proj, transform

 ## define parameters for the lambert93 field

 # Define deltalon and deltalats
 dx = 100
 dy = 100

 # Target non zero value
 tgtlon = 912000
 tgtlat = 6457000
 inProj = Proj(init='epsg:2154') 
outProj = Proj(init='epsg:4326')
 target_lon, target_lat = transform(inProj,outProj,tgtlon,tgtlat)
 print (f"Target lon is {target_lon}, Target Lat is {target_lat}")

# Number of points
 nx = 30000; nxtgt = 15000
 ny = 30000; nytgt = 15000

 # Make an empty field
 field = np.zeros((nx,ny), dtype=">u1")

 # Make the cross of non-zero values:
 field[nxtgt-2:nxtgt+2, nytgt-2:nytgt+2] = 1
 field[nxtgt-1:nxtgt+1, nytgt-1:nytgt+1] = 2
 field[nxtgt, :] = 9
 field[:, nytgt] = 11
 field[nxtgt, nytgt] = 15

 # Compute stratlon,startlat
 startlon = tgtlon - nxtgt * dx
 startlat = tgtlat - nytgt * dy
 inProj = Proj(init='epsg:2154')
 outProj = Proj(init='epsg:4326')
 startlon,startlat = transform(inProj,outProj,startlon,startlat)
 print(f"Startlon : {startlon} | startlat = {startlat}")

 # write index file
 index_content=f"""type=categorical
 row_order=top_bottom
 category_min=0
 category_max=15
 projection=lambert
 dx={dx}
 dy={dy}
 known_x=1.0
 known_y=1.0
 known_lat={startlat}
 known_lon={startlon}
 stdlon=3.0
 truelat1=44.0
 truelat2=49.0
 wordsize=1
 missing_value=2
 tile_x={ny}
 tile_y={nx}
 tile_z=1
 units="category"
 description="Dummy"
 """
 textfile = open('index', 'w')
 textfile.write(index_content)
 textfile.close()

 # write field
 filename = f"{1:05d}-{ny:05d}.{1:05d}-{nx:05d}"
 print(f"write field to {filename}")
 field.tofile(filename)

Using these static datasets together with their index files, the two lines crosses at the following longitude and latitude :

EPSG:4326 : Target LAT_M = 45.1788 LONG_M = 5.69937 EPSG:2154 : Target LAT_M = 45.1514 LONG_M = 5.7396

When the perfect value would have been : tgtlat = 45.180 | tgtlon = 5.700. Clearly, when using EPSG:4326 it works and when using EPSG:2154 there is a shift. This shift can be seen visually from the two figures below.

EPSG:4326 EPSG4326

EPSG:2154 EPSG2154

It may be -- read it is very likely -- that I am doing something obviously stupid when providing lambert data to WPS. In that case, I would be glad to know what because I have been trying to understand the origin of this shift for quite a while :)

But, as it occured to me with two distinct datasets (plus my toy-example) I raised that issue !

Thank you in advance for any help that one could provide !

All the best,

Enzo

mgduda commented 4 years ago

Is this related to the topography issue in #143 ?

Enzoupi commented 4 years ago

Hi @mgduda !

I do not believe it is related to #143. Because here I am not speaking of the same "intermediate" files. In these guys, the projection is defined by a keyword, which is pretty explicit and I doubt that my misunderstanding comes from here. Indeed, as shown in the index files produced by the python codes, projections codes are lambert for 2154 and regular_ll for 4326 here.

However it could be related in some sense since it is probably me misunderstanding something in both cases !