nsidc / polarstereo-lonlat-convert-py

Python functions for converting polar stereographic coordinates.
MIT License
19 stars 4 forks source link

Converting xy grids to lat-lon #3

Open gkb999 opened 1 year ago

gkb999 commented 1 year ago

Hi there, I have been working with NSIDC datasets for ages now, but came across this a while ago. Good one firstly.

Second things, does the cocnversion work for AMSR2 data too? As they too follwo the similar NSIDC structure.

My aim: Add additional dimensions as lat-lon along with x and y for the dataset (nc file). Waht I did:

from polar_convert import polar_lonlat_to_xy
from polar_convert import polar_xy_to_lonlat
from polar_convert.constants import SOUTH
x = ds.x  # x coordinate in km
y = ds.y  # y coordinate in km
true_scale_lat = -70
re = 6378.137  # earth radius in km
e = 0.01745 # earth eccentricity
hemisphere = SOUTH
ll = polar_xy_to_lonlat(x, y, true_scale_lat, re, e, hemisphere)

What I got:

<xarray.DataArray 'x' (x: 168, y: 117)>
 array([[195.43126504, 195.47999585, 195.52902782, ..., 203.89194508,
         204.00500377, 204.11907332],
        [195.2551187 , 195.30334746, 195.35187478, ..., 203.63734987,
         203.74949449, 203.8626458 ],
        [195.07867649, 195.12640052, 195.17442048, ..., 203.38176054,
         203.49297857, 203.6051989 ],
        ...,
        [164.92132351, 164.87359948, 164.82557952, ..., 156.61823946,
         156.50702143, 156.3948011 ],
        [164.7448813 , 164.69665254, 164.64812522, ..., 156.36265013,
         156.25050551, 156.1373542 ],
        [164.56873496, 164.52000415, 164.47097218, ..., 156.10805492,
         155.99499623, 155.88092668]])
 Coordinates:
   * x        (x) float64 -5.219e+05 -5.156e+05 ... 5.156e+05 5.219e+05
   * y        (y) float64 -1.891e+06 -1.884e+06 ... -1.172e+06 -1.166e+06,
 <xarray.DataArray (x: 168, y: 117)>
 array([[89.97752412, 89.97745488, 89.97738522, ..., 89.96578868,
         89.96563635, 89.96548279],
        [89.97750516, 89.97743574, 89.9773659 , ..., 89.96572168,
         89.96556844, 89.96541397],
        [89.97748637, 89.97741678, 89.97734677, ..., 89.96565509,
         89.96550096, 89.96534558],
        ...,
        [89.97748637, 89.97741678, 89.97734677, ..., 89.96565509,
         89.96550096, 89.96534558],
        [89.97750516, 89.97743574, 89.9773659 , ..., 89.96572168,
         89.96556844, 89.96541397],
        [89.97752412, 89.97745488, 89.97738522, ..., 89.96578868,
         89.96563635, 89.96548279]])
 Coordinates:
   * x        (x) float64 -5.219e+05 -5.156e+05 ... 5.156e+05 5.219e+05
   * y        (y) float64 -1.891e+06 -1.884e+06 ... -1.172e+06 -1.166e+06

So, looks all fine, but 1) the latitudes aren't negative why? 2) i'm trying to assign the obtained arrays to my original dataset.

What I tried:

da = da.assign_coords({'lon': lon})
da = da.assign_coords({'lat': lat})
print(da)

Looks fine:

<xarray.DataArray 'z' (time: 3287, y: 117, x: 168)>
[64609272 values with dtype=float32]
Coordinates:
  * x        (x) float64 -5.219e+05 -5.156e+05 ... 5.156e+05 5.219e+05
  * y        (y) float64 -1.891e+06 -1.884e+06 ... -1.172e+06 -1.166e+06
  * time     (time) datetime64[ns] 2013-01-01 2013-01-02 ... 2021-12-31
    lon      (x, y) float64 195.4 195.5 195.5 195.6 ... 156.2 156.1 156.0 155.9
    lat      (x, y) float64 89.98 89.98 89.98 89.98 ... 89.97 89.97 89.97 89.97
Attributes:
    long_name:     z
    actual_range:  [  0 100]
    grid_mapping:  polar_stereographic

My dataset now looks like:

image

But, when I tried

ds_new.sel(lat = (82, 83))

I get the follwoing error:

KeyError                                  Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_16596/1046814568.py in <module>
----> 1 ds_new.sel(lat = (82, 83))

~\AppData\Roaming\Python\Python39\site-packages\xarray\core\dataset.py in sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
   2578         """
   2579         indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "sel")
-> 2580         query_results = map_index_queries(
   2581             self, indexers=indexers, method=method, tolerance=tolerance
   2582         )

~\AppData\Roaming\Python\Python39\site-packages\xarray\core\indexing.py in map_index_queries(obj, indexers, method, tolerance, **indexers_kwargs)
    179 
    180     indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "map_index_queries")
--> 181     grouped_indexers = group_indexers_by_index(obj, indexers, options)
    182 
    183     results = []

~\AppData\Roaming\Python\Python39\site-packages\xarray\core\indexing.py in group_indexers_by_index(obj, indexers, options)
    141             grouped_indexers[index_id][key] = label
    142         elif key in obj.coords:
--> 143             raise KeyError(f"no index found for coordinate {key!r}")
    144         elif key not in obj.dims:
    145             raise KeyError(f"{key!r} is not a valid dimension or coordinate")

KeyError: "no index found for coordinate 'lat'"

So I'm expecting 2 things to be sorted. 1) Why the latitude values aren't negative (is the function just for Northern Hem? ) 2) Why am I not able to modify the data using these lat-lon?

andypbarrett commented 1 year ago

Hi

1. Why the latitude values aren't negative (is the function just for Northern Hem? )

The function converts projected coordinates (x, y) for both Northern and Southern Polar Stereographic projections. I tested the Southern Hemisphere case for the projected grid origin (The South Pole) for the S. Hem. (x=0., y=0.), which is (lon=0., lat=-90.)

>>> from polar_convert.constants import NORTH, SOUTH
>>> from polar_convert import polar_xy_to_lonlat
>>> x, y = 0, 0
>>> true_scale_lat = 70
>>> re = 6378.137
>>> e = 0.01671
>>> hemisphere = NORTH
>>> polar_xy_to_lonlat(x, y, true_scale_lat, re, e, hemisphere)
[0.0, 90.0]
>>> hemisphere = SOUTH
>>> true_scale_lat = -70  # For the S. Hem, the latitude of true scale is -70 N
>>> polar_xy_to_lonlat(x, y, true_scale_lat, re, e, hemisphere)
[0.0, -90.0]

However, polar_xy_to_lonlat expects projected coordinates in kilometers. The x,y coordinates in the dataset you are using appear to be in meters. I cannot see the units attributes for the x and y coordinates, so you should check this. Even with hemisphere=SOUTH, the first elements of the x and y coordinates, without conversion to km, gives a location close to the North Pole.

>>> x = -5.219e+05
>>> y = -1.891e+06
>>> polar_xy_to_lonlat(x, y, true_scale_lat, re, e, SOUTH)
[195.42905457549008, 89.27762998161299]

One way to check if this is what is happening is to find the minimum latitude in the lat grid you have calculated.

ds.lat.min()

Another check is the plot the latitude grid, either over a map or just as an image to check both the range of latitude values, and the spatial extent of the points.

Once x and y are converted to km, you get the result you expect for a S. Hem grid.

>>> y = -1.891e+06 * 1e-3
>>> x = -5.219e+05 * 1e-3
>>> polar_xy_to_lonlat(x, y, true_scale_lat, re, e, hemisphere)
[195.42905457549008, -71.98213283556814]

However, I do note that -71.9 degrees is a much higher latitude that our standard polar stereographic grids. See here: https://nsidc.org/data/user-resources/help-center/guide-nsidcs-polar-stereographic-projection

I suggest confirming that the grid is a polar stereographic grid. Many of our AMSR2 products are EASE2 (https://nsidc.org/data/user-resources/help-center/guide-ease-grids).

2. Why am I not able to modify the data using these lat-lon?

The xarray.DataArray.sel() method requires that indexers are vectors. lat is an array (2D). Even though you have assigned lat and lon as coordinate variables, only x, y and time can be used for indexing because they are the only coordinates that are (1D) vectors.

Which approach you use to subset the data, depends on what you want to do with the subsetted data.

If you want to extract a rectangular region of data, you can determine the lat, lon coordinates of the corner points of that region, and then use polar_lonlat_to_xy to get a bounding box in projected coordinates . When I do this, I usually use a combination of Google Maps or Google Earth, plotting the corner points over the data using matplotlib and cartopy, and trial and error. You could also use something like shapely or geopandas to define a polygon, convert that polygon to xy coordinates, and then use the bounds methods to get the best-fit bounding-box for that polygon.

This will yield a bounding-box, similar to [minx, miny, maxx, maxy]. These bounds can be used to subset the data using .sel(). For example:

ds_subset = ds.Z.sel(x=slice(minx, maxx), y=slice(miny, maxy))

However, if you want to get a latitude band to calculate a zonal mean, for example, you can mask the latitude band you want and calculate the mean.

ds_masked = ds.Z.where((ds.lat >= 82) & (ls.lat < 83))
zonal_mean = ds_masked.mean()

Alternatively, you can reproject and resample the polar stereographic grid to a latitude-longitude grid. polar_ij_to_lonlat can be used to reproject the grid points to latitude-longitude. These would then need to be interpolated to a regular grid. You could also use rioxarray to do this.

Note, that if you use rioxarray or some tool other than the ones here, and if the data are in either Polar Stereographic or EASE-Grid projection that use the Hugh 1980 or International 1924 Authalic Sphere as their datums, you will need to specify the CRS as a proj4 string or WKT, not as an EPSG code. This is because these two datums have been depreciated by EPSG and reprojection software such as gdal silently use projection definitions that have WGS84 as the datum. This yields incorrect results. We are working with EPSG to solve this issue.

gkb999 commented 1 year ago

Hi, first of all, Many thanks for the really detailed explanation. I'll answer each points mentioned and what I have done one-by-one. I think the mistake I did previously was setting true_scale_lat to -70

true_scale_lat = 70 After trying what you suggested: now the latitudes are negative as expected:

[<xarray.DataArray 'x' (x: 168, y: 117)>
array([[195.43126504, 195.47999585, 195.52902782, ..., 203.89194508,
204.00500377, 204.11907332],
[195.2551187 , 195.30334746, 195.35187478, ..., 203.63734987,
203.74949449, 203.8626458 ],
[195.07867649, 195.12640052, 195.17442048, ..., 203.38176054,
203.49297857, 203.6051989 ],
...,
[164.92132351, 164.87359948, 164.82557952, ..., 156.61823946,
156.50702143, 156.3948011 ],
[164.7448813 , 164.69665254, 164.64812522, ..., 156.36265013,
156.25050551, 156.1373542 ],
[164.56873496, 164.52000415, 164.47097218, ..., 156.10805492,
155.99499623, 155.88092668]])
Coordinates:
* x        (x) float64 -5.219e+05 -5.156e+05 ... 5.156e+05 5.219e+05
* y        (y) float64 -1.891e+06 -1.884e+06 ... -1.172e+06 -1.166e+06,
<xarray.DataArray (x: 168, y: 117)>
array([[-71.98545874, -72.0398866 , -72.09430976, ..., -78.10943595,
-78.16177103, -78.21406488],
[-72.00039991, -72.05487609, -72.10934784, ..., -78.13251206,
-78.18495229, -78.23735213],
[-72.01517405, -72.06969809, -72.12421799, ..., -78.15535557,
-78.20790028, -78.26040545],
...,
[-72.01517405, -72.06969809, -72.12421799, ..., -78.15535557,
-78.20790028, -78.26040545],
[-72.00039991, -72.05487609, -72.10934784, ..., -78.13251206,
-78.18495229, -78.23735213],
[-71.98545874, -72.0398866 , -72.09430976, ..., -78.10943595,
-78.16177103, -78.21406488]])
Coordinates:
* x        (x) float64 -5.219e+05 -5.156e+05 ... 5.156e+05 5.219e+05
* y        (y) float64 -1.891e+06 -1.884e+06 ... -1.172e+06 -1.166e+06]

Thanks a lot for reminding me the meters to km conversions.

However, polar_xy_to_lonlat expects projected coordinates in kilometers. The x,y coordinates in the dataset you are using appear to be in meters.

I tried converting using:

x = (ds.x)*0.001  # x cordinate in km 
y = (ds.y)*0.001 

ds is my dataset (x and y are lon & lat grid resp)

One way to check if this is what is happening is to find the minimum latitude in the lat grid you have calculated.

ds.y.min() shows: array(-1890625.) value

Once x and y are converted to km, you get the result you expect for a S. Hem grid.

II wanted the dataset's cordinates to be in lat-lons.

2. Why am I not able to modify the data using these lat-lon?

The xarray.DataArray.sel() method requires that indexers are vectors. lat is an array (2D). Even though you have assigned lat and lon as coordinate variables, only x, y and time can be used for indexing because they are the only coordinates that are (1D) vectors.

Thanks for letting me know about these indexing. So, even after conversion I can't use these lat-lon values for selections? How about converting these 2D arrays to vectors? Will that work?

Which approach you use to subset the data, depends on what you want to do with the subsetted data. I used rioxarray method for subsetting (clipping by ROI). The data is for entire Antarctica, while I want to work on smaller area. Plus, the gridded data is quite huge on my system and hence, subset is done. I'm anyway successful with Subsetting but having coordinates in lat-lon will help me in later statistical analysis

I did subsetting both using "geopands" and 'rioxarray', just having those meters to degrees is my ultimate aim/obj for the moment. I did use rioxarray but get distorted results Steps: 1) My raw file (in meters) 2) ROI clip using geopandas 3) Reprojected to 4326:EPSG using

ds_latlon = ds_clp.rio.reproject("EPSG:4326") #reprojected

But, the final clipped file is all distorted in resolution image

The resolution is all lost. Plus, if I clip and then re-project, the data has longitudes from -180 to 180 on the clipped file. image while this data originally lies between longitudes: 160-200E (0-360 format).

I need to preserve the RESOLUTION + want those x-y in degrees in right places.

Note, that if you use rioxarray or some tool other than the ones here, and if the data are in either Polar Stereographic or EASE-Grid projection that use the Hugh 1980 or International 1924 Authalic Sphere as their datums, you will need to specify the CRS as a proj4 string or WKT, not as an EPSG code. This is because these two datums have been depreciated by EPSG and reprojection software such as gdal silently use projection definitions that have WGS84 as the datum. This yields incorrect results. We are working with EPSG to solve this issue.

Can you please clarify more about this "specify the CRS as a proj4 string" - as I used EPSG code for reprojection (EPSG:4326). Using rioxarray how can I specify CRS for reprojection to preserve the resolution as well as to convert to degrees?

Thanks A lot for answering.

My data:

<xarray.Dataset>
Dimensions:              (x: 168, y: 117, time: 3287)
Coordinates:
  * x                    (x) float64 -5.219e+05 -5.156e+05 ... 5.219e+05
  * y                    (y) float64 -1.891e+06 -1.884e+06 ... -1.166e+06
  * time                 (time) datetime64[ns] 2013-01-01 ... 2021-12-31
Data variables:
    polar_stereographic  int32 ...
    z                    (time, y, x) float32 ...
Attributes:
    author:       University of Bremen, Gunnar Spreen [gunnar.spreen@uni-brem...
    Conventions:  CF-1.5
    GMT_version:  5.2.1 (r15220) [64-bit] [MP]
    history:      Mon Jan 21 19:54:59 2019: GDAL CreateCopy( /ssmi/www/htdocs...
    info:         AMSR2 sea ice concentration based on the ASI algorithm (Spr...
    title:        Produced by grdmath
    GDAL:         GDAL 2.1.3, released 2017/20/01

Attributes for Polar_sterographic: image

andypbarrett commented 1 year ago

Can you send me a link to the data set you are using.

gkb999 commented 1 year ago

Can you send me a link to the data set you are using.

SURE...!!! https://data.seaice.uni-bremen.de/amsr2/asi_daygrid_swath/s6250/netcdf/

andypbarrett commented 1 year ago

Not sure what happened with your reprojection. I reprojected the full grid and got the results I expected.

The grid dimensions of the dataset you show looks very small compared to the original data. Subsetting using xarray does not retain attributes, that may be the problem. You can save the attributes and then write them to the subsetted dataset.

It also looks like the projection in your dataset is the WGS84 version of the NSIDC Polar Stereographic projection. I don't think this is the correct projection for this dataset. The data I have downloaded are on the Hugh's 1980 ellipsoid version of the projection. I'll answer your question about the CRS for the Polar Stereographic in a separate comment.

The workflow I show below avoids that step by using rioxarray from the start to keep the CRS and other information required for reprojection with the dataset.

Python 3.10.11 | packaged by conda-forge | (main, May 10 2023, 18:58:44) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import rioxarray
>>> import xarray as xr
>>> from pathlib import Path
>>> print(rioxarray.__version__)
0.14.1
>>> print(xr.__version__)
2023.5.0

I downloaded an AMSR2 sea ice concentration grid from Pangea

>>> DATAPATH = Path("/media/apbarret/andypbarrett_work/Data/AMSR_Sea_Ice_Conc/asi-AMSR2-s6250-20131104-v5.4.nc")
>>> ds = xr.open_dataset(DATAPATH, decode_coords="all")  # decode_coords="all" sets CRS and geotransform
>>> ds
<xarray.Dataset>
Dimensions:              (x: 1264, y: 1328)
Coordinates:
    polar_stereographic  |S1 ...
  * x                    (x) float64 -3.947e+06 -3.941e+06 ... 3.947e+06
  * y                    (y) float64 -3.947e+06 -3.941e+06 ... 4.347e+06
Data variables:
    z                    (y, x) float32 ...
Attributes:
    author:       University of Bremen, Gunnar Spreen [gunnar.spreen@uni-brem...
    Conventions:  CF-1.5
    GMT_version:  5.2.1 (r15220) [64-bit] [MP]
    history:      Mon Jan 21 19:57:11 2019: GDAL CreateCopy( /ssmi/www/htdocs...
    info:         AMSR2 sea ice concentration based on the ASI algorithm (Spr...
    title:        Produced by grdmath
    GDAL:         GDAL 2.1.3, released 2017/20/01

Using decode_coords="all" sets the Coordinate Reference System (CRS) and the geotransform. This makes the xarray object "aware" of the projection and grid resolution.

>>> ds.rio.crs
CRS.from_wkt('PROJCS["NSIDC Sea Ice Polar Stereographic South",GEOGCS["Unspecified datum based upon the Hughes 1980 ellipsoid",DATUM["Not_specified_based_on_Hughes_1980_ellipsoid",SPHEROID["Hughes 1980",6378273,298.279411123064,AUTHORITY["EPSG","7058"]],AUTHORITY["EPSG","6054"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4054"]],PROJECTION["Polar_Stereographic"],PARAMETER["latitude_of_origin",-70],PARAMETER["central_meridian",0],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["X",EAST],AXIS["Y",NORTH],AUTHORITY["EPSG","3412"]]')

>>> ds.rio.resolution()
(6250.0, -6250.0)

Then the grid can be reprojected to a lat-lon grid on the WGS84 ellipsoid.

>>> ds_ll = ds.rio.reproject("EPSG:4326")
>>> ds_ll
<xarray.Dataset>
Dimensions:              (x: 3724, y: 525)
Coordinates:
  * x                    (x) float64 -180.0 -179.9 -179.8 ... 179.8 179.9 180.0
  * y                    (y) float64 -39.31 -39.41 -39.51 ... -89.78 -89.88
    polar_stereographic  int64 0
Data variables:
    z                    (y, x) float32 nan nan nan nan nan ... nan nan nan nan
Attributes:
    author:       University of Bremen, Gunnar Spreen [gunnar.spreen@uni-brem...
    Conventions:  CF-1.5
    GMT_version:  5.2.1 (r15220) [64-bit] [MP]
    history:      Mon Jan 21 19:57:11 2019: GDAL CreateCopy( /ssmi/www/htdocs...
    info:         AMSR2 sea ice concentration based on the ASI algorithm (Spr...
    title:        Produced by grdmath
    GDAL:         GDAL 2.1.3, released 2017/20/01

The projection and resolution can be checked in the same way as the original grid.

>>> ds_ll.rio.crs.to_wkt()
'GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]]'
>>> ds_ll.rio.resolution()
(0.0966702470461869, -0.09649544008191507)

The units of the lat-lon resolution is in degrees. At the equator, that resolution in meters is

>>> import numpy as np
>>> ds_ll.rio.resolution()[0]*np.pi*6378137/360. 
5380.641338020741

Which is very close to the 6250 m resolution of the original grid.

andypbarrett commented 1 year ago

"Can you please clarify more about this "specify the CRS as a proj4 string" - as I used EPSG code for reprojection (EPSG:4326). Using rioxarray how can I specify CRS for reprojection to preserve the resolution as well as to convert to degrees?"

The important step is to set the CRS for the original grid. From the website you sent the correct CRS is the NSIDC South Polar Stereographic on Hugh's 1980 ellipsoid. There is also a WGS84 version of the NSIDC Polar Stereographic, which is different.

See: https://nsidc.org/data/user-resources/help-center/guide-nsidcs-polar-stereographic-projection#anchor-0

It is OK to use the EPSG:4326 code for the output WGS84 latitude-longitude grid. It is only when EPSG:3412 is used that you need to be careful.

Using rioxarray and xarray to read the data you can specifiy decode_coords="all". This automatically sets the CRS to the correct version using the grid_mapping attribute of the data in the netcdf file.

>>> ds = xr.open_dataset(DATAPATH, decode_coords="all")  # decode_coords="all" sets CRS and geotransform
>>> ds
<xarray.Dataset>
Dimensions:              (x: 1264, y: 1328)
Coordinates:
    polar_stereographic  |S1 ...
  * x                    (x) float64 -3.947e+06 -3.941e+06 ... 3.947e+06
  * y                    (y) float64 -3.947e+06 -3.941e+06 ... 4.347e+06
Data variables:
    z                    (y, x) float32 ...
Attributes:
    author:       University of Bremen, Gunnar Spreen [gunnar.spreen@uni-brem...
    Conventions:  CF-1.5
    GMT_version:  5.2.1 (r15220) [64-bit] [MP]
    history:      Mon Jan 21 19:57:11 2019: GDAL CreateCopy( /ssmi/www/htdocs...
    info:         AMSR2 sea ice concentration based on the ASI algorithm (Spr...
    title:        Produced by grdmath
    GDAL:         GDAL 2.1.3, released 2017/20/01
>>> ds.rio.crs()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'rasterio.crs.CRS' object is not callable
>>> ds.rio.crs
CRS.from_wkt('PROJCS["NSIDC Sea Ice Polar Stereographic South",GEOGCS["Unspecified datum based upon the Hughes 1980 ellipsoid",DATUM["Not_specified_based_on_Hughes_1980_ellipsoid",SPHEROID["Hughes 1980",6378273,298.279411123064,AUTHORITY["EPSG","7058"]],AUTHORITY["EPSG","6054"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4054"]],PROJECTION["Polar_Stereographic"],PARAMETER["latitude_of_origin",-70],PARAMETER["central_meridian",0],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["X",EAST],AXIS["Y",NORTH],AUTHORITY["EPSG","3412"]]')

The CRS can also be set using write_crs(). However, you have to be careful to make sure you set the correct CRS. If you use the EPSG code for the NSIDC South Polar Stereographic projection on Hugh's 1980 ellipsoid, which is 3412 from , you get the wrong CRS. GDAL uses the WGS84 version because EPSG depreciated 3412.

>>> DATAPATH = Path("/media/apbarret/andypbarrett_work/Data/AMSR_Sea_Ice_Conc/asi-AMSR2-s6250-20131104-v5.4.nc")
>>> ds = xr.open_dataset(DATAPATH)
>>> ds
<xarray.Dataset>
Dimensions:              (x: 1264, y: 1328)
Coordinates:
  * x                    (x) float64 -3.947e+06 -3.941e+06 ... 3.947e+06
  * y                    (y) float64 -3.947e+06 -3.941e+06 ... 4.347e+06
Data variables:
    polar_stereographic  |S1 ...
    z                    (y, x) float32 ...
Attributes:
    author:       University of Bremen, Gunnar Spreen [gunnar.spreen@uni-brem...
    Conventions:  CF-1.5
    GMT_version:  5.2.1 (r15220) [64-bit] [MP]
    history:      Mon Jan 21 19:57:11 2019: GDAL CreateCopy( /ssmi/www/htdocs...
    info:         AMSR2 sea ice concentration based on the ASI algorithm (Spr...
    title:        Produced by grdmath
    GDAL:         GDAL 2.1.3, released 2017/20/01
>>> ds.rio.crs
# Returns None because CRS is not set
>>> >>> ds.rio.write_crs(3412, inplace=True)
<xarray.Dataset>
Dimensions:              (x: 1264, y: 1328)
Coordinates:
    polar_stereographic  int64 0
  * x                    (x) float64 -3.947e+06 -3.941e+06 ... 3.947e+06
  * y                    (y) float64 -3.947e+06 -3.941e+06 ... 4.347e+06
Data variables:
    z                    (y, x) float32 ...
Attributes:
    author:       University of Bremen, Gunnar Spreen [gunnar.spreen@uni-brem...
    Conventions:  CF-1.5
    GMT_version:  5.2.1 (r15220) [64-bit] [MP]
    history:      Mon Jan 21 19:57:11 2019: GDAL CreateCopy( /ssmi/www/htdocs...
    info:         AMSR2 sea ice concentration based on the ASI algorithm (Spr...
    title:        Produced by grdmath
    GDAL:         GDAL 2.1.3, released 2017/20/01
>>> ds.rio.crs
CRS.from_epsg(3976)
>>> ds.rio.crs.to_wkt()  # Need to use to_wkt() to see CRS
'PROJCS["WGS 84 / NSIDC Sea Ice Polar Stereographic South",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Polar_Stereographic"],PARAMETER["latitude_of_origin",-70],PARAMETER["central_meridian",0],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",NORTH],AXIS["Northing",NORTH],AUTHORITY["EPSG","3976"]]'

Note: Setting inplace=True does not issue a warning that EPSG:3412 is being replaced with EPSG:3976. If inplace=False the following warning is issued.

>>> ds.rio.write_crs(3412)
Warning 1: CRS EPSG:3412 is deprecated. Its non-deprecated replacement EPSG:3976 will be used instead. To use the original CRS, set the OSR_USE_NON_DEPRECATED configuration option to NO.
<snip>

To ensure that the CRS is set correctly, you need to use a proj4 string or a WKT (Well Known Text). The proj4 string is in the projection definition tables here or from https://epsg.io/3412.

>>> ds.rio.write_crs("+proj=stere +lat_0=-90 +lat_ts=-70 +lon_0=0 +k=1 +x_0=0 +y_0=0 +a=6378273 +b=6356889.449 +units=m +no_defs", inplace=True)
<xarray.Dataset>
Dimensions:              (x: 1264, y: 1328)
Coordinates:
  * x                    (x) float64 -3.947e+06 -3.941e+06 ... 3.947e+06
  * y                    (y) float64 -3.947e+06 -3.941e+06 ... 4.347e+06
    polar_stereographic  int64 0
Data variables:
    z                    (y, x) float32 ...
Attributes:
    author:       University of Bremen, Gunnar Spreen [gunnar.spreen@uni-brem...
    Conventions:  CF-1.5
    GMT_version:  5.2.1 (r15220) [64-bit] [MP]
    history:      Mon Jan 21 19:57:11 2019: GDAL CreateCopy( /ssmi/www/htdocs...
    info:         AMSR2 sea ice concentration based on the ASI algorithm (Spr...
    title:        Produced by grdmath
    GDAL:         GDAL 2.1.3, released 2017/20/01
>>> ds.rio.crs
CRS.from_wkt('PROJCS["unknown",GEOGCS["unknown",DATUM["unknown",SPHEROID["unknown",6378273,298.279411123064]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]]],PROJECTION["Polar_Stereographic"],PARAMETER["latitude_of_origin",-70],PARAMETER["central_meridian",0],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",NORTH],AXIS["Northing",NORTH]]')

Or using the WKT, which is a more complete definition of the CRS.

>>> wkt = """
... PROJCS["NSIDC Sea Ice Polar Stereographic South (deprecated)",GEOGCS["Unspecified datum based upon the Hughes 1980 ellipsoid (deprecated)",DATUM["Not_specified_based_on_Hughes_1980_ellipsoid",SPHEROID["Hughes 1980",6378273,298.279411123064,AUTHORITY["EPSG","7058"]],AUTHORITY["EPSG","6054"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4054"]],PROJECTION["Polar_Stereographic"],PARAMETER["latitude_of_origin",-70],PARAMETER["central_meridian",0],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AUTHORITY["EPSG","3412"]]"""
>>> ds.rio.write_crs(wkt, inplace=True)
<xarray.Dataset>
Dimensions:              (x: 1264, y: 1328)
Coordinates:
  * x                    (x) float64 -3.947e+06 -3.941e+06 ... 3.947e+06
  * y                    (y) float64 -3.947e+06 -3.941e+06 ... 4.347e+06
    polar_stereographic  int64 0
Data variables:
    z                    (y, x) float32 ...
Attributes:
    author:       University of Bremen, Gunnar Spreen [gunnar.spreen@uni-brem...
    Conventions:  CF-1.5
    GMT_version:  5.2.1 (r15220) [64-bit] [MP]
    history:      Mon Jan 21 19:57:11 2019: GDAL CreateCopy( /ssmi/www/htdocs...
    info:         AMSR2 sea ice concentration based on the ASI algorithm (Spr...
    title:        Produced by grdmath
    GDAL:         GDAL 2.1.3, released 2017/20/01
>>> ds.rio.crs
CRS.from_epsg(3412)
>>> ds.rio.crs.to_wkt()
'PROJCS["NSIDC Sea Ice Polar Stereographic South (deprecated)",GEOGCS["Unspecified datum based upon the Hughes 1980 ellipsoid (deprecated)",DATUM["Not_specified_based_on_Hughes_1980_ellipsoid",SPHEROID["Hughes 1980",6378273,298.279411123064,AUTHORITY["EPSG","7058"]],AUTHORITY["EPSG","6054"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4054"]],PROJECTION["Polar_Stereographic"],PARAMETER["latitude_of_origin",-70],PARAMETER["central_meridian",0],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",NORTH],AXIS["Northing",NORTH],AUTHORITY["EPSG","3412"]]'
gkb999 commented 1 year ago

@andypbarrett Thanks a lot for the detailed explanation. The wkt format was new for me and thanks for that. So, I did this from what's suggested:

ds.rio.write_crs("EPSG:3412", inplace=True) # used inplace true before too
ds.rio.write_crs("+proj=stere +lat_0=-90 +lat_ts=-70 +lon_0=0 +k=1 +x_0=0 +y_0=0 +a=6378273 +b=6356889.449 +units=m +no_defs", inplace=True)
ds.rio.crs.to_wkt() 
'PROJCS["WGS 84 / NSIDC Sea Ice Polar Stereographic South",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Polar_Stereographic"],PARAMETER["latitude_of_origin",-70],PARAMETER["central_meridian",0],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",NORTH],AXIS["Northing",NORTH],AUTHORITY["EPSG","3976"]]'

ds.rio.write_crs("+proj=stere +lat_0=-90 +lat_ts=-70 +lon_0=0 +k=1 +x_0=0 +y_0=0 +a=6378273 +b=6356889.449 +units=m +no_defs", inplace=True)

#####
ds.rio.crs
CRS.from_wkt('PROJCS["unknown",GEOGCS["unknown",DATUM["unknown",SPHEROID["unknown",6378273,298.279411123064]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]]],PROJECTION["Polar_Stereographic"],PARAMETER["latitude_of_origin",-70],PARAMETER["central_meridian",0],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",NORTH],AXIS["Northing",NORTH]]')
### reprojecting
ds_reprj = ds.rio.reproject("EPSG:4326")

Even after all these, the longitude values aren't right!!! image

The resultant looked like: image

I realised this works only on whole Antarctica. I was using the clipped Ross region's data in meters to reproject. I mean, I first clipped the data for my ROI and then was trying to reproject. My ROI: image

As suggested if I had to reproject for entire Antarctica before clipping, it produices a memory error

MemoryError                               Traceback (most recent call last)
Cell In[12], line 1
----> 1 ds_reproject = ds.rio.reproject("EPSG:4326")

File ~\Anaconda3\envs\xcdat_env\Lib\site-packages\rioxarray\raster_dataset.py:117, in RasterDataset.reproject(self, dst_crs, resolution, shape, transform, resampling, nodata, **kwargs)
    114 try:
    115     x_dim, y_dim = _get_spatial_dims(self._obj, var)
    116     resampled_dataset[var] = (
--> 117         self._obj[var]
    118         .rio.set_spatial_dims(x_dim=x_dim, y_dim=y_dim, inplace=True)
    119         .rio.reproject(
    120             dst_crs,
    121             resolution=resolution,
    122             shape=shape,
    123             transform=transform,
    124             resampling=resampling,
    125             nodata=nodata,
    126             **kwargs,
    127         )
    128     )
    129 except MissingSpatialDimensionError:
    130     if len(self._obj[var].dims) >= 2 and not get_option(
    131         SKIP_MISSING_SPATIAL_DIMS
    132     ):

File ~\Anaconda3\envs\xcdat_env\Lib\site-packages\rioxarray\raster_array.py:451, in RasterArray.reproject(self, dst_crs, resolution, shape, transform, resampling, nodata, **kwargs)
    446 dst_data = self._create_dst_data(dst_height, dst_width)
    448 dst_nodata = self._get_dst_nodata(nodata)
    450 rasterio.warp.reproject(
--> 451     source=self._obj.values,
    452     destination=dst_data,
    453     src_transform=src_affine,
    454     src_crs=self.crs,
    455     src_nodata=self.nodata,
    456     dst_transform=dst_affine,
    457     dst_crs=dst_crs,
    458     dst_nodata=dst_nodata,
    459     resampling=resampling,
    460     **kwargs,
    461 )
    462 # add necessary attributes
    463 new_attrs = _generate_attrs(self._obj, dst_nodata)

File ~\Anaconda3\envs\xcdat_env\Lib\site-packages\xarray\core\dataarray.py:727, in DataArray.values(self)
    718 @property
    719 def values(self) -> np.ndarray:
    720     """
    721     The array's data as a numpy.ndarray.
    722 
   (...)
    725     type does not support coercion like this (e.g. cupy).
    726     """
--> 727     return self.variable.values

File ~\Anaconda3\envs\xcdat_env\Lib\site-packages\xarray\core\variable.py:607, in Variable.values(self)
    604 @property
    605 def values(self):
    606     """The variable's data as a numpy.ndarray"""
--> 607     return _as_array_or_item(self._data)

File ~\Anaconda3\envs\xcdat_env\Lib\site-packages\xarray\core\variable.py:309, in _as_array_or_item(data)
    295 def _as_array_or_item(data):
    296     """Return the given values as a numpy array, or as an individual item if
    297     it's a 0d datetime64 or timedelta64 array.
    298 
   (...)
    307     TODO: remove this (replace with np.asarray) once these issues are fixed
    308     """
--> 309     data = np.asarray(data)
    310     if data.ndim == 0:
    311         if data.dtype.kind == "M":

File ~\Anaconda3\envs\xcdat_env\Lib\site-packages\dask\array\core.py:1701, in Array.__array__(self, dtype, **kwargs)
   1700 def __array__(self, dtype=None, **kwargs):
-> 1701     x = self.compute()
   1702     if dtype and x.dtype != dtype:
   1703         x = x.astype(dtype)

File ~\Anaconda3\envs\xcdat_env\Lib\site-packages\dask\base.py:314, in DaskMethodsMixin.compute(self, **kwargs)
    290 def compute(self, **kwargs):
    291     """Compute this dask collection
    292 
    293     This turns a lazy Dask collection into its in-memory equivalent.
   (...)
    312     dask.base.compute
    313     """
--> 314     (result,) = compute(self, traverse=False, **kwargs)
    315     return result

File ~\Anaconda3\envs\xcdat_env\Lib\site-packages\dask\base.py:600, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    597     postcomputes.append(x.__dask_postcompute__())
    599 results = schedule(dsk, keys, **kwargs)
--> 600 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])

File ~\Anaconda3\envs\xcdat_env\Lib\site-packages\dask\base.py:600, in <listcomp>(.0)
    597     postcomputes.append(x.__dask_postcompute__())
    599 results = schedule(dsk, keys, **kwargs)
--> 600 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])

File ~\Anaconda3\envs\xcdat_env\Lib\site-packages\dask\array\core.py:1283, in finalize(results)
   1281 while isinstance(results2, (tuple, list)):
   1282     if len(results2) > 1:
-> 1283         return concatenate3(results)
   1284     else:
   1285         results2 = results2[0]

File ~\Anaconda3\envs\xcdat_env\Lib\site-packages\dask\array\core.py:5309, in concatenate3(arrays)
   5306     except AttributeError:
   5307         return type(x)
-> 5309 result = np.empty(shape=shape, dtype=dtype(deepfirst(arrays)))
   5311 for idx, arr in zip(
   5312     slices_from_chunks(chunks), core.flatten(arrays, container=(list, tuple))
   5313 ):
   5314     if hasattr(arr, "ndim"):

MemoryError: Unable to allocate 20.6 GiB for an array with shape (3287, 1328, 1264) and data type float32

The only work around I'm sorting now is to: using data in parts to reproject rather than as a whole data at once.

Hope that was the issue before too. Thanks for all the support so far. Greatly appreciate