pytroll / pyresample

Geospatial image resampling in Python
http://pyresample.readthedocs.org
GNU Lesser General Public License v3.0
343 stars 95 forks source link

YAML area configuration does not allow to specify the dtype and dtype is ignored in equality comparisons #590

Open gerritholl opened 4 months ago

gerritholl commented 4 months ago

Today I learned that an areadefinition has a dtype attribute. This seems to be not very well documented and not very well supported, but it's relevant. It is used at least in get_lonlats(), which is used in resampling, so changing the dtype can have considerable memory implications and make the difference between having enough RAM or not.

Code Sample, a minimal, complete, and verifiable piece of code

import pyresample

ar = pyresample.AreaDefinition(
        "area_id", "descr", "proj_id", 4326, 100, 100,
       [-100, -100, 100, 100], dtype="float32")
s = ar.dump()
print(s)
ar2 = pyresample.area_config.load_area_from_string(s)
print("equals?", ar == ar2)
print("dtypes", ar.dtype, ar2.dtype)
s2 = """area_id:
  description: descr
  projection:
    EPSG: 4326
  shape:
    height: 100
    width: 100
  area_extent:
    lower_left_xy: [-100, -100]
    upper_right_xy: [100, 100]
  dtype: float32
"""
ar3 = pyresample.area_config.load_area_from_string(s2)
print(ar3.dtype)

Problem description

Executing the code reveals several problems:

It does not appear to be possible to specify the dtype in the YAML configuration (nor in pyresample.create_area_def()).

Expected Output

area_id:
  description: descr
  projection:
    EPSG: 4326
  shape:
    height: 100
    width: 100
  area_extent:
    lower_left_xy: [-100, -100]
    upper_right_xy: [100, 100]
  dtype: float32

equals? True
dtypes float32 <class 'numpy.float32'>
<class 'numpy.float32'>

Actual Result, Traceback if applicable

area_id:
  description: descr
  projection:
    EPSG: 4326
  shape:
    height: 100
    width: 100
  area_extent:
    lower_left_xy: [-100, -100]
    upper_right_xy: [100, 100]

equals? True
dtypes float32 <class 'numpy.float64'>
/data/gholl/checkouts/pyresample/pyresample/area_config.py:91: UserWarning: Unused/unexpected area definition parameter(s) for area_id: params={'dtype': 'float32'}
  area_list = parse_area_file(area_file_name, *regions)
<class 'numpy.float64'>

Versions of Python, package at hand and relevant dependencies

pyresample main (v1.28.2-2-g711f354)

djhoese commented 4 months ago

I would consider dtype in the init method deprecated. At least it will not be included in future versions of the area definition class or at least not the ones I have planned.

gerritholl commented 4 months ago

How would you recommend user controlling the dtype when resampling? Keyword argument to resample?

djhoese commented 4 months ago

I'm tempted to say resampling should always use 64-bit floats for projected areas and optionally/possibly do 32-bit floats for lon/lats (degrees) coordinate systems.

gerritholl commented 4 months ago

I run out of memory with 64-bit floats, but my code completes fine when I force them to be 32 bits (projected area).

djhoese commented 4 months ago

Are the lon/lats being forced to 32-bit in your changes or the x/y coordinates? We've (Panu and me at least) discussed in the past on slack that 64-bit is required for x/y meter accuracy in most projections, but is way overkill for lon/lat degrees. At least that's my memory.

djhoese commented 4 months ago

Also, if this is running out of memory (OOM) with the "nearest" resampler then it is likely the generation of the KDTree that is the major contributor to the memory usage as the entire thing has to exist in memory at once (no dask chunking). The KDTree involves 3 axes (x, y, z) of geocentric coordinates...but I thought that was always 64-bit so I might be wrong about the dtype of the lon/lats contributing to this.