pytroll / pyresample

Geospatial image resampling in Python
http://pyresample.readthedocs.org
GNU Lesser General Public License v3.0

EWA resampling in 1.27 is four times slower than in 1.26.1 #517

Closed yukaribbba closed 1 year ago

yukaribbba commented 1 year ago

Code Sample, a minimal, complete, and verifiable piece of code

Some regular satpy code here:

area_id = "abc"
projection = some proj4 strings
area_def = create_area_def(area_id=area_id, projection=projection, resolution=250)
scn = scn.resample(area_def, resampler="ewa", weight_delta_max=10, weight_distance_max=1)

Problem description

To produce a 250m rayleigh-corrected true color scene from MODIS L1B:

1.26.1: 83.054s (CPU almost fully loaded)
1.27: 321.692s (CPU usage around 20% the whole time)

Versions of Python, package at hand and relevant dependencies

python: 3.11.3
satpy: 0.42.3-dev
dask: 2023.5.0

djhoese commented 1 year ago

How did you install pyresample in this case?

What is area_def? Can you print it out? Based on what you've provided I'm guessing it is a DynamicAreaDefinition. Can you test with nearest neighbor resampling between the two versions? My guess is it isn't EWA that got slower, but loading of the lon/lat data from the swath definition.

Additionally, how many dask workers are you using? Are you customizing your dask array chunk size at all or any other dask-related options?
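For reference, these settings can be pinned explicitly at the top of a processing script. A minimal sketch mirroring the values discussed later in this thread (the numbers are examples, not recommendations):

import os
os.environ["PYTROLL_CHUNK_SIZE"] = "2200"  # legacy satpy chunk-size environment variable
os.environ["OMP_NUM_THREADS"] = "1"        # limit OpenMP threads used by low-level libraries

import dask
dask.config.set(num_workers=8)             # workers for dask's default threaded scheduler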

djhoese commented 1 year ago

For my own reference (and yours if you're curious), here are the diffs between the two versions and specifically looking at geometry.py (no EWA files were changed in this last release):

v1.26.1...v1.27.0#diff-631d3c6d4c

But the majority of the work between these two releases was:

  1. New documentation
  2. Add better warning messages or remove unnecessary warning messages
  3. Remove deprecated code

The only thing inside pyresample that I can see affecting your EWA usage is that for number 2 above I switched to the newer pyproj Transformer interface instead of using the deprecated/old Proj interface. I'd be very surprised if this broke something though as it is supposed to result in the same low-level code being used.
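For context, that switch was roughly of this shape; a generic pyproj sketch with a made-up LCC projection string, not the exact pyresample code:

from pyproj import Proj, Transformer

crs = "+proj=lcc +lat_1=25 +lat_2=47 +lon_0=105 +datum=WGS84"

# Old, deprecated Proj call: lon/lat -> projected x/y
p = Proj(crs)
x_old, y_old = p(105.0, 30.0)

# Newer Transformer interface, same conversion under the hood
t = Transformer.from_crs("EPSG:4326", crs, always_xy=True)
x_new, y_new = t.transform(105.0, 30.0)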

yukaribbba commented 1 year ago

I'm on my phone now so I'll test nearest neighbor tomorrow. I installed pyresample as a dependency of satpy. Days ago I updated my packages and then found something wasn't right, so today I rolled back to 1.26.1. Besides that rollback, I didn't change any other packages or environment variables -- my PYTROLL_CHUNK_SIZE=2200, DASK_NUM_WORKERS=8 and OMP_NUM_THREADS=8. Yes, that area_def is a dynamic one; I leave the area extents for pyresample to calculate itself.

yukaribbba commented 1 year ago

I tested the same scene in the same environment.

Nearest: 1.27: 127s, 1.26.1: 124s (CPU fully loaded)

Bilinear: after consuming almost all of my 64GB of memory and a lot of time, I had to stop it (this happened in both versions).

djhoese commented 1 year ago

Thanks for testing that. I had a busy day and ran out of time to test this myself, but it's on my TODO list. If it really is the same environment (same versions of dask and xarray) then it must be either a Cython difference from when the package was built, a conda-forge build change, or a weird side effect of something that changed in pyresample. At least those are my guesses right now.

augustinh22 commented 1 year ago

Not directly related, but I also pinned a project to v1.26 due to Cython dependency changes in 1.27 that complicated a docker setup.

djhoese commented 1 year ago

@augustinh22 Do you have any extra details on that? What do you mean?

djhoese commented 1 year ago

@yukaribbba did you install Satpy/pyresample via pip or with conda?

yukaribbba commented 1 year ago

Conda. I use miniforge.

djhoese commented 1 year ago

@yukaribbba What operating system are you on?

djhoese commented 1 year ago

For my own future reference, the last build of pyresample on conda-forge for Python 3.11 used numpy 1.23.5 (other python versions use numpy 1.21). Cython version used was 0.29.34 (also from conda-forge). The build for 1.26.1 used Cython 0.29.33 and the exact same build of numpy as the 1.27 release.

yukaribbba commented 1 year ago

Windows 11 22h2

djhoese commented 1 year ago

Running on main currently, here's my test script:

import os
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["PYTROLL_CHUNK_SIZE"] = "6400"

from satpy import Scene
from glob import glob
from pyresample import create_area_def
import dask

dask.config.set(num_workers=4)

scn = Scene(reader="viirs_sdr",
            filenames=(glob("/data/satellite/viirs/conus_day/SVI04*t180[1234]*.h5") +
                       glob("/data/satellite/viirs/conus_day/GITCO*t180[1234]*.h5")))
scn.load(["I04"])
area_def = create_area_def("test", {"proj": "eqc"}, resolution=1000)
new_scn = scn.resample(area_def, resampler="ewa", weight_delta_max=40.0, weight_distance_max=2.0)
_ = new_scn["I04"].compute(

Running in a Python 3.10 conda-based environment. I installed pyresample from source with pip install -e . --no-deps and then manually ran python setup.py build_ext --inplace to make sure the extensions were built with my installed version of Cython. For my first run they were built with Cython 0.29.34 (same as the 1.27 conda-forge package). I then ran the above Python code with PyCharm Pro's profiler, which took a total of 9.697s on my laptop.

I then moved the generated .so files and the generated Cython _ll2cr.c and _fornav.cpp files to a new directory ext_cy02934 to be sure they were gone. With Cython 0.29.33 (same as the 1.26.1 conda-forge build) I get 7.485s for this script. So a 2s difference just from the Cython version... not great.

I then upgraded to Cython 0.29.35, a version after the 1.27 conda-forge build, and it still takes 9.635s.

This is not going to be fun.

djhoese commented 1 year ago

Oops, the version I started with was actually Cython 0.29.35, not 0.29.34. 0.29.34 took 7.8s, so decently close to the 0.29.33 result. Now to try with pyresample 1.26.1.

djhoese commented 1 year ago

Pyresample 1.26.1 locally | Cython 0.29.34 | 7.105s

I diffed the two Cython-generated C files and they are exactly the same, which suggests that maybe my test is not very representative of the issue. Let's try mixing in some other bands.

Just for the record, the time for the 1.26.1 code with Cython 0.29.33 was 7.467s.

djhoese commented 1 year ago

I bumped my script up with more granules and 5 I-bands for the VIIRS data I'm reading. I get ~20.5s of execution time for both. I'm not seeing this 4x slowdown, but I'll keep trying things. I'll try installing directly from conda-forge.

djhoese commented 1 year ago

Using the conda packages with my updated script I get 27.0s for pyresample 1.26.1 and 24.8s for pyresample 1.27.0, so it is actually faster in the new version. Granted, I'm not investigating dask diagnostic plots to see CPU utilization during these executions, but given the performance difference I'm not that concerned.

So our next path of debugging: @yukaribbba you said you're on Windows and Python 3.11. @augustinh22 you mentioned docker images so I'm guessing you're not using Windows as your container's OS?

I'm using dask 2023.5.0, which is what you said you're using. The only difference now is the Python version, but I'd be surprised if that made a huge difference in performance inside Cython code. Maybe it's worth a shot, but I don't have time for it now. Otherwise, it is a Windows-only issue. Or something about your environment? I guess I could also try a larger (higher resolution) target area. :shrug:

Sorry to the people who are running into issues, but I need more info to get a reproducible failure.

Edit: Or point out where I'm doing something wrong.

djhoese commented 1 year ago

Sorry for all the messages. I increased the size of my area by setting the resolution to 250m. With conda-forge 1.26.1 I get 62s. With 1.27.0 I get 113s. This is still Python 3.10.

djhoese commented 1 year ago

Nevermind, I was looking at the wrong time. The 62s was actually supposed to be 113s. So still not able to reproduce this.

yukaribbba commented 1 year ago

Otherwise, it is a Windows only issue.

I'll try on a Ubuntu VM.

yukaribbba commented 1 year ago

The problem remains in my Linux VM. In that environment:

1.26.1: 105s
1.27: 292s

yukaribbba commented 1 year ago

Maybe you need to try building the environment directly from conda just like me, instead of building with Cython locally?

djhoese commented 1 year ago

Very interesting.

Maybe you need to try building the env directly from conda just like me, not using cython locally?

I tried both. I started with an existing conda-based (conda-forge packages) environment and uninstalled pyresample. I installed it from conda-forge and also built it locally with different Cython versions.

I'll put together a Python 3.11 environment and try that. If I can't reproduce it then I'll try MODIS data instead of VIIRS data. If that still doesn't reproduce it then I'll try your exact chunk size and the true_color composite. Your chunk size is not necessarily optimal (I think) for MODIS data, but maybe I'm misremembering something.

Note that setting OMP_NUM_THREADS=8 is likely not helping you at all and could actually be hurting you, depending on which OpenMP libraries end up getting used in your processing.

djhoese commented 1 year ago

Just discovered: there is a chance that PyCharm was pulling in my local dev version of pyresample when I ran my timing script, even when I wanted it to use the conda package of pyresample. I still would have expected differences with my Cython rebuilds, but I wanted to point this out.

djhoese commented 1 year ago

I think I reproduced it! Python 3.11 may be the primary issue. Pyresample 1.26.1 took 62.9s, 1.27.0 took 116.5s. Now to see what happens with a local development version.

yukaribbba commented 1 year ago

In my experience it also occurs with VIIRS SDR data. I also tried different composites (true_color, overview, true/false color with sunz correction, true/false color with rayleigh correction) and none of them survived.

I actually have no idea what that OMP setting means; I only set it because the satpy documentation said to. :sweat_smile: Maybe I need to disable it.

Is there a debug switch in pyresample or dask, like the one in satpy?

djhoese commented 1 year ago

What do you mean by a debug switch? debug_on? That's actually just a convenience function for turning on debug logging. Instead of debug_on you could cheat and do:

import logging
logging.basicConfig(level=logging.DEBUG)

That says that all loggers should print out DEBUG level output.

yukaribbba commented 1 year ago

Never mind... since you've already reproduced the issue.

djhoese commented 1 year ago
main | Cython 0.29.33 | 112s
main | Cython 0.29.34 | 113s
1.26.1 (local) | Cython 0.29.34 | 62.8s

Ok so now to walk through pull requests until I figure out what happened.

djhoese commented 1 year ago

It looks like it was this comparison added by @mraspaud here:

https://github.com/pytroll/pyresample/pull/508/files#diff-0275305df517377d471ba5034726f76701f86ee539d49547684040bbf78fd1a6R124-R125

So PR #508. I have to go offline now, but given that this is the issue, my guess is that it is comparing the lons/lats of all of the swath definitions. It should likely be doing an is identity check for dask arrays. I'm not sure why it isn't.

djhoese commented 1 year ago

Oh, but also I only saw this on Python 3.11, so it must be some optimization that Python 3.11 is making that is skipping over some of our shortcuts. This could even be a problem with dask on Python 3.11. It should be easy enough to check; I just need to have the time. I'll try later today.

djhoese commented 1 year ago

Ok, I don't know what I was doing with my Python 3.10 environment yesterday, but I'm able to comment out this if statement in both Python 3.10 and 3.11 and get the difference in performance (better with the if statement commented out).

@mraspaud we need to think of a better way to do if source_def == target_def. In the current case, the SwathDefinition.__eq__ method is called, which gets the lons/lats from the AreaDefinition and produces numpy arrays. It then doesn't get caught by any of the shortcuts (is identity checks, .name dask array checks, etc.), so it goes all the way to np.allclose (see the sketch below).

However, I'm still not sure why @yukaribbba didn't see performance differences in nearest neighbor since I would have expected this to result in the same thing.
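For illustration, the kind of cheap short-circuit being described might look roughly like this (a sketch with a made-up helper name, not pyresample's actual implementation):

import numpy as np
import dask.array as da

def _lonlats_probably_equal(a, b):
    """Cheap checks before falling back to a full element-wise comparison."""
    if a is b:
        return True  # identical object, nothing to compute
    if isinstance(a, da.Array) and isinstance(b, da.Array) and a.name == b.name:
        return True  # same dask task-graph name means the same lazy computation
    return bool(np.allclose(a, b))  # expensive fallback that compares every element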

djhoese commented 1 year ago

Ah, the nearest neighbor resampler doesn't use this base class. So my guess is this if statement has slowed down every resampler class that is based on this base class.

Edit: The swath-compatible gradient_search is also ~50s slower, but it isn't dask optimized so we're talking 4 minutes versus almost 5 minutes of run time.

djhoese commented 1 year ago

@yukaribbba an initial fix is implemented in https://github.com/pytroll/pyresample/pull/520. If you're able to test it it would be much appreciated.

yukaribbba commented 1 year ago

Thanks! That was quick. I'm offline now so will test it tomorrow.

yukaribbba commented 1 year ago

83s, back to normal performance! 😆

yukaribbba commented 1 year ago

There's another question here. I remember that in P2G 2.3 the resolution of an output image (EWA resampled) was exactly the same as I set in grids.conf. I'm not sure which version of pyresample it used. But the recent versions (1.26.1, 1.27, this PR) produce something like res_x=375.018139808887099 / res_y=-375.094414093355681 when I set the resolution to 375m. This causes some problems when I do mosaic operations.

djhoese commented 1 year ago

P2G now uses pyresample's create_area_def utility function for all grid/area creation. If you're creating static grids then the resolutions are (I think) manipulated a bit to produce whole pixel sizes. You'd have to adjust the number of pixels or extents so that you have a nice even number of pixels. If this is a dynamic area then there may be other reasons. Could you create an issue on the polar2grid repository and include an example of the grid definition you are using?

yukaribbba commented 1 year ago

It's a dynamic area. I'm using satpy with pyresample under Windows now, but I also tried P2G 3.0 in a virtual machine in the past. Both of them have this issue, so I guess it may not be a P2G issue. The grid definition is just a normal one in grids.conf, with a PROJ4 string and x/y resolutions.

djhoese commented 1 year ago

It's not really an "issue"; it depends on how you define your area. If you had a 2x2 pixel grid that you wanted at 100m resolution, but also wanted it to have extents that are 300m apart (instead of 200m), then one of those values has to change. Can you paste your definition here so I can take a quick look? I would have expected resolution to take priority in the dynamic area case.
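To make the constraint concrete, here is a toy calculation of the trade-off being described (plain arithmetic, not pyresample code):

# extent_width == number_of_pixels * pixel_size must always hold
extent_width = 300.0    # meters between the left and right edges
requested_res = 100.0   # requested meters per pixel

width = round(extent_width / requested_res)        # -> 3 pixels, resolution stays exactly 100.0
print(width, extent_width / width)

forced_width = 2                                   # if the pixel count is fixed instead...
print(forced_width, extent_width / forced_width)   # ...the resolution becomes 150.0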

djhoese commented 1 year ago

I just noticed you mentioned grids.conf. You may benefit from using the grids.yaml format used in P2G 3.0. I'd still like more details on which grids you're talking about, though. If the projection is a lon/lat one and the resolution is specified in meters (or vice versa), then the units are converted when the area is created, which can definitely introduce floating point precision issues and is overall an estimate.
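As a rough illustration of that unit conversion (an approximation using a spherical Earth radius, not pyresample's exact math):

import math

# roughly 111 km per degree at the equator for a sphere of this radius
meters_per_degree = 2 * math.pi * 6370997.0 / 360.0

res_m = 250.0
res_deg = res_m / meters_per_degree  # already an inexact binary float
print(res_deg)                       # ~0.002248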

yukaribbba commented 1 year ago

I got a comparison here.

P2G 2.3 command line:

polar2grid.sh crefl gtiff --true-color --fornav-D 10 --fornav-d 1 --compress NONE --grid-configs /home/ll/grids.conf -g lcc_cn_250 -f /mnt/hgfs/Downloads/Sat/Polar/terra_2000360_0420/

grids config:

# grid_name,            proj4, proj4_str,                                                                                width,  height, pixel_size_x, pixel_size_y,           origin_x,          origin_y
lcc_cn_250,             proj4, +proj=lcc +datum=WGS84 +ellps=WGS84 +lat_1=25 +lat_2=47 +lon_0=105 +units=m +no_defs,      None,    None,          250,         -250,               None,              None

output

Size is 11292, 9806
Pixel Size = (250.000000000000000,-250.000000000000000)
Upper Left  (-1977413.430, 4596402.652) ( 81d56'30.87"E, 37d43'23.64"N)
Lower Left  (-1977413.430, 2144902.652) ( 87d 8'13.43"E, 16d 4'57.43"N)
Upper Right (  845586.570, 4596402.652) (115d 0'53.64"E, 39d29'25.04"N)
Lower Right (  845586.570, 2144902.652) (112d42'36.22"E, 17d22'58.13"N)
Center      ( -565913.430, 3370652.652) ( 99d 9'36.23"E, 28d30'12.57"N)

[400% enlarged image]

satpy with pyresample (this PR), in the environment mentioned above. Code:

# using GenericCompositor just as P2G 2.3
from satpy import Scene, find_files_and_readers
from pyresample import create_area_def

files = find_files_and_readers(base_dir=folder, reader="modis_l1b")
scn = Scene(filenames=files, reader_kwargs={'mask_saturated': False})
scn.load(["true_color"])
area_id = "lcc_cn_250"
projection = "+proj=lcc +lon_0=105 +lat_1=25 +lat_2=47 +datum=WGS84 +ellps=WGS84"
res = 250
area_def = create_area_def(area_id=area_id, projection=projection, resolution=res)
scn = scn.resample(area_def, resampler="ewa", weight_delta_max=10, weight_distance_max=1)

output

Size is 11308, 9811
Pixel Size = (249.995323508576064,-249.988563077171307)
Upper Left  (-1977413.430, 4596403.063) ( 81d56'30.86"E, 37d43'23.65"N)
Lower Left  (-1977413.430, 2143765.270) ( 87d 8'20.18"E, 16d 4'22.48"N)
Upper Right (  849533.688, 4596403.063) (115d 3'40.73"E, 39d29'11.59"N)
Lower Right (  849533.688, 2143765.270) (112d44'42.25"E, 17d22'12.53"N)
Center      ( -563939.871, 3370084.166) ( 99d10'50.64"E, 28d29'57.85"N)

[400% enlarged image]

Well, you can clearly see that 2.3 has better detail and its resolution is clean.

djhoese commented 1 year ago

Oh very interesting. I agree. This is bad. Let me see if I can decipher the code.

djhoese commented 1 year ago

My test code (for my own reference):

from pyresample.test.utils import create_test_longitude, create_test_latitude
from pyresample import DynamicAreaDefinition

area_def = DynamicAreaDefinition(projection="+proj=lcc +datum=WGS84 +ellps=WGS84 +lat_1=25 +lat_2=47 +lon_0=105 +units=m +no_defs", resolution=(250.0, 250.0))
frozen_area_def = area_def.freeze(lonslats=(create_test_longitude(125.0, 135.0, (200, 100), twist_factor=0.1), create_test_latitude(25.0, 35.0, (200, 100))))

frozen_area_def.pixel_size_x
# 249.9988581543627

frozen_area_def.pixel_size_y
# 249.9987629440828

Our/my goal is to get this resolution closer to the specified resolution. Granted, it is pretty close, but let's see if I can make it better. The main difficulty is that in P2G 2.3 the internal "grid definition" objects used pixel resolution as a primary property, while AreaDefinitions use the extents.

Edit: Changed twist_factor to 0.1 which gives a more reasonable result with shape (8061, 9093) and:

In [24]: frozen_area_def.pixel_size_x
Out[24]: 249.9906672368728

In [25]: frozen_area_def.pixel_size_y
Out[25]: 250.00687632884492

djhoese commented 1 year ago

@yukaribbba Check out #523. If you look at the two-line change in geometry.py, you should be able to hack this solution into your own copy of the code and test whether it fixes your use case. Let me know how it goes.

djhoese commented 1 year ago

Reopening just to keep track of the resolution issue too.

yukaribbba commented 1 year ago
Size is 11308, 9811
Pixel Size = (250.000000000000000,-250.000000000000057)
Upper Left  (-1977413.430, 4596515.270) ( 81d56'29.76"E, 37d43'27.26"N)
Lower Left  (-1977413.430, 2143765.270) ( 87d 8'20.18"E, 16d 4'22.48"N)
Upper Right (  849586.570, 4596515.270) (115d 3'43.46"E, 39d29'15.09"N)
Lower Right (  849586.570, 2143765.270) (112d44'43.98"E, 17d22'12.40"N)
Center      ( -563913.430, 3370140.270) ( 99d10'51.49"E, 28d29'59.74"N)

This is the same scene I mentioned above. It's quite close... I guess that ".57" is some kind of floating point issue, but I haven't seen such a close call in P2G 2.3. I also tested two other adjacent scenes. They're both at 250m with no float issues. So I think it's a good result, maybe?

djhoese commented 1 year ago

Yeah, definitely a floating point issue. I'm not sure I can do too much about it, but maybe @mraspaud will have ideas in my PR. At that many decimal places we're beyond 32-bit precision and near the end of 64-bit precision, I think.

As you can see in my PR, I'm taking left_most_extent + (width - 1) * x_resolution, where width is an integer and x_resolution is the floating point number the user provided. The extent is calculated from the provided lons/lats. I suppose those could be rounded to the nearest meter, but we'd have to be careful not to do that in the degrees case. We could always round the number to the nearest 32-bit float, I suppose, just to avoid unnecessary precision weirdness, but that seems like a hack. Any other ideas?
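A minimal sketch of that kind of calculation, assuming the extent is snapped to an integer number of user-specified pixels (the helper name is made up):

import math

def snapped_extent_x(min_x, max_x, x_resolution):
    """Derive the right edge from an integer pixel count and the user-provided resolution."""
    width = math.ceil((max_x - min_x) / x_resolution)  # integer number of pixels
    new_max_x = min_x + width * x_resolution           # right edge aligned to the requested resolution
    return new_max_x, width

# With the extents from the comparison above this reproduces the 11308-pixel width
# and the 849586.570 right edge, under these assumptions.
print(snapped_extent_x(-1977413.430, 849533.688, 250.0))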

yukaribbba commented 1 year ago

Ok. For me it's acceptable, but I'm still curious why 2.3 doesn't have this issue. Maybe it does some rounding operation like you just said?

djhoese commented 1 year ago

No, it's that P2G 2.3 doesn't use pyresample's AreaDefinitions. The GridDefinition object used in P2G 2.3 holds on to the pixel resolution as one of the primary properties of the grid (projection, resolution, shape, upper-left origin), while an AreaDefinition holds on to (projection, extents, shape). So in the AreaDefinition case the pixel resolution is calculated and then passed to GDAL/rasterio when it makes the GeoTIFF, whereas in the P2G v2.3 GridDefinition case it just passed along the resolution that was explicitly provided by the user.
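To illustrate the difference in bookkeeping, here are two toy classes (not the real GridDefinition/AreaDefinition APIs):

from dataclasses import dataclass

@dataclass
class GridDefinitionLike:
    """P2G 2.3 style: pixel resolution is stored directly and passed through unchanged."""
    projection: str
    origin_x: float
    origin_y: float
    pixel_size_x: float
    pixel_size_y: float
    width: int
    height: int

@dataclass
class AreaDefinitionLike:
    """pyresample style: extents and shape are stored; resolution is derived."""
    projection: str
    area_extent: tuple  # (min_x, min_y, max_x, max_y)
    width: int
    height: int

    @property
    def pixel_size_x(self) -> float:
        # floating point division reintroduces tiny rounding noise like 249.995323...
        return (self.area_extent[2] - self.area_extent[0]) / self.width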