pytroll / pyresample

Geospatial image resampling in Python
http://pyresample.readthedocs.org
GNU Lesser General Public License v3.0
350 stars 94 forks source link

reduce_data bug with polar grids #61

Open TomLav opened 7 years ago

TomLav commented 7 years ago

Hello,

This is a bug I have been chasing for a while, but I have the impression it somehow is worse now (1.5.0) than before. It happens when using kdtree.resample* and reduce_data=True (which is the default) with polar orbiting satellites, on polar grids. At least the polar grids I am using for my work.

I prepared a script to generate test data and compare the remapping with and without reduce_data. It results in the image below.

pyresample_reduce_data_bug

To me, the reduce_data feature should only speed-up the resampling, not change the result in any way. So this is a serious flaw.

I hope this can be made rather easily into a unit test for reduce_data since it is self contained (no need for external data files to be loaded).

Where should we go from there? The github tool will not let me attach a .py. I can send it by mail. I try to paste it here but it does not look good.

import numpy as np
import pyresample as pr
import matplotlib.pylab as plt
from mpl_toolkits.basemap import Basemap
import pyproj

print "Pyresample {}".format(pr.__version__,)

# routine to create a polar area_def (NSIDC's 25km SSM/I grid)
def get_areadef(hemis):
    nx, ny = {'nh':(448,304), 'sh':(332,316)}[hemis][::-1]
    a = 25000
    if hemis == 'nh':
        xl = -3850000.00
        xL = xl+0.5*a + (nx-1)*a + 0.5*a
        yL = 5850000.00
        yl = yL-0.5*a - (ny-1)*a - 0.5*a
        pdict = {u'a': u'6378273', u'b': u'6356889.44891', u'lat_ts': u'70', u'lon_0': u'-45', u'proj': u'stere', u'lat_0': u'90', u'units':u'm'}
    elif hemis == 'sh':
        xl = -3950000.00
        xL = xl+0.5*a + (nx-1)*a + 0.5*a
        yL = +4350000.00
        yl = yL-0.5*a - (ny-1)*a - 0.5*a
        pdict = {u'a': u'6378273', u'b': u'6356889.44891', u'lat_ts': u'-70', u'lon_0': u'0', u'proj': u'stere', u'lat_0': u'-90', u'units':u'm'}
    else:
        raise ValueError("Not a valid area ('nh' and 'sh' are valid)")

    area_def  = pr.geometry.AreaDefinition('nsidc_ps_{}'.format(hemis,),
                                          'nsidc_ps_{}'.format(hemis,),
                                          'nsidc_ps_{}'.format(hemis,),
                                          pdict,
                                          nx, ny,
                                          [xl, yl, xL, yL])
    if area_def.pixel_size_x != 25000 or area_def.pixel_size_y != 25000:
        raise ValueError("The grid parameters do not match a grid spacing of 25km! (x:{} y:{})".format(area_def.pixel_size_x, area_def.pixel_size_y))

    return area_def

area_def = get_areadef('nh')

#simulate swath data:
#  1) have a single line of data going through (North) pole
slon = 125.
swath_lats_1 = np.arange(30.,90.,1.)
swath_lons_1 = np.array([slon,]*swath_lats_1.size)
swath_lats_2 = swath_lats_1[::-1]
swath_lons_2 = np.array([180+slon,]*swath_lats_2.size)

swath_lats = np.concatenate((swath_lats_1,[90.,],swath_lats_2))
swath_lons = np.concatenate((swath_lons_1,[0.,],swath_lons_2))

#  2) use pyproj to obtain x,y coordinates in the map
P = pyproj.Proj(area_def.proj4_string)
x,y = P(swath_lons, swath_lats)

#  3) use pyproj(inverse=True) to get 5 neighbouring lines that are side by side and not going through Pole. This is our simulated polar orbiting swath
nbscanpos = 5
lons = np.empty((swath_lons.size,nbscanpos))
lats = np.empty((swath_lats.size,nbscanpos))
data = np.empty((swath_lats.size,nbscanpos))
off = 100*1000  # be at least 100km from Pole
for i in range(0,nbscanpos):
    xp = x + off
    yp = y - off
    lons[:,i], lats[:,i] = P(xp,yp,inverse=True)
    data[:,i] = i+5
    off += 50*1000 # distance between scan positions is 50km

# Do 2 kdtree.resample() calls: exactly the same except for reduce_data=False|True (default is True)
swath_def = pr.geometry.SwathDefinition(lons, lats)
res_rdF = pr.kd_tree.resample_nearest(swath_def,data,area_def,radius_of_influence=50000,reduce_data=False)
res_rdT = pr.kd_tree.resample_nearest(swath_def,data,area_def,radius_of_influence=50000,reduce_data=True)

#plot
fig = plt.figure(figsize=(10,10))
ax = fig.add_subplot(121)
ax.set_title('kdtree data_reduce=False')
bmap = pr.plot.area_def2basemap(area_def)
bmap.drawcoastlines()
bmap.imshow(res_rdF,interpolation='none',cmap=plt.cm.plasma, origin='upper')

ax = fig.add_subplot(122)
ax.set_title('kdtree data_reduce=True')
bmap = pr.plot.area_def2basemap(area_def)
bmap.drawcoastlines()
bmap.imshow(res_rdT,interpolation='none',cmap=plt.cm.plasma, origin='upper')

plt.savefig('./pyresample_reduce_data_bug.png',bbox_inches='tight')
print "./pyresample_reduce_data_bug.png ready"

plt.show()
mraspaud commented 7 years ago

Hi @TomLav ,

Yes, I have run into this one too. My understanding is (iirc) that the reduce_data procedure takes the edges of the swath's coordinates and computes the min an max lons and lats from there. Problem appears when the area contains a pole, because the min or max don't correspond to the reality, and significant parts of the swath are removed. My idea was to add a check for pole inclusion in the area, in which case the min and max lons and lats would be computed from the entire coordinates instead for just the edges. But I never got around to doing it.

TomLav commented 7 years ago

I am somehow doubtful that the edges/corners of the swath are the issue. I am not aware that the structure of the swath has anything to say for pyresample and I think I see the same behaviour whether I use the (scanline,scanpos) shape or after reshape(-1).

I am looking into this tonight, hopefully.

TomLav commented 7 years ago

It is late (and early!). It seems data_reduce works better with two small edits. Will have to confirm tomorrow, particularly that this does not break the tests.

<pc2981|pyresample> =:D git diff data_reduce.py
diff --git a/pyresample/data_reduce.py b/pyresample/data_reduce.py
index 76e488a..a288739 100644
--- a/pyresample/data_reduce.py
+++ b/pyresample/data_reduce.py
@@ -293,10 +293,10 @@ def _get_valid_index(lons_side1, lons_side2, lons_side3, lons_side4,
     #-360: area covers south pole
     #   0: area covers no poles
     # else: area covers both poles
-    if round(angle_sum) == 360:
+    if round(angle_sum) == -360:
         # Covers NP
         valid_index = (lats >= lat_min_buffered)
-    elif round(angle_sum) == -360:
+    elif round(angle_sum) == 360:
         # Covers SP
         valid_index = (lats <= lat_max_buffered)
     elif round(angle_sum) == 0:
mraspaud commented 7 years ago

I edited this recently I think doing the opposite of what you did. So I think we can't be sure what is north or south pole actually, it depends on the data. We could use the abs value instead, and then look which side the lats are leaning towards?

On Wed, 14 Jun 2017, 01:07 Lavergne, notifications@github.com wrote:

It is late (and early!). It seems data_reduce works better with two small edits. Will have to confirm tomorrow, particularly that this does not break the tests.

<pc2981|pyresample> =:D git diff data_reduce.py diff --git a/pyresample/data_reduce.py b/pyresample/data_reduce.py index 76e488a..a288739 100644--- a/pyresample/data_reduce.py+++ b/pyresample/data_reduce.py@@ -293,10 +293,10 @@ def _get_valid_index(lons_side1, lons_side2, lons_side3, lons_side4,

-360: area covers south pole

 #   0: area covers no poles
 # else: area covers both poles
  • if round(angle_sum) == 360:+ if round(angle_sum) == -360:

    Covers NP

     valid_index = (lats >= lat_min_buffered)-    elif round(angle_sum) == -360:+    elif round(angle_sum) == 360:
     # Covers SP
     valid_index = (lats <= lat_max_buffered)

    elif round(angle_sum) == 0:

— You are receiving this because you commented.

Reply to this email directly, view it on GitHub https://github.com/pytroll/pyresample/issues/61#issuecomment-308273863, or mute the thread https://github.com/notifications/unsubscribe-auth/AAKPeiVR6NymW87E2GZW8DHnFlmcFjDXks5sDxY9gaJpZM4N4VHF .

-- Martin Raspaud, PhD Software engineer SMHI, SE-60176

TomLav commented 7 years ago

Well... that would explain why I felt 1.5.0 was doing worse than before ;) I'll try to reproduce the two behaviors with more synthetic test data.

Indeed:

git show 1c9ac493aea549a354f384059e9aa6ad41558fd8
commit 1c9ac493aea549a354f384059e9aa6ad41558fd8
Author: Martin Raspaud <martin.raspaud@smhi.se>
Date:   Thu Mar 16 16:50:57 2017 +0100

    Fix data reduction when poles are within area

diff --git a/pyresample/data_reduce.py b/pyresample/data_reduce.py
index 7fda9fa..45465c3 100644
--- a/pyresample/data_reduce.py
+++ b/pyresample/data_reduce.py
@@ -290,14 +290,14 @@ def _get_valid_index(lons_side1, lons_side2, lons_side3, lons_side4,

     # From the winding number theorem follows:
     # angle_sum possiblilities:
-    #-360: area covers north pole
-    # 360: area covers south pole
+    # 360: area covers north pole
+    #-360: area covers south pole
     #   0: area covers no poles
     # else: area covers both poles
-    if round(angle_sum) == -360:
+    if round(angle_sum) == 360:
         # Covers NP
         valid_index = (lats >= lat_min_buffered)
-    elif round(angle_sum) == 360:
+    elif round(angle_sum) == -360:
         # Covers SP
         valid_index = (lats <= lat_max_buffered)
     elif round(angle_sum) == 0:

So now we should find out test data that support your edit as well. Do you remember what type of swath it was?

TomLav commented 7 years ago

A short status for my investigations:

mraspaud commented 6 years ago

I believe #98 fixes this, can you confirm @TomLav ?

TomLav commented 6 years ago

There are two issues with data_reduce:

mraspaud commented 6 years ago

ok, good to know, thanks for the update. I'm leaving this open then.