pacificclimate / climate-explorer-backend

3 stars 1 forks source link

Discrepenceis between watershed size reported by RVIC and /watershed API #161

Open corviday opened 3 years ago

corviday commented 3 years ago

@sum1lim has found some discrepancies between the number of upstream cells as reported by RVIC and the number reported by the /watershed API.

Lat Lon Upstream cells Watershed API Upstream cells RVIC
-120.8 56.2 1 1
-117.6 55.6 10 10
-121.4 56.1 50 50
-120.3 56 140 2
-121.7 56.4 282 282
-121.6 56.3 360 360
-120.9 56.1 486 486
-120.5 56.2 631 631
-117.7 55.5 1351 2
-117.4 56.1 1948 1941
-122 56 2867 2824
-121.9 56 2868 2752
-121.8 56.1 2900 2893
-121.7 56.1 2925 2918
-121.3 56.2 3300 3293
-121.1 56.2 3342 3903
-120.5 56.1 3921 3335
-120.3 56.1 4558 4550
-120.1 56.1 4708 4700
-118.9 56.1 5104 5096
-118.6 55.9 5248 5140
-118.5 55.9 5256 5247
-118.3 55.9 5264 5256
-118.2 55.9 5266 5258
-118 55.9 5363 5266
-117.8 56 5387 5262
-117.5 56.1 5404 5393

Possibilities:

corviday commented 3 years ago

I have checked to make sure the file that @sum1lim was using for flow direction and the file used by the watershed API for flow direction have the same values. They do.

corviday commented 3 years ago

I'm checking the point at longitude -120.3° latitude 56°. The Watershed API thinks this is a watershed of 140 grid squares; RVIC thinks it is 2 squares.

Here's what the /watershed API thinks the extent of the watershed is. It's generally a reasonably looking watershed: the Kiskatinaw river and tributaries, up to a little bit before it joins the Peace River.

json+point

But look at the location of the pin indicating the point that was asked about. Note that this map is so zoomed in the scale is in meters. The selected point is exactly on the boundary.

boundary

I think the issue here is that the /watershed API and RVIC are using different rounding algorithms for which square a point is assigned to when it is exactly on the boundary.

Here's the latitude values for the Peace River Flow Direction netCDF:

ncdump -v lat /storage/data/projects/comp_support/climate_explorer_data_prep/hydro/peace_watershed/time-invariant/flow-direction_peace.nc
netcdf flow-direction_peace {
dimensions:
    lon = 192 ;
    lat = 83 ;
variables:
    double lon(lon) ;
    double lat(lat) ;
        lat:standard_name = "latitude" ;
        lat:long_name = "latitude" ;
        lat:units = "degrees_north" ;
        lat:axis = "Y" ;
    double flow_direction(lat, lon) ;

// global attributes:
    ...

data:

 lat = 53.03125, 53.09375, 53.15625, 53.21875, 53.28125, 53.34375, 53.40625, 
    53.46875, 53.53125, 53.59375, 53.65625, 53.71875, 53.78125, 53.84375, 
    53.90625, 53.96875, 54.03125, 54.09375, 54.15625, 54.21875, 54.28125, 
    54.34375, 54.40625, 54.46875, 54.53125, 54.59375, 54.65625, 54.71875, 
    54.78125, 54.84375, 54.90625, 54.96875, 55.03125, 55.09375, 55.15625, 
    55.21875, 55.28125, 55.34375, 55.40625, 55.46875, 55.53125, 55.59375, 
    55.65625, 55.71875, 55.78125, 55.84375, 55.90625, 55.96875, 56.03125, 
    56.09375, 56.15625, 56.21875, 56.28125, 56.34375, 56.40625, 56.46875, 
    56.53125, 56.59375, 56.65625, 56.71875, 56.78125, 56.84375, 56.90625, 
    56.96875, 57.03125, 57.09375, 57.15625, 57.21875, 57.28125, 57.34375, 
    57.40625, 57.46875, 57.53125, 57.59375, 57.65625, 57.71875, 57.78125, 
    57.84375, 57.90625, 57.96875, 58.03125, 58.09375, 58.15625 ;
}

56 degrees latitude is exactly halfway between 55.96875 and 56.03125. /watershed is assigning this point to the grid cell containing 56.03125, which contains the Kiskatinaw river (140 grid cell watershed). My suspicion is that RVIC is assigning it to the grid cell containing 55.96875, which does not contain a river (2 grid cell watershed).

sum1lim commented 3 years ago

Hmm, I see the point. But there were still some points with a small discrepancy such as (-117.28125, 56.21875) where RVIC thinks it has 7469 upstream grid cells and Watershed API thinks it is 7484. I don't know if it is caused from the same reason.

corviday commented 3 years ago

Hmmmm. It might relate to cells at the edge of the watershed, I think there's some way in a domain file to indicate that a cell is only partly inside the watershed. RVIC might use those fractions to calculate, the area, or it may skip the cells only partly in the watershed. /watershed doesn't do anything like that - either a cell is inside the watershed or it isn't. I'll see if I can turn up anything.

jameshiebert commented 3 years ago

I'd propose that a simple resolution to this issue would consist of not allowing requests on cell boundaries. E.g. if the incoming request coordinate falls on a cell boundary, respond with an HTTP 400 Bad Request error (and an explanatory message).

corviday commented 3 years ago

My first idea about the small differences between RVIC and the watershed API that you mentioned for the whole-watershed value didn't pan out. It would be nice to have more examples. @sum1lim, if you run across any other points that aren't on grid bondaires but don't have the same watershed size, please let me know.

sum1lim commented 3 years ago

@corviday Thanks for investigating on the issue! Following are the locations that show the discrepancies I found until now:

Basin Code | Station ID | Station Name | Lat | Lon | Watershed API | RVIC BRNFS | 07FC001 | Beatton River near Fort St. John | -120.71875 | 56.28125 | 623 | 622 FRAAR | 07EA005 | Finlay River above Akie River | -125.28125 | 57.15625 | 669 | 666 LSRNG | 07GH002 | Little Smoky River near Guy, AB | -117.15625 | 55.46875 | 433 | 431 ORAOR | 07EC002 | Omineca River above Osilinka River | -124.59375 | 55.90625 | 222 | 219 PERNT | 07FD002 | Peace River near Taylor | -120.65625 | 56.15625 | 3910 | 3836 PRABD | BCGMS | Peace River at Bennett Dam | -122.21875 | 56.03125 | 2836 | 2829 PRAOO | 07EE010 | Pack River at Outlet of McLeod Lake | -123.03125 | 54.96875 | 149 | 148 PRAPR | 07HA001 | Peace River at Peace River, AB | -117.28125 | 56.21875 | 7484 | 7469 SMRAW | 07GJ001 | Smoky River at Watino, AB | -117.59375 | 55.78125 | 1907 | 1900

Lat | Lon | Watershed API | RVIC -120.3 | 56 | 140 | 2 -117.7 | 55.5 | 1351 | 2 -117.4 | 56.1 | 1948 | 1941 -122 | 56 | 2867 | 2824 -121.9 | 56 | 2868 | 2752 -121.8 | 56.1 | 2900 | 2893 -121.7 | 56.1 | 2925 | 2918 -121.3 | 56.2 | 3300 | 3293 -121.1 | 56.2 | 3342 | 3903 -120.5 | 56.1 | 3921 | 3335 -120.3 | 56.1 | 4558 | 4550 -120.1 | 56.1 | 4708 | 4700 -118.9 | 56.1 | 5104 | 5096 -118.6 | 55.9 | 5248 | 5140 -118.5 | 55.9 | 5256 | 5247 -118.3 | 55.9 | 5264 | 5256 -118.2 | 55.9 | 5266 | 5258 -118 | 55.9 | 5363 | 5266 -117.8 | 56 | 5387 | 5262 -117.5 | 56.1 | 5404 | 5393

sum1lim commented 3 years ago

@corviday , it looks like the small discrepancies only happen on every point with number of upstream grid cells larger than 1000. Points with a number less than 1000 did not show the discrepancies. I cannot confirm this is true, but my observations from the report follow the rule.

Edit: Oh never mind, there are discrepancies between watershed sizes 623 vs 622

Watershed API RVIC
71 71
67 67
623 622
669 666
84 84
83 83
155 155
361 361
163 163
128 128
104 104
433 431
118 118
90 90
51 51
204 204
179 179
271 271
85 85
222 219
69 69
119 119
3910 3836
2836 2829
444 444
174 174
149 148
7484 7469
47 47
1907 1900
103 103
417 417
41 41
Watershed API RVIC
1 1
10 10
50 50
282 282
360 360
486 486
631 631
1948 1941
2867 2824
2868 2752
2900 2893
2925 2918
3300 3293
3342 3903
3921 3335
4558 4550
4708 4700
5104 5096
5248 5140
5256 5247
5264 5256
5266 5258
5363 5266
5387 5262
5404 5393