scworland / restore-2018

scripts for predicting streamflow characteristics in ungaged basins for RESTORE

COMIDs Too Small to Be Represented #31

Open ghost opened 6 years ago

ghost commented 6 years ago

The idea of "small" needs a statement in relation to the applicability of COMID predictions.

library(feather) # provides read_feather()
FDC <- read_feather(file.choose()) # "all_gage_data.feather"

FDC <- aggregate(FDC, by=list(FDC$comid), mean)

summary(FDC$acc_basin_area)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
     0.32    210.41    725.00   6092.09   2797.24 171742.38 

plot(lmomco::pp(FDC$acc_basin_area), sort(FDC$acc_basin_area), log="y")

 length(FDC$acc_basin_area)
[1] 952

FIGURE: Empirical distribution of area for COMIDs for which a gage exists. Note the break at about Z = -2.5, which is about 6.5 square kilometers.

COVAR <- read_feather(file.choose()) # all_huc12_covariates.feather
N <- aggregate(COVAR, by=list(COVAR$comid), mean)

N <- N[! is.na(N$acc_basin_area),];  N <- N[N$acc_basin_area > 0,]

 length(N$comid)
[1] 9302

summary(N$acc_basin_area)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
      0.0      89.1     194.3   11346.0     756.1 2873761.6 

FIGURE: Empirical distribution of area for COMIDs for which prediction is to occur. Note the break at about Z = -2, which is about 30 square kilometers.
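The Z values in the figure captions are standard normal variates of the plotting positions. As a minimal sketch of how such a Z axis could be computed and a break point read off, the following uses synthetic areas (not the study data) and base R's `ppoints()` with `a = 0`, which gives Weibull plotting positions i/(n+1); that this matches the default of `lmomco::pp()` is an assumption here. The variable names `area`, `pp`, `z`, and `area_at_z` are illustrative only.

```r
# Synthetic, made-up areas (square kilometers); not the study data.
set.seed(42)
area <- sort(exp(rnorm(500, mean = log(200), sd = 1.5)))

# Weibull plotting positions i/(n+1) from base R (assumed to match the
# default of lmomco::pp), converted to standard normal variates (Z).
pp <- ppoints(length(area), a = 0)
z  <- qnorm(pp)

# Interpolate the area at a chosen Z, here Z = -2 (about the 2.3rd percentile).
area_at_z <- approx(z, area, xout = -2)$y
print(area_at_z)

# plot(z, area, log = "y")  # uncomment to draw a figure-style plot
```

Plotting against Z rather than raw probability stretches the tails, which is what makes a break in the smallest areas easy to see.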

Though these statistics might change slightly once the COMIDs are finally locked down, we must provide a caveat like this.

Given that summary statistics are already mentioned in the text:

Recall that the roughly 9,000 COMIDs for prediction in this study coincide with (are representative of) the outlets (pour points) of the HUC12s. It is important to stress that there is a finite smallest resolution of accumulated area, as embodied in the acc_basin_area attribute. The distribution of these accumulated areas is visually highly non-lognormal (results not shown here). At about the 2nd percentile, the accumulated area drops off very rapidly, from about 30 square kilometers to about 0.01 square kilometers. In other words, there are relatively few accumulated areas less than about 30 square kilometers; in contrast, the upper 2nd percentile of accumulated area is about 60,000 square kilometers.
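The tail percentiles quoted above could be recomputed whenever the COMID list changes. The following is only a sketch on synthetic data: the vector `area` stands in for `N$acc_basin_area`, and "upper 2nd percentile" is assumed here to mean the 98th percentile.

```r
# Synthetic, made-up accumulated areas (square kilometers); not the study data.
set.seed(1)
area <- exp(rnorm(1000, mean = log(200), sd = 2))

# Lower and upper 2nd percentiles (upper assumed to mean the 98th percentile).
tails <- quantile(area, probs = c(0.02, 0.98), names = FALSE)

# Fraction of areas below a candidate threshold, here 30 square kilometers.
frac_below_30 <- mean(area < 30)

print(tails)
print(frac_below_30)
```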

In contrast, the statistical model construction used about 950 unique accumulated areas that are nearly log-normally distributed (results not shown here), with a 2nd percentile of about 15 square kilometers and an upper 2nd percentile of about 87,000 square kilometers. These tails can be thought of as encompassing the range of the COMIDs for prediction for all except the smallest watersheds. There is a break or hinge point at about 5.2 square kilometers (about 2 square miles, rounded). As a lower limit (truncation point) for this study and its predictions (but not for model construction), a lower bound on acc_basin_area was set at 5.2 square kilometers.
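One way the truncation could be applied to the prediction table is sketched below. The data frame `pred`, the column `too_small`, and the choice to keep areas exactly equal to the bound are assumptions for illustration, not something settled in this issue.

```r
# Hypothetical stand-in for the per-COMID prediction table (N in the code above);
# the values are made up for illustration only.
pred <- data.frame(
  comid = c(101L, 102L, 103L, 104L),
  acc_basin_area = c(0.01, 5.2, 30, 60000)  # square kilometers
)

min_area_km2 <- 5.2  # lower bound stated in the text (about 2 square miles)

# Flag COMIDs below the bound rather than silently dropping them.
# Assumption: an area exactly equal to the bound is kept.
pred$too_small <- pred$acc_basin_area < min_area_km2

# Subset for which predictions would be reported under this assumption.
pred_ok <- pred[!pred$too_small, ]
print(pred_ok)
```

Keeping a flag column instead of deleting rows makes it easy to report how many COMIDs fall below the bound, which supports the caveat discussed above.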