scworland / restore-2018

scripts for predicting streamflow characteristics in ungaged basins for RESTORE

STANDARD OF PRACTICE (on-the-fly "flood_storage"): all_gage_data.feather and all_huc12_covariates.feather #29

Open ghost opened 6 years ago

ghost commented 6 years ago

MESSAGE: We are seeing flood storage capacities in excess of about 0.52 in log10(feet), which is about 3.3 ft of cumulative watershed storage attributable to flood capacity in upstream reservoirs. That is a big number: 3.3 ft is 39.6 inches, which is comparable to the total mean annual precipitation for much of the part of our study area that resides in Texas.
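The arithmetic behind those numbers can be checked directly; this is a minimal sketch that assumes the 0.52 is in log10(feet) with the 0.01 log-offset introduced later in this thread:

```r
# Back-transform the log10(feet) flood-storage threshold to a depth.
thresh_log10 <- 0.52
depth_ft <- 10^thresh_log10 - 0.01  # 0.01 log-offset assumed (see Sixth step)
depth_in <- 12 * depth_ft
round(depth_ft, 1)  # about 3.3 ft
round(depth_in, 1)  # about 39.6 inches
```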

From all_gage_data.feather:

[Screenshot, 2018-03-19: EMPIRICAL DISTRIBUTION OF FLOOD_STORAGE AS SEEN INSIDE THE all_gage_data.feather. Maximum is 0.5255755.]

Now, concerning all_huc12_covariates.feather:

I am using a flood storage per unit area term as the alteration covariate, derived via the NID (National Inventory of Dams). Reservoirs don't destroy volume per se, but alter its timing, so we might see this covariate's utility for FDC shapes but perhaps less so for the mean and median. The computations below are also the ones I use to build the provisional GAMs.

The storages in the covariates file are in acre-feet, and 1 km2 = 247.104393047 acres, so dividing storage by basin area in acres yields an equivalent watershed depth in feet.
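The units work out because an acre-foot divided by an acre is a foot. A minimal check (the storage and area values here are hypothetical):

```r
# 1 U.S. survey acre is about 4046.8726098 m^2, hence the constant above
acres_per_km2 <- 1e6 / 4046.8726098
# Acre-feet divided by acres leaves feet of equivalent watershed depth
storage_acft <- 1000  # hypothetical accumulated flood storage, acre-feet
basin_km2    <- 10    # hypothetical basin area, km^2
depth_ft <- storage_acft / (basin_km2 * acres_per_km2)
round(depth_ft, 4)  # about 0.4047 ft of depth
```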

First, compute the flood storage capacity as the difference between total and normal storage:

```r
spCOV$flood_storage <- spCOV$acc_nid_storage - spCOV$acc_norm_storage
```

Second, divide by the basin area (converted from km2 to acres):

```r
spCOV$flood_storage <- spCOV$flood_storage/(spCOV$acc_basin_area*247.104393047)
```

Third, as a trap for division by zero or otherwise missing values, let's assume zero:

```r
spCOV$flood_storage[is.na(spCOV$flood_storage)] <- 0
```
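Taken together, the first three steps can be reproduced on a toy table (the column names follow the covariates file; the values are made up):

```r
# Toy stand-in for the covariates table; values are illustrative only
spCOV_toy <- data.frame(
  acc_nid_storage  = c(500, 100, 0),  # total NID storage, acre-feet
  acc_norm_storage = c(200,  40, 0),  # normal storage, acre-feet
  acc_basin_area   = c(  2,   1, 0)   # basin area, km^2 (note the zero)
)
# First: flood storage is total minus normal storage
spCOV_toy$flood_storage <- spCOV_toy$acc_nid_storage - spCOV_toy$acc_norm_storage
# Second: divide by basin area in acres to get feet of depth
spCOV_toy$flood_storage <-
  spCOV_toy$flood_storage / (spCOV_toy$acc_basin_area * 247.104393047)
# Third: the zero-area basin divides to NaN; trap it as zero
spCOV_toy$flood_storage[is.na(spCOV_toy$flood_storage)] <- 0
```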

Fourth, let us do a sanity check:

```r
spCOV[spCOV$flood_storage < 0,]
```

And yep, there are some inverted relations here and there (all in Florida, it seems):

```r
plot(spCOV)
plot(spCOV[spCOV$flood_storage < 0,], add=TRUE, col=2)
```

Fifth, fix the negatives by assuming the entries are numerically correct but flipped in their slots in the database:

```r
spCOV$flood_storage <- abs(spCOV$flood_storage)
```

Sixth, transform to log10. We have zeros, so use a small log-offset and let the GAM deal with lingering curvature. (Remember that log-offsets change the curvature in log-log plots.)

```r
spCOV$flood_storage <- log10(spCOV$flood_storage + 0.01)
```
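To see the curvature point concretely: with the offset, zeros map to log10(0.01) = -2 instead of -Inf, while values near the offset get pulled upward, bending the lower tail of any log-log plot. A small illustration (the x values are arbitrary):

```r
x <- c(0, 0.001, 0.01, 0.1, 1, 10)
# Offset log transform: finite at zero, close to plain log10(x) for x >> 0.01
y <- log10(x + 0.01)
# The distortion relative to a pure log10 shrinks as x grows
y[6] - log10(10)  # small
```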

Seventh, let us look at the overall empirical distribution and see how bad the right tail might get. (pp() computes plotting positions, presumably from the lmomco package; DD presumably holds the gage data.)

```r
plot(qnorm(pp(spCOV$flood_storage)), sort(spCOV$flood_storage), type="l")
lines(qnorm(pp(DD$flood_storage)), sort(DD$flood_storage), col=2)
```

Eighth, we see some phenomenally large flood storages. Let us truncate them at this point by removal. Further discussion and inspection might be needed, but right now the values used to build the GAMs top out at about 0.52, which is an incredible 3.3 feet of watershed-depth equivalent in and of itself.

```r
length(spCOV$comid[spCOV$flood_storage >= 0.52])
[1] 35
```

So we have 35 COMIDs with "large" storage. Here are the summary statistics:

```
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.5228  0.5546  0.7193  0.7827  0.8425  1.5351
```

There is debate about how far beyond the observational data we could go, but the mean is 61 inches or so of watershed depth, and I find it hard to believe that this is a reliable value. These are colossal volumes. Dropping as many as 35 COMIDs over this issue is not a large loss.
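For scale, the quoted summary statistics back-transform to depths as follows (a sketch; it assumes the 0.01 log-offset is still in play):

```r
# Back-transform the quoted log10 summary statistics to feet of depth
stats_log10 <- c(min = 0.5228, median = 0.7193, mean = 0.7827, max = 1.5351)
depth_ft <- 10^stats_log10 - 0.01
round(depth_ft, 1)  # roughly 3.3 ft at the minimum up to ~34 ft at the maximum
```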

Ninth, for purposes of experimental covariate prediction, as part of my scripting, I am removing the COMIDs with giant storages, as it is hard to envision conceptually that they are correct:

```r
spCOV <- spCOV[spCOV$flood_storage < 0.52,]
```

[Screenshot, 2018-03-19: EMPIRICAL DISTRIBUTION OF FLOOD_STORAGE AS SEEN INSIDE THE all_gage_data.feather (red) AND all_huc12_covariates.feather (black) FILES. Note what might appear to be a population change.]