scworland / restore-2018

scripts for predicting streamflow characteristics in ungaged basins for RESTORE

Canonical Method for FDC to Probability Computation #30

Open ghost opened 6 years ago

ghost commented 6 years ago

Given a decade of daily flows in a variable x, discharges of interest q (which could be x itself), an offset to deal with zeros, and the generalization that an FDC is roughly log-normal in shape, below is my line of thinking on linear interpolation and extrapolation to probabilities.


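fdc2p_core <- function(x, q, offset=0.01) {
   n <- length(x)
   # Interpolate linearly in log10(flow) vs. qnorm(plotting position) space,
   # then map back to probability; rule=2 holds the end values beyond the data.
   pnorm(approx(log10(sort(x)+offset), qnorm((1:n)/(n+1)),
           xout=log10(     q +offset), rule=2)$y)
}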

I would be greatly interested in the team's advice on how the problem should be framed. Lastly, the algorithm for going from probability back to flow is a little more delicate in the right tail because it requires real extrapolation: flow is not bounded to the open interval (0,1) as probability is.
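To make the probability-to-flow direction concrete, here is a minimal sketch of one way it might look. The function name p2fdc_core, the choice of k = 10 tail points, and the secant-slope tail extension are all my assumptions for illustration, not anything settled in this thread. It assumes n >= 2 and p strictly inside (0,1).

```r
# Hypothetical helper (not from the thread): probability -> flow, interpolating
# in the same log10(flow) vs. qnorm(probability) space as fdc2p_core.
p2fdc_core <- function(x, p, offset = 0.01, k = 10) {
  n  <- length(x)
  z  <- qnorm((1:n) / (n + 1))      # Weibull plotting positions as normal scores
  lq <- log10(sort(x) + offset)     # sorted log10 flows; offset guards zeros
  zo <- qnorm(p)                    # requested probabilities as normal scores
  # Linear interpolation inside the observed range; NA outside (rule = 1)
  out <- approx(z, lq, xout = zo, rule = 1, ties = "ordered")$y
  # Assumption: extend each tail from the end point, using the secant slope
  # through the k most extreme points (continuous and non-decreasing).
  k <- max(2, min(k, n))
  s_lo <- (lq[k] - lq[1]) / (z[k] - z[1])
  s_hi <- (lq[n] - lq[n - k + 1]) / (z[n] - z[n - k + 1])
  below <- which(zo < z[1])
  above <- which(zo > z[n])
  out[below] <- lq[1] + s_lo * (zo[below] - z[1])
  out[above] <- lq[n] + s_hi * (zo[above] - z[n])
  pmax(10^out - offset, 0)          # back-transform; flow cannot be negative
}

set.seed(42)
# Synthetic "decade" of daily values with 20 zero-flow days (made-up data)
x <- c(rep(0, 20), rlnorm(3633, meanlog = 4, sdlog = 1.2))
p <- c(0.0001, 0.5, 0.9999)         # first and last lie beyond the plotting positions
q <- p2fdc_core(x, p)
```

The right-tail value is a genuine extrapolation past max(x), so it depends heavily on k and on the log-normal-like assumption; the left tail here stays at zero because the smallest observations are all zero.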

scworland commented 6 years ago

I have been using a loess model:

fit <- loess(q~f,data=est_fdc,span=0.2)

Q_est <- data.frame(date=donor_ep$date,
                    Q_est=round(predict(fit,donor_ep$ep),0)) 

Do you see a problem with that approach?

ghost commented 6 years ago

It is ambiguous on the handling of zeros, and it does not rely on an interpolation scheme that leans toward generalized linearity in log-qnorm space. The question of span could be resolved, and I suspect 0.2 is reasonable enough. loess itself does not guarantee monotonicity, though sorting of the data does.
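The monotonicity point can be checked directly. Below is a rough sketch on synthetic data: it fits loess once in raw space (as in the snippet above) and once in log10-qnorm space, tests whether the predictions are non-decreasing, and then forces monotonicity with cummax. The data, the offset, the column names z and lq, and the use of cummax as the repair are my assumptions for illustration only.

```r
set.seed(1)
n <- 500
offset <- 0.01
# Synthetic stand-in for est_fdc: non-exceedance probability f and flow q
est_fdc <- data.frame(f = (1:n) / (n + 1),
                      q = sort(rlnorm(n, meanlog = 3, sdlog = 1.5)))
est_fdc$z  <- qnorm(est_fdc$f)             # normal scores of the probabilities
est_fdc$lq <- log10(est_fdc$q + offset)    # log10 flows with a zero-guard offset

fit_raw <- loess(q ~ f, data = est_fdc, span = 0.2)   # raw-space fit
fit_log <- loess(lq ~ z, data = est_fdc, span = 0.2)  # log10-qnorm-space fit

# Predict on a grid that stays inside the fitted range (loess does not
# extrapolate by default and would return NA outside it)
p_grid <- seq(0.01, 0.99, length.out = 200)
q_raw <- predict(fit_raw, data.frame(f = p_grid))
q_log <- 10^predict(fit_log, data.frame(z = qnorm(p_grid))) - offset

# Does each smooth respect the ordering an FDC must have?
mono_raw <- all(diff(q_raw) >= 0)
mono_log <- all(diff(q_log) >= 0)

# One simple repair: a running maximum makes the curve non-decreasing
q_raw_mono <- cummax(q_raw)
q_log_mono <- cummax(q_log)
```

Whether mono_raw or mono_log comes back TRUE depends on the data and the span, which is the point: the check is cheap, and the repair step makes the result safe to invert.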

I was uncertain about your thinking. I have often seen approaches that do not lean toward log space, and I want to understand why. I also notice that your data is est_fdc, but my thinking was along the lines of the Qp part, where there is an obs_fdc. As in: Q[donor] --> P[obs via obs DV donor] == P[ungaged] --> Q[ungaged via est_fdc].
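For clarity, the chain written out above might be sketched as follows. fdc2p_core is copied from the first comment; p2q_table is a hypothetical helper of mine, and the donor flows and the tabulated est_fdc values are made-up numbers, so treat this purely as an illustration of the data flow.

```r
# Copied from the first comment in this thread: flow -> probability
fdc2p_core <- function(x, q, offset=0.01) {
   n <- length(x)
   pnorm(approx(log10(sort(x)+offset), qnorm((1:n)/(n+1)),
           xout=log10(     q +offset), rule=2)$y)
}

# Hypothetical helper: probability -> flow from a tabulated FDC (f, q),
# interpolating in log10-qnorm space; rule = 2 holds the end values, so
# probabilities beyond the table are clamped rather than extrapolated.
p2q_table <- function(f, q, p, offset = 0.01) {
  10^approx(qnorm(f), log10(q + offset), xout = qnorm(p), rule = 2)$y - offset
}

set.seed(7)
donor_q <- rlnorm(3653, meanlog = 4, sdlog = 1)  # synthetic donor daily values

# Made-up estimated FDC at the ungaged site (non-exceedance probability, flow)
est_fdc <- data.frame(
  f = c(0.01, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 0.99),
  q = c(0.5,  2,    4,    10,   25,   60,   130,  200,  450))

p_donor   <- fdc2p_core(donor_q, donor_q)              # Q[donor] -> P[obs]
q_ungaged <- p2q_table(est_fdc$f, est_fdc$q, p_donor)  # P[ungaged] -> Q[ungaged]
```

Note that the clamping in p2q_table flattens every donor day whose probability falls outside 0.01 to 0.99, which is exactly where the tail-extrapolation question matters.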

Perhaps you are thinking that I am asking about the "pQ" part, but I wanted to start this important thread with "Qp."

Consider: if we are feeding in an FDC nearly a decade long anyway and then turning around and estimating the probability for each of those points, then one solution is as simple as lmomco::pp(x, sort=FALSE). But if a donor gage's decade within x has, say, some 60 missing days and we estimate those missing DVs by interpolation, then the inversion would rely on an FDC built from fewer than 3653 days, so an algorithm is needed.
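A small sketch of the two cases, on synthetic data. The base-R rank expression is my stand-in for what I take lmomco::pp(x, sort=FALSE) to return under Weibull plotting positions; that equivalence, the 60 randomly placed gaps, and the linear-in-time gap filling are all assumptions for illustration.

```r
# Copied from the first comment in this thread: flow -> probability
fdc2p_core <- function(x, q, offset=0.01) {
   n <- length(x)
   pnorm(approx(log10(sort(x)+offset), qnorm((1:n)/(n+1)),
           xout=log10(     q +offset), rule=2)$y)
}

set.seed(3)
n_days <- 3653
x_full <- rlnorm(n_days, meanlog = 4, sdlog = 1)   # synthetic complete decade

# Case 1: complete record. Plotting positions in original (date) order;
# assumed base-R analogue of lmomco::pp(x, sort=FALSE).
pp_full <- rank(x_full, ties.method = "first") / (n_days + 1)

# Case 2: 60 missing days. The FDC is built from the observed DVs only.
miss    <- sample(n_days, 60)
obs_idx <- setdiff(seq_len(n_days), miss)
x_obs   <- x_full[obs_idx]                         # 3593 observed values

# Fill the gaps by linear interpolation in time (assumed gap-filling method)
x_fill <- approx(obs_idx, x_obs, xout = seq_len(n_days), rule = 2)$y

# Probabilities for all 3653 days, read off the 3593-day observed FDC
p_all <- fdc2p_core(x_obs, x_fill)
```

On observed days p_all matches the plotting positions of x_obs, while the filled days get probabilities interpolated between neighboring points of the observed FDC, which a plain plotting-position call cannot provide.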