tnagler / VineCopula

Statistical inference of vine copulas

Calculating conditional probability using BiCopCDF #87

Closed LSRathore closed 1 year ago

LSRathore commented 1 year ago

I have a data frame X (with columns x1 and x2) and would like to calculate a conditional probability such as P(x1<0.5|x2<0.3), which can be calculated using BiCopCDF. My question is how to obtain the transformed values (i.e. u and v) from the original data (x1 and x2).

To elaborate using the example above: how can I get the u and v values corresponding to x1 = 0.5 and x2 = 0.3? One option is to find the distribution that best fits x1 and x2 and then calculate its CDF; however, I am not sure how well any specific distribution would fit the data. Any suggestion or help would be greatly appreciated.

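library(VineCopula)

# Pseudo-observations (ranks rescaled to (0, 1)) of the two columns
u <- pobs(X)[,1]
v <- pobs(X)[,2]

# Fit a Clayton copula (family = 3)
par_clay = BiCopEst(u,v,3)$par
cop_clay = BiCop(family = 3, par = par_clay)

# u1, v1: copula-scale values for x1 = 0.5 and x2 = 0.3 (not yet defined; this is the question)
joint_prob = BiCopCDF(u1, v1, cop_clay)  # P(x1<0.5, x2<0.3)
cond_prob = joint_prob / v1              # P(x1<0.5|x2<0.3)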
tnagler commented 1 year ago

You can use

u1 = ecdf(X[, 1])(0.5)
v1 = ecdf(X[, 2])(0.3)
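
A minimal end-to-end sketch of this suggestion, assuming simulated stand-in data (the real X is not shown in this thread). It fits the Clayton copula on pseudo-observations, maps the thresholds 0.5 and 0.3 to the copula scale with ecdf, and divides the joint probability from BiCopCDF by the marginal probability of the conditioning event. The simulated margins, the sample size, and the names `sim`, `joint_prob`, and `cond_prob` are illustrative assumptions.

```r
library(VineCopula)

set.seed(1)
# Hypothetical stand-in for X: Clayton dependence with exponential margins.
sim <- BiCopSim(500, family = 3, par = 2)
X <- data.frame(x1 = qexp(sim[, 1]), x2 = qexp(sim[, 2]))

# Fit a Clayton copula (family = 3) to the pseudo-observations.
U <- pobs(as.matrix(X))
cop_clay <- BiCopEst(U[, 1], U[, 2], family = 3)

# Map the thresholds on the original scale to the copula scale.
u1 <- ecdf(X[, 1])(0.5)   # estimate of P(x1 <= 0.5)
v1 <- ecdf(X[, 2])(0.3)   # estimate of P(x2 <= 0.3)

# Joint probability from the copula, then condition on x2 <= 0.3.
joint_prob <- BiCopCDF(u1, v1, cop_clay)   # P(x1 <= 0.5, x2 <= 0.3)
cond_prob  <- joint_prob / v1              # P(x1 <= 0.5 | x2 <= 0.3)
cond_prob
```

The division only works when v1 is positive, i.e. when at least one observed x2 value lies at or below the threshold.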
LSRathore commented 1 year ago

@tnagler Thanks, ecdf is what I need here. However, there is one more issue: is it possible to get the probability for x1 when x2 takes a value that does not occur in the original data? The ecdf of x2 becomes zero in this case. For example, if all x2 values in the original data are > 1, the ecdf of x2 at 0.3 is zero, which makes the conditional probability P(x1<0.5|x2<0.3) undefined. One option is to generate random numbers for x1 and x2 that include x2<0 values and their corresponding x1 values; the ecdf of x2 would then not be zero. Is there any way I can do this using VineCopula? Thanks

tnagler commented 1 year ago

Then you have to use the corrected version of the ecdf that is more commonly used in copula models. For example:

EmpCDF <- function(x) {
    n <- length(x)
    Fn <- ecdf(x)
    function(xx) pmax(n * Fn(xx), 1) / (n + 1)
}

This makes sure the return values are always between 1/(n+1) and 1 - 1/(n+1).
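
To make the bounds concrete, here is a small check on a made-up four-point sample (the values in `x` are purely illustrative). It repeats the EmpCDF definition so that it runs on its own.

```r
EmpCDF <- function(x) {
    n <- length(x)
    Fn <- ecdf(x)
    # Rescale by n / (n + 1) and lift zero values up to 1 / (n + 1).
    function(xx) pmax(n * Fn(xx), 1) / (n + 1)
}

x <- c(1.2, 1.5, 2.0, 3.1)   # hypothetical sample, all values > 1
Fx <- EmpCDF(x)

low  <- Fx(c(0, 0.3, 1.2))   # 0.2 0.2 0.2, i.e. 1 / (n + 1)
high <- Fx(c(3.1, 10))       # 0.8 0.8,     i.e. n / (n + 1)
low
high
```

Every point at or below the smallest observation maps to the same value 1/(n+1), so the function is flat in that region.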

LSRathore commented 1 year ago

This function returns a constant cdf value for all low x2 values, which makes the denominator zero in my case. Let me explain the problem I'm working on: I need to calculate a conditional probability such as P(x1<0.5|0<x2<0.3). This can be expanded as follows:

P(x1<0.5|0<x2<0.3)
  = P(x1<0.5 AND 0<x2<0.3) / P(0<x2<0.3)
  = (P(x1<0.5 AND x2<0.3) - P(x1<0.5 AND x2<0)) / (P(x2<0.3) - P(x2<0))
  = (Cop_CDF(x1 = 0.5, x2 = 0.3) - Cop_CDF(x1 = 0.5, x2 = 0)) / (ECDF(x2 = 0.3) - ECDF(x2 = 0))

I can obtain the numerator terms using BiCopCDF. However, the problem with the denominator is that 0<x2<0.3 never occurs in the original data, which makes the overall conditional probability undefined. Even with the corrected ecdf function you suggested, the denominator is 0 because the ecdf of x2 takes the same value at 0.3 and at 0.

I think this can be resolved by generating more random samples of x1 and x2 from the copula, so that 0<x2<0.3 occurs with nonzero frequency. Do you think this would be the right way to solve the problem? Random samples can be generated using VineCopula, but they all lie between 0 and 1. Is there any way to generate random samples that follow the original x1 and x2 distributions?

Thanks

tnagler commented 1 year ago

Well, by definition, an empirical distribution function cannot give answers about things that never happened. Also, simulating from the copula and then transforming to the "original x1 and x2 distribution" will not help if you use the empirical distribution for that transformation.

If you want to extrapolate beyond the observed data, your best bet is to find a reasonable parametric model for the marginal distributions.
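
A rough sketch of what that could look like for P(x1<0.5|0<x2<0.3), assuming lognormal margins purely for illustration (the right family depends on the data). The simulated data, the helper `fit_lnorm`, and all variable names are hypothetical. The copula is still fitted on pseudo-observations; only the mapping of the thresholds to the copula scale uses the parametric CDFs.

```r
library(VineCopula)

set.seed(1)
# Hypothetical data: Clayton dependence with lognormal margins, chosen so
# that observations with x2 < 0.3 are very unlikely in a sample of 200.
sim <- BiCopSim(200, family = 3, par = 2)
x1 <- qlnorm(sim[, 1], meanlog = 0, sdlog = 1)
x2 <- qlnorm(sim[, 2], meanlog = 0.5, sdlog = 0.5)

# Copula fitted on pseudo-observations (Clayton, family = 3).
cop <- BiCopEst(pobs(x1), pobs(x2), family = 3)

# Closed-form lognormal MLE for a margin (assumed family, not a recommendation).
fit_lnorm <- function(x) {
    lx <- log(x)
    c(meanlog = mean(lx), sdlog = sqrt(mean((lx - mean(lx))^2)))
}
m1 <- fit_lnorm(x1)
m2 <- fit_lnorm(x2)

# Thresholds on the copula scale via the fitted parametric CDFs.
u1   <- plnorm(0.5, m1[["meanlog"]], m1[["sdlog"]])
v_lo <- plnorm(0,   m2[["meanlog"]], m2[["sdlog"]])   # 0 for a lognormal
v_hi <- plnorm(0.3, m2[["meanlog"]], m2[["sdlog"]])   # small but positive

# P(x1 < 0.5 | 0 < x2 < 0.3) = (C(u1, v_hi) - C(u1, v_lo)) / (v_hi - v_lo).
C_hi <- BiCopCDF(u1, v_hi, cop)
C_lo <- if (v_lo > 0) BiCopCDF(u1, v_lo, cop) else 0   # C(u, 0) = 0 for any copula
cond_prob <- (C_hi - C_lo) / (v_hi - v_lo)
cond_prob
```

The denominator is now positive even if no observation falls in (0, 0.3), but the result rests entirely on how well the assumed marginal family describes the lower tail of x2.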

LSRathore commented 1 year ago

Finding a reasonable parametric model for the CDF makes perfect sense to me. However, I am not quite sure how well a given distribution would fit the original data. Do you think calculating the parametric CDF for some commonly used distributions and then selecting the best one by comparing each to the ECDF using MSE would be a good approach?

tnagler commented 1 year ago

Sounds reasonable. You can have a look at univariateML for fitting different families.
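
As a rough illustration of the comparison described above, the sketch below fits three candidate families by maximum likelihood and computes the MSE between each fitted CDF and the ECDF at the observed points. It uses MASS::fitdistr rather than univariateML, simply to avoid guessing at that package's interface here; the sample `x`, the candidate list, and all names are assumptions. AIC is shown alongside as a common likelihood-based alternative.

```r
library(MASS)   # fitdistr()

set.seed(1)
x <- rgamma(300, shape = 3, rate = 2)   # hypothetical positive-valued sample

# Maximum-likelihood fits for a few candidate families.
fits <- list(
    lognormal = fitdistr(x, "lognormal"),
    gamma     = fitdistr(x, "gamma"),
    weibull   = fitdistr(x, "weibull")
)
cdfs <- list(lognormal = plnorm, gamma = pgamma, weibull = pweibull)

# MSE between each fitted CDF and the ECDF, evaluated at the data points.
Fn <- ecdf(x)(x)
mse <- sapply(names(fits), function(nm) {
    Fp <- do.call(cdfs[[nm]], c(list(x), as.list(fits[[nm]]$estimate)))
    mean((Fp - Fn)^2)
})

aic  <- sapply(fits, AIC)        # likelihood-based comparison
best <- names(which.min(mse))    # family with the smallest MSE
mse
aic
best
```

Note that an MSE against the ECDF is dominated by the bulk of the data, so a family that wins on this criterion can still be poor in the tail where the extrapolation happens.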

LSRathore commented 1 year ago

Thanks for your help.