tnagler / VineCopula

Statistical inference of vine copulas
87 stars, 33 forks

BiCopSelect selects the wrong copula #58

Closed: ProfFireDragon closed this issue 5 years ago

ProfFireDragon commented 5 years ago

I have a set of u and v values (copula data). Based on AIC, Clayton should be the best option. But the following two calls, both of which include Gumbel and Clayton as candidates, give completely different results: the one that selects Gumbel is wrong.

The same thing happens when I use RVineStructureSelect. This is a big issue, because the truly optimal family is sometimes not selected, for no apparent reason.

    fit <- BiCopSelect(u, v, familyset = c(1, 2, 3, 4, 5, 6), selectioncrit = "AIC", rotations = FALSE)
    summary(fit)

    Family
    No:   4
    Name: Gumbel

    Parameter(s)
    par:  1.07

    Dependence measures
    Kendall's tau:  0.07 (empirical = 0.09, p value = 0.17)
    Upper TD:       0.09
    Lower TD:       0

    Fit statistics
    logLik:  1.01
    AIC:    -0.02
    BIC:     2.63

    fit <- BiCopSelect(u, v, familyset = c(3, 4), selectioncrit = "AIC", rotations = FALSE)
    summary(fit)

    Family
    No:   3
    Name: Clayton

    Parameter(s)
    par:  0.25

    Dependence measures
    Kendall's tau:  0.11 (empirical = 0.09, p value = 0.17)
    Upper TD:       0
    Lower TD:       0.06

    Fit statistics
    logLik:  2.39
    AIC:    -2.78
    BIC:    -0.13

Book1.txt
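As a quick consistency check on the two summaries, the usual definitions AIC = -2 logLik + 2k and BIC = -2 logLik + k log(n) reproduce the reported values with k = 1 (Gumbel and Clayton each have a single parameter). The base R sketch below recomputes AIC from the rounded log-likelihoods and backs out the sample size implied by BIC; n is not stated in the issue, so that figure is only an estimate derived from rounded output.

```r
# Log-likelihoods and BICs reported by summary(fit) in the issue (rounded to 2 decimals)
loglik <- c(Gumbel = 1.01, Clayton = 2.39)
bic    <- c(Gumbel = 2.63, Clayton = -0.13)
k <- 1  # both Gumbel and Clayton have a single parameter

# AIC = -2 * logLik + 2 * k (lower is better)
aic <- -2 * loglik + 2 * k
print(aic)
best <- names(which.min(aic))
print(best)

# BIC = -2 * logLik + k * log(n), hence log(n) = (BIC + 2 * logLik) / k
implied_n <- exp((bic + 2 * loglik) / k)
print(round(implied_n))
```

Both BIC values imply log(n) of about 4.65, i.e. roughly 105 observations, and of the two fits the Clayton one has the lower AIC (-2.78 vs. -0.02).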

amitmittal-9294 commented 5 years ago

Check the basic differences between the Gumbel and Clayton parameters: each family captures a different dependence structure. It is a good answer if only one of them fits the data. Also use plots to confirm the fit visually.
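To make the point about different structures concrete: for these one-parameter families, Kendall's tau is θ/(θ + 2) for Clayton and 1 - 1/θ for Gumbel; Clayton has lower tail dependence 2^(-1/θ) and no upper tail dependence, while Gumbel has upper tail dependence 2 - 2^(1/θ) and no lower tail dependence. The base R sketch below plugs in the parameters from the two summaries in the issue (1.07 and 0.25). The helper names are hypothetical; VineCopula has its own functions for this, such as BiCopPar2Tau.

```r
# Closed-form relations for the one-parameter Clayton and Gumbel copulas.
# Helper names are hypothetical; base R only.
clayton_tau      <- function(theta) theta / (theta + 2)
clayton_lower_td <- function(theta) 2^(-1 / theta)    # Clayton: lower tail only
gumbel_tau       <- function(theta) 1 - 1 / theta
gumbel_upper_td  <- function(theta) 2 - 2^(1 / theta) # Gumbel: upper tail only

# Parameters reported by summary(fit) in the issue
theta_gumbel  <- 1.07
theta_clayton <- 0.25

measures <- data.frame(
  family   = c("Gumbel", "Clayton"),
  tau      = c(gumbel_tau(theta_gumbel), clayton_tau(theta_clayton)),
  upper_td = c(gumbel_upper_td(theta_gumbel), 0),
  lower_td = c(0, clayton_lower_td(theta_clayton))
)
print(measures, digits = 2)
```

Rounded to two decimals this reproduces the reported values (tau 0.07 and 0.11, upper TD 0.09 for Gumbel, lower TD 0.06 for Clayton): the two fits imply similar overall dependence but place it in opposite tails.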


From: ProfFireDragon notifications@github.com Sent: Sunday, June 2, 2019 2:18:32 PM To: tnagler/VineCopula Cc: Subscribed Subject: [tnagler/VineCopula] BiCopSelect selects the wrong copula (#58)


tnagler commented 5 years ago

Hi. The issue you're seeing is related to the presel option in BiCopSelect(). If TRUE (the default), it "excludes families before fitting based on symmetry properties of the data. Makes the selection about 30% faster (on average), but may yield slightly worse results in few special cases."

In your specific example, the correlation in the upper-right quadrant is larger than in the lower-left quadrant. The Gumbel copula reflects this feature, but the Clayton copula does not. Hence, the Clayton copula is excluded before any model is fitted, even though it would lead to a better AIC. (A likely reason is that your data's marginal distributions deviate quite substantially from uniform.)
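To illustrate the idea (this is not VineCopula's actual pre-selection code, which is not shown in this thread), the base R sketch below compares the correlation of normal scores in the lower-left and upper-right quadrants of the unit square. The helper quadrant_cor and the simple "keep Clayton only if the lower-left correlation is larger" rule are assumptions made for illustration. Data are simulated from a Clayton copula via the standard gamma-frailty (Marshall-Olkin) construction, then rotated by 180 degrees to mimic upper tail dependence.

```r
# Hypothetical illustration of a symmetry-based pre-selection check.
# This is NOT VineCopula's implementation; it only uses base R.
quadrant_cor <- function(u, v) {
  z1 <- qnorm(u)                     # normal scores of the copula data
  z2 <- qnorm(v)
  ll <- u < 0.5 & v < 0.5            # lower-left quadrant
  ur <- u > 0.5 & v > 0.5            # upper-right quadrant
  c(lower_left  = cor(z1[ll], z2[ll]),
    upper_right = cor(z1[ur], z2[ur]))
}

# Simulate Clayton copula data (theta = 2) via the gamma-frailty
# construction: U_j = (1 + E_j / V)^(-1 / theta), V ~ Gamma(1 / theta, 1)
set.seed(42)
n <- 2000
theta <- 2
V <- rgamma(n, shape = 1 / theta, rate = 1)
E <- matrix(rexp(2 * n), nrow = n, ncol = 2)
U <- (1 + E / V)^(-1 / theta)        # row i of E is divided by V[i]

qc_clayton <- quadrant_cor(U[, 1], U[, 2])
# Rotating by 180 degrees (1 - U) swaps the tails: upper tail dependence
qc_rotated <- quadrant_cor(1 - U[, 1], 1 - U[, 2])
print(rbind(clayton = qc_clayton, rotated = qc_rotated))

# A rule in this spirit keeps Clayton (lower tail) only when the
# lower-left correlation dominates, and drops it otherwise
keep_clayton <- qc_clayton["lower_left"] > qc_clayton["upper_right"]
print(keep_clayton)
```

On the rotated data the same rule would drop Clayton, which mirrors what happened with the issue's data, where the upper-right quadrant showed the stronger correlation.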

The Clayton copula was not excluded when restricting to familyset = c(3, 4) due to a bug that occurred when only one family fits the symmetry properties. I have fixed that in 63a036b3ef125ed243d4cda9aaed558fbf52a97c; now both calls select the Gumbel copula.

If you don't want that behavior, just set presel = FALSE. But I recommend checking your models for the marginal distributions first. If you ensure that the margins are uniform (e.g., with u <- pobs(u); v <- pobs(v)), the selected copula will be Gaussian. Hence, the asymmetries in your data may be the result of poor marginal models.
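A minimal sketch of both suggestions. The data are simulated stand-ins with exponential margins (Book1.txt is not reproduced here), so this is not expected to reproduce the issue's results. to_pobs is a hypothetical base R helper that mirrors what pobs() computes (ranks divided by n + 1). The VineCopula part only runs if the package is installed, and uses only BiCopSelect and its presel argument.

```r
# Hypothetical helper mirroring pobs(): ranks scaled into (0, 1) by n + 1
to_pobs <- function(x) rank(x, ties.method = "average") / (length(x) + 1)

# Simulated stand-in data with clearly non-uniform (exponential) margins
set.seed(1)
n <- 500
x <- rexp(n)
y <- 0.5 * x + rexp(n)

u <- to_pobs(x)   # pseudo-observations: margins are (discretely) uniform
v <- to_pobs(y)
print(summary(u))

# Compare selection with and without pre-selection, if VineCopula is available
if (requireNamespace("VineCopula", quietly = TRUE)) {
  fams <- c(1, 2, 3, 4, 5, 6)
  # presel = TRUE is the default: families are pre-filtered by symmetry
  fit_presel <- VineCopula::BiCopSelect(u, v, familyset = fams,
                                        selectioncrit = "AIC",
                                        rotations = FALSE)
  # presel = FALSE: every family in fams is fitted and compared by AIC
  fit_all <- VineCopula::BiCopSelect(u, v, familyset = fams,
                                     selectioncrit = "AIC",
                                     rotations = FALSE, presel = FALSE)
  print(c(with_presel = fit_presel$family, without_presel = fit_all$family))
}
```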

ProfFireDragon commented 5 years ago

Thanks for your reply. So you are saying that although AIC is explicitly selected as the main criterion for choosing the copula, there is actually another "hidden" criterion from presel? The manual is very vague and unclear on this. Users will naturally believe that AIC is the ONLY criterion unless this hidden criterion is explained more explicitly and clearly somewhere.

By the way, are there any other hidden criteria that are not stated explicitly or clearly?

Thanks much again.

tnagler commented 5 years ago

The manual is not vague; it explicitly states that presel "excludes families before fitting based on symmetry properties of the data." It is not a hidden criterion, and there are no others.

ProfFireDragon commented 5 years ago

Thanks so much for your prompt reply, though I still think "symmetry properties of the data" could be elaborated on a bit more in the manual.

I followed your advice but have just realised that this extra filtering criterion can be switched off only in BiCopSelect, not in RVineStructureSelect, which is a core function of this package. It would be nice if the latter also allowed switching it off.

Thank you again.

tvatter commented 5 years ago

See #59, which addresses the second issue (and more).