There is not a very good reason why this is not implemented, but there are some:

* The gamCopula package offers functionality for varying parameters, although I'm not sure if it fits your needs.
* Do you suggest allowing 3d-arrays in the RVineMatrix object? If so, what should plot.RVineMatrix(), contour.RVineMatrix(), RVineCopSelect() etc. do with it?
* We would have to accept such RVineMatrix objects in RVine[PDF/CDF/PIT/Sim/Par2Tau/Par2Beta/LogLik/Hessian/grad]() and possibly some more functions that I forgot. We would then convert them into 3d-arrays internally before calling the C-functions, which need to be modified as well, of course.

The point is that, if we have that feature for one function, we should have it for all functions where it makes sense. Unfortunately, that's a lot of work and I'm not sure it's worth it.

I mostly agree with this point of view. What if only par and par2 are vectorized where it makes sense? For instance, add par and par2 as optional arguments to RVinePDF and RVineSim (like in dnorm and rnorm); a sketch of what this could look like follows below. Of course, the C-code needs adjustment, but the gains here outweigh the efforts imo. In case of RVineSim, the C-code may even be replaced by R-code, because the outer loop is replaced by vectorized calls to Hfunc and Hinv.
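A minimal, unoptimized sketch of those semantics under the current API (the wrapper name rvine_sim_varying and the d x d x n array layout are illustrative assumptions, not existing VineCopula functionality):

```r
library(VineCopula)

# par_arr / par2_arr: hypothetical d x d x n arrays, slice [ , , i] holding
# the pair-copula parameters for draw i. This just rebuilds the model per
# observation, so it is slow, but it shows what a vectorized
# RVineSim(n, RVM, par, par2) would have to compute.
rvine_sim_varying <- function(RVM, par_arr, par2_arr) {
  n <- dim(par_arr)[3]
  d <- nrow(RVM$Matrix)
  out <- matrix(NA_real_, n, d)
  for (i in seq_len(n)) {
    RVM_i <- RVineMatrix(RVM$Matrix, family = RVM$family,
                         par = par_arr[, , i], par2 = par2_arr[, , i])
    out[i, ] <- RVineSim(1, RVM_i)
  }
  out
}
```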
If we vectorize over parameters, we also need to vectorize over the family (e.g., a Clayton with Kendall's tau going from negative to positive).
I don't like additional arguments at all. We already specify the model in the RVM argument; adding more arguments containing overlapping information is confusing. (Also, dnorm has only one mean and sd argument, no duplicated arguments for the case where you want to vectorize.)
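For reference, this is the base-R pattern being invoked: dnorm vectorizes through its single arguments by recycling, with no extra variants needed:

```r
# base R: one density per (x, mean) pair, via plain argument recycling
dnorm(c(0, 1, 2), mean = c(0, 1, 2), sd = 1)
```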
C-code should not be replaced by R-code. This would make things slower for 99% of the users. Modifying the C-code should be fairly straightforward anyway. The bulk of the work would be R-infrastructure: checking inputs, converting to arrays, handling errors/edge cases etc.
While doing what you want just for RVineSim would be fairly straightforward, doing it for all functions in the library is a huge task. In my experience, it should take at least 3-4 days of full-time work (vectorizing the BiCop family took me much longer).
I like your idea and would be happy to accept/help with a PR, but I will most probably not do it myself.
* If we vectorize over parameters, we also need to vectorize over the family (e.g., a Clayton with Kendall's tau going from negative to positive).
I disagree, family defines the distribution while par and par2 parameterize it. Maybe in mixture models this would be nice, but in the usual context the distribution is fixed.
* I don't like additional arguments at all. We already specify the model in the `RVM` argument, adding more arguments containing overlapping information is confusing. (Also `dnorm` has only one `mean` and `sd` argument, no duplicated arguments for the case where you want to vectorize.)
I get it. Yet more confusing would be an RVineMatrix with 3d par and par2 in the general context (just like a univariate normal with multiple means). So the right approach would be to separate the distribution definition from the parameters (e.g. extract par and par2 from RVineMatrix). This would be a fairly easy (add parameters to functions) but backwards-incompatible change.
* C-code should not be replaced by R-code. This would make things slower for 99% of the users. Modifying the C-code should be fairly straightforward anyway. The bulk of the work would be R-infrastructure: checking inputs, converting to arrays, handling errors/edge cases etc.
R-code is easier to maintain, so if things can be done in R, they should be done in R. In this case, the R-version wouldn't be much slower than the C-version because there is no big loop.
If there is a viable solution, I'd draft a PR.
* I disagree, family defines the distribution while par and par2 parameterize it. Maybe in mixture models this would be nice, but in the usual context the distribution is fixed.
In any sensible copula model with varying parameters, the range of dependence should include both positive and negative Kendall's taus. That is true for Gaussian/Student/Frank. For all others, we would have an unnecessary restriction to either only positive or only negative dependence.
An issue is the somewhat arbitrary definition of "families" in this library. We could just as well define "Clayton" as family = 3 for par > 0 and family = 23 for par < 0. Similarly, we could patch together 3/33 or 13/23, etc. (This is what is done in gamCopula, yielding four versions of "Clayton" that all allow for par in [-infty, infty].) Defining families that way would be weird for fixed-parameter models (two out of four models are identical for fixed par), but in the varying-parameter case this is absolutely what we want.
To mimic that behavior, we need to vectorize over the family (that's why the BiCop functions are vectorized over family); a short sketch follows below. Otherwise the whole feature is quite useless for families other than Gaussian/Student/Frank.
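A minimal sketch of that 3/23 patching, leaning on the vectorized BiCop functions (the tau values and evaluation points are only illustrative):

```r
library(VineCopula)

# Patch standard Clayton (family 3, positive dependence) together with its
# 90-degree rotation (family 23, negative dependence) so a single parameter
# path can cross from negative to positive Kendall's tau:
tau <- c(-0.7, -0.3, 0.3, 0.7)
fam <- ifelse(tau > 0, 3, 23)
par <- BiCopTau2Par(fam, tau)  # vectorized over family and tau
BiCopPDF(rep(0.3, 4), rep(0.7, 4), family = fam, par = par)
```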
* Yet more confusing would be an RVineMatrix with 3d par and par2 in the general context (just like a univariate normal with multiple means). So the right approach would be to separate the distribution definition from the parameters (e.g. extract par and par2 from RVineMatrix). This would be a fairly easy (add parameters to functions) but backwards-incompatible change.
Yeah, that would break almost all existing code, so extracting parameters is not really an option. I don't really see any other way than RVineSim(100, RVM = list(RVM1, ..., RVM100)); a sketch of what that would amount to is below.
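Under the current API, that list interface would reduce to something like this (a slow but runnable sketch; rvm_list, a list of RVineMatrix objects of common dimension, is assumed):

```r
library(VineCopula)

# One simulated row per RVineMatrix in `rvm_list`
sim_one_each <- function(rvm_list) {
  d <- nrow(rvm_list[[1]]$Matrix)
  t(vapply(rvm_list, function(rvm) as.numeric(RVineSim(1, rvm)), numeric(d)))
}
```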
* R-code is easier to maintain, so if things can be done in R, they should be done in R. In this case, the R-version wouldn't be much slower than the C-version because there is no big loop.
Everything in this library can be done in R, but the C-parts exist for a good reason. I could also live with a solution that calls C for fixed-parameter models and uses R-code for varying-parameter models.
Thanks for explaining the reasons for vectorized family!
I'll draft a PR with a focus on RVineSim, then.
Hi! I'm working with vine copulas where the parameters are not constant. So, for instance, I would like to get a random sample where each draw corresponds to different parameters (like rnorm(10, mean = 1:10)), or the likelihood where each observation corresponds to different parameters (like dnorm(1:10, mean = 1:10)). As a proof of concept, I took the code of RVineSim and rewrote the C part in R (straightforward, without optimization). Then I adjusted the code to use 3d arrays for family, par and par2, and it worked. Are there any good reasons why this is not implemented? A toy version of the per-observation evaluation is sketched below.
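A toy version of that per-observation evaluation on a 3-dimensional C-vine, written against the current API (the structure matrix, the Gaussian families, and the parameter values are all illustrative; only the per-slice rebuild mimics the proof of concept):

```r
library(VineCopula)

d <- 3; n <- 5
# canonical C-vine structure matrix (root variable 1) with Gaussian pairs
M   <- matrix(c(3, 0, 0,
                2, 2, 0,
                1, 1, 1), d, d, byrow = TRUE)
fam <- matrix(c(0, 0, 0,
                1, 0, 0,
                1, 1, 0), d, d, byrow = TRUE)

# hypothetical layout: par as a d x d x n array, slice [ , , i] for observation i
par_arr <- array(0, dim = c(d, d, n))
for (i in seq_len(n)) {
  p <- matrix(0, d, d)
  p[2, 1] <- 0.1 * i        # one parameter varies across observations
  p[3, 1] <- 0.5
  p[3, 2] <- 0.3
  par_arr[, , i] <- p
}

# likelihood analogue of dnorm(1:10, mean = 1:10): each observation is
# evaluated under its own parameter slice
u <- matrix(runif(n * d), n, d)
dens <- sapply(seq_len(n), function(i) {
  RVM_i <- RVineMatrix(M, family = fam, par = par_arr[, , i])
  RVinePDF(u[i, , drop = FALSE], RVM_i)
})
dens
```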