matherealize / simdata

An R package for simulating data
https://matherealize.github.io/simdata/
Target correlation #4

Closed georgheinze closed 1 year ago

georgheinze commented 3 years ago

In a discussion about the package with colleagues, the question came up whether one could specify the target correlations between the final generated variables instead of the correlations of the underlying multivariate normal data. Could this perhaps be achieved by iteration?

AngelikaGeroldinger commented 3 years ago

I think one problem would be that not all combinations of correlation structures and marginal distributions are possible for arbitrary variables (Leonov, S., Qaqish, B. Correlated endpoints: simulation, modeling, and extreme correlations. Stat Papers 61, 741–766 (2020). https://doi.org/10.1007/s00362-017-0960-2). Thus, the target correlation entered by the user would first have to be checked for feasibility.
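To illustrate what such a feasibility check could look like in the simplest case, here is a minimal sketch for a pair of binary variables. It uses the standard Fréchet-Hoeffding limits on the joint probability P(X=1, Y=1). The function names `binary_correlation_bounds` and `is_feasible` are made up for this example and are not part of simdata; Python is used only for illustration.

```python
import math


def binary_correlation_bounds(p1, p2):
    """Return (lower, upper) Pearson correlation attainable by two
    Bernoulli variables with success probabilities p1 and p2."""
    if not (0.0 < p1 < 1.0 and 0.0 < p2 < 1.0):
        raise ValueError("p1 and p2 must lie strictly between 0 and 1")
    # Product of the two standard deviations.
    scale = math.sqrt(p1 * (1.0 - p1) * p2 * (1.0 - p2))
    # Fréchet-Hoeffding limits on the joint probability P(X=1, Y=1).
    joint_min = max(0.0, p1 + p2 - 1.0)
    joint_max = min(p1, p2)
    # Correlation = (P(X=1, Y=1) - p1 * p2) / (sd_x * sd_y).
    lower = (joint_min - p1 * p2) / scale
    upper = (joint_max - p1 * p2) / scale
    return lower, upper


def is_feasible(target, p1, p2):
    """Check whether a target correlation lies within the attainable range."""
    lower, upper = binary_correlation_bounds(p1, p2)
    return lower <= target <= upper
```

For example, with p1 = 0.1 and p2 = 0.5 the attainable range is roughly -1/3 to 1/3, so a target correlation of 0.5 would have to be rejected even though it looks like a harmless value. For more than two variables a pairwise check like this is necessary but not sufficient.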

matherealize commented 3 years ago

Thanks for your input. After studying the literature a bit, here is a short summary:

It seems that, given a specific set of desired marginal distributions (e.g. normal / uniform / binary):

A popular method for sampling with desired marginals and a (feasible) target correlation seems to be the NORTA procedure (Cario and Nelson, Modeling and Generating Random Vectors with Arbitrary Marginal Distributions and Correlation Matrix). It fits the workflow of the package very well: draw from an initial normal distribution, then transform to achieve the desired marginals and target correlation. The downside is that in dimensions > 2, NORTA cannot match all feasible target correlations. Procedures exist that address this and at least get close to the desired target correlation.
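To make the idea concrete, here is a rough bivariate sketch of the NORTA transform together with a simple bisection search for the latent normal correlation that yields a given target correlation after transformation. This is not the simdata implementation. The names `norta_pair` and `calibrate_rho` are hypothetical, the search is simulation-based with a fixed seed, and only two variables are handled, so the dimension > 2 problem does not show up here. Python is used only for illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def pearson(xs, ys):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def norta_pair(rho_z, inv_cdf_x, inv_cdf_y, n=10000, seed=1):
    """Draw n pairs: correlated normals -> uniforms -> desired marginals."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Build z2 so that corr(z1, z2) = rho_z.
        z2 = rho_z * z1 + math.sqrt(1.0 - rho_z ** 2) * rng.gauss(0.0, 1.0)
        # Normal CDF gives uniforms; inverse CDFs give the target marginals.
        xs.append(inv_cdf_x(STD_NORMAL.cdf(z1)))
        ys.append(inv_cdf_y(STD_NORMAL.cdf(z2)))
    return xs, ys


def calibrate_rho(target, inv_cdf_x, inv_cdf_y, tol=1e-3, max_iter=30):
    """Bisection on the latent correlation until the transformed
    variables show (approximately) the target correlation."""
    lo, hi = -0.999, 0.999
    for _ in range(max_iter):
        mid = (lo + hi) / 2.0
        r = pearson(*norta_pair(mid, inv_cdf_x, inv_cdf_y))
        if abs(r - target) < tol:
            return mid
        if r < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

As a sanity check, for two uniform marginals (`inv_cdf_x = inv_cdf_y = lambda u: u`) the relationship is known in closed form: the latent correlation for a target r is 2·sin(πr/6), so `calibrate_rho(0.5, ...)` should land near 0.518, up to simulation error. If the target is infeasible for the chosen marginals, the search simply runs into one of the interval ends, which is why a feasibility check beforehand is useful.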

I think a way forward would be to bring NORTA or any procedure derived from it into the package and see how well it works, in order to give usage recommendations. This seems reasonably simple, and can be compared with another implementation in https://cran.r-project.org/web/packages/SimCorMultRes

matherealize commented 1 year ago

I will close this for now, as we have implemented the NORTA approach and are comparing it right now to other packages.