trinker / wakefield

Generate random data sets

Generate data with correlations #3

Open jknowles opened 9 years ago

jknowles commented 9 years ago

I started work a while ago on a much less ambitious project than wakefield, an attempt to generate random data sets on the fly with a known correlation structure. You can see the seeds of that work here: https://github.com/jknowles/datasynthR

It would be cool to include the ability to generate numeric or factor data with a known correlation structure, to build structural relationships into the very realistic-looking data generated by wakefield.
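For concreteness, here is a minimal sketch of the idea for the two-variable numeric case. It is plain Python, not code from datasynthR or wakefield, and the function names `correlated_pairs` and `pearson` are made up for illustration. It assumes standard normal variables and builds the second variable as a weighted mix of the first and independent noise, so the population correlation equals `rho`.

```python
import math
import random


def correlated_pairs(n, rho, seed=None):
    """Return n (x, y) pairs of standard normals with population correlation rho.

    Hypothetical helper for illustration only.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must be in [-1, 1]")
    rng = random.Random(seed)
    # Weight on the independent noise term; keeps Var(y) equal to 1.
    noise_weight = math.sqrt(1.0 - rho * rho)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # y shares the component rho * z1 with x, so Cov(x, y) = rho.
        pairs.append((z1, rho * z1 + noise_weight * z2))
    return pairs


def pearson(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    sample = correlated_pairs(20000, 0.6, seed=1)
    # The sample correlation should land close to the requested 0.6.
    print(round(pearson(sample), 3))
```

Cutting `y` into bins would be one way to get an ordered factor out of this, though the correlation of the binned variable would generally come out somewhat weaker than `rho`.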

trinker commented 9 years ago

It seems that you've done a lot of work on this already, and it's pretty nice. After looking at what you have, replicating it in wakefield would be redundant.

Is there a way you could continue to develop datasynthR, with the end goal of either incorporating its functionality into wakefield or releasing it as a standalone package? Do you plan to make this a CRAN package? I'd like to see a relationship between the two packages like the one magrittr and dplyr have.

trinker commented 9 years ago

Note to future self...

Depending on @jknowles's response, I may want to import his package (add it to Depends:) and write a wrapper for it, maybe named r_distribution_cor, that works similarly to r_series.

jknowles commented 9 years ago

@trinker I'm interested in this. I ran into a few snags with datasynthR that caused me to put it aside while I moved on to other problems. But I could probably return to it this summer and get a CRAN-worthy version released soon enough. I'd want to check in with you about how to make the packages complementary. wakefield really solves one of the problems I was having with datasynthR: the generated data didn't feel real enough for users who cared about more than the structure (plotting, etc.).

trinker commented 9 years ago

@jknowles Any progress on datasynthR?

Black-Milk commented 7 years ago

@trinker Any news on this?

jknowles commented 7 years ago

I've been revisiting datasynthR recently for a client project (and exploring how wakefield works internally in the process). I imagine datasynthR will need to be refactored soon. I can't guarantee I'll have time to devote to that in the coming months -- it depends on whether current projects require it.