kgoldfeld / simstudy

simstudy: Illuminating research methods through data generation
https://kgoldfeld.github.io/simstudy/
GNU General Public License v3.0

Generate multiple predictors and coefficients and outputs with a single function call #127

Open kgoldfeld opened 2 years ago

kgoldfeld commented 2 years ago

I would like to implement some version of a function I created in a blog post a while back.

This is how I started the post: I was contacted about the possibility of creating a simple function in simstudy to generate a large data set that could include possibly 10’s or 100’s of potential predictors and an outcome. In this function, only a subset of the variables would actually be predictors. The idea is to be able to easily generate data for exploring ridge regression, Lasso regression, or other “regularization” methods. Alternatively, this can be used to very quickly generate correlated data (with one line of code) without going through the definition process.

In the post, I created the function genMultPred. I would like to implement something similar to this in simstudy.
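
For readers who have not seen the post, here is a rough sketch of the idea: many candidate predictors, only a subset with non-zero coefficients, and a single outcome. It is written in plain Python for illustration and is not the genMultPred code from the post. The name `gen_mult_pred`, all of its parameters, the uniform coefficient draw, and the shared-factor trick for exchangeable correlation are assumptions made for this sketch.

```python
import random


def gen_mult_pred(n, n_pred, n_active, rho=0.0, noise_sd=1.0, seed=None):
    """Hypothetical helper (not simstudy's API).

    Returns (rows, coefs). Each row holds n_pred predictor values followed
    by the outcome. Only n_active of the coefficients are non-zero.
    """
    if not 0 <= n_active <= n_pred:
        raise ValueError("n_active must be between 0 and n_pred")
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must be in [0, 1)")

    rng = random.Random(seed)

    # Pick which predictors truly affect the outcome; the rest keep a 0 coefficient.
    active = rng.sample(range(n_pred), n_active)
    coefs = [0.0] * n_pred
    for j in active:
        coefs[j] = rng.uniform(-1.0, 1.0)  # assumed effect-size range

    rows = []
    for _ in range(n):
        # A shared standard-normal factor gives every pair of predictors
        # correlation rho while each predictor keeps unit variance.
        shared = rng.gauss(0.0, 1.0)
        x = [
            rho ** 0.5 * shared + (1.0 - rho) ** 0.5 * rng.gauss(0.0, 1.0)
            for _ in range(n_pred)
        ]
        # Linear outcome plus independent normal noise.
        y = sum(b * v for b, v in zip(coefs, x)) + rng.gauss(0.0, noise_sd)
        rows.append(x + [y])
    return rows, coefs


# One call produces the whole data set: 200 rows, 50 predictors, 5 true signals.
rows, coefs = gen_mult_pred(200, 50, 5, rho=0.3, seed=127)
```

The point of the single call is that no definition table is needed. The returned `coefs` make it easy to check whether a regularization method recovers the true signals.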

assignUser commented 2 years ago

I have skimmed the blog post and it looks interesting. I'm guessing they wanted to use this in an ML context?

My only issue with this is that it does not adhere to the usual API/workflow of simstudy. That is possible, of course, but we should think about how to handle these non-definition-table functions so that we don't add a bunch of different functions that all work differently and are hard to remember and maintain.

kgoldfeld commented 2 years ago

I agree - but that cat is already out of the bag, with functions like genOrdCat, genMarkov, and genSplines. I totally get your point, but this is something that could be quite useful to folks. Are you thinking it would be better in a different package, like simstudyExtra?

assignUser commented 2 years ago

> I agree - but that cat is already out of the bag

:joy_cat: That is true, and I'm not sure how to improve that situation. I think an extra package is too much at this point. Maybe we can homogenize the API of these functions in some way for simstudy 2.0? I'll think about it, but I think it is a useful function!