insightsengineering / random.cdisc.data

Create random CDISC data
https://insightsengineering.github.io/random.cdisc.data/

random cdisc data very slow for larger data #21

Open cicdguy opened 3 years ago

cicdguy commented 3 years ago

Original message

Running the following code takes a long time! This is on r.roche.com, R 3.6.3.

```
NEST/nest_on_bee/master/bee_nest_utils.R")
bee_use_nest(release = "2021_05_05")
ADSL <- radsl(N = 1002)
ADLB <- radlb(ADSL)
```

I reduced this from 15000 as it took way too long. Using `system.time` I get the following results:

```
   user  system elapsed
 37.852   0.584  38.436
```

This is extremely long to make a dataset with 21,000 records! I know random.cdisc really only exists for dummy data, but this seems like extremely poor performance.

Provenance:
```
Creator: martik32
```

# TODO

Improve performance. A few suggestions:

1. use `mclapply`
2. use data.table if necessary

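To make the `mclapply` suggestion concrete, here is a minimal sketch of chunked, forked generation using only base R and the `parallel` package. It does not use or reflect the internals of `radlb`: the helper `make_lab_rows`, the column names, the parameter codes, and the visit count are all hypothetical stand-ins chosen for illustration, and whether this approach would actually speed up `radlb` is untested.

```r
library(parallel)

# Hypothetical stand-in for a per-chunk record generator.
# This is NOT a random.cdisc.data function; names and columns are assumptions.
make_lab_rows <- function(ids, params = c("ALT", "CRP", "IGA"), visits = 7L) {
  grid <- expand.grid(
    USUBJID = ids,
    PARAMCD = params,
    AVISITN = seq_len(visits) - 1L,
    stringsAsFactors = FALSE
  )
  # One vectorised draw per chunk instead of one draw per subject
  grid$AVAL <- rnorm(nrow(grid), mean = 50, sd = 8)
  grid
}

ids <- sprintf("SUBJ-%05d", seq_len(1002))

# Forking is not available on Windows, so fall back to a single core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Split subjects into a few chunks per core
chunks <- split(ids, cut(seq_along(ids), breaks = n_cores * 4L, labels = FALSE))

timing <- system.time({
  parts <- mclapply(chunks, make_lab_rows, mc.cores = n_cores)
  adlb_like <- do.call(rbind, parts)
})

print(timing)
print(dim(adlb_like))
```

If reproducible draws matter, calling `RNGkind("L'Ecuyer-CMRG")` and `set.seed()` before `mclapply` is the usual approach. The final `do.call(rbind, parts)` step is where a faster row-binding routine could be swapped in if it turns out to dominate.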
nikolas-burkoff commented 2 years ago

See: internal_github_url/NEST/random.cdisc.data/issues/242 - I suspect there are a lot of places which could be improved

In the past, users used rcd directly, calling radsl etc. Now we don't release rcd to users (we only use it to create a snapshot to be saved in scda), so I guess there's less value in optimizing this than there was at the time the issue was created.

gogonzo commented 2 years ago

@shajoezhu does it matter for you guys? We don't use rcd at all, I'd close it if it was for us. NEST users should switch to scda instead.

shajoezhu commented 2 years ago

Thanks @gogonzo, we will put this back into the backlog. I agree we are using scda data most of the time for our NEST package development, but I remember discussions that teams were using these functions to create large fake data for stress-testing tasks. Let's keep this open please. Thanks