insightsengineering / random.cdisc.data

Create random CDISC data
https://insightsengineering.github.io/random.cdisc.data/
Other
32 stars 6 forks source link

Relatively wrong dates of death vs alive or relation between CR vs EFS #18

Closed cicdguy closed 2 years ago

cicdguy commented 3 years ago

Original message

First of all I want to state that this issue should not results in any previous wrong analysis results. However I feel that this is important to know about relatively incorrect dates. There could be observed a relatively wrong dates of death vs alive or relation between CR vs EFS in our random.cdisc.data datasets. The best description of the issues will be this two print screens:

user/2704/files/b454e500-a1b0-11ea-9762-ae6d29c425e3) user/2704/files/b74fd580-a1b0-11ea-88f3-89d879348ff4) Provenance: ``` Creator: Polkas ``` See below https://github.com/insightsengineering/random.cdisc.data/issues/18#issuecomment-1106467364 for example # TODO - [ ] in `radsl`, when mutate `DTHDT`, such that if `DTHDT` < `LSTALVDT`, `DTHDT` = `LSTALVDT` + sample(365,1) - [ ] regenerate `cadsl` and `cadtte` - [ ] add unittest case and check `DTHDT` > = `LSTALVDT` #
shajoezhu commented 2 years ago

hi @Polkas , I was wondering by any chance that you still have copies of these two screen shots? Thanks!

Polkas commented 2 years ago

@shajoezhu I think I find sth connected with it. I checked the current state for "death vs alive" and looks like it is already solved. It will be great if sb from SME will confirm that.

Zombie_ADSL

"PFS vs EFS"

EFS_OR_PFS

shajoezhu commented 2 years ago

Awesome! Thanks so much @Polkas ! We will fix it

malexthorpe commented 2 years ago

I've regenerated the data and the DTHDT matches the LSTALVDT now:

image
shajoezhu commented 2 years ago

Hey @malexthorpe , I was wondering did you raise a PR for this? could you reference it here please. Thanks

malexthorpe commented 2 years ago

Hey @shajoezhu no PR was required as the issue wasn't present in the data anymore.

juliadedic1 commented 2 years ago

Hmm that's interesting @malexthorpe. I saw cases were the LSTALVDT variable had a date post the DTHDT date and I was trying to fix the code before I released this issue. Maybe it was something on my side. Were you able to add the unittest to check this works for all instances?

shajoezhu commented 2 years ago

Hi @malexthorpe , thanks for looking into this. I have checked with the following

 a <- radsl(cached = T) %>% filter(!is.na(DTHDT))
> which(a$DTHDT < a$LSTALVDT)
integer(0)

Can you add a unittest case for the cached data and check that DTHDT > = LSTALVDT, then we can close this issue. Thanks

malexthorpe commented 2 years ago

Hey thanks @juliadedic1 and @shajoezhu I have added a unit test in #145 which should error if there are any exceptions