pharmaverse / sdtm.oak

An EDC and Data Standard agnostic SDTM data transformation engine that automates the transformation of raw clinical data in ODM format to SDTM based on standard mapping algorithms
https://pharmaverse.github.io/sdtm.oak/
Apache License 2.0

Bug: create_iso8601 #101

Open rammprasad opened 1 day ago

rammprasad commented 1 day ago

What happened?

date_val <- c(NA, "15-Sep-2022", "17-Feb-21", "4-Oct-20", "20-Jan-20", "UN-UNK-1995", NA, "UN-UNK-21",
       "26-Jan-20", "28 Jan 2021", "12-Feb-20", "10-UNK-20", NA, NA)

create_iso8601(date_val, .format = c("d-m-y"), .na = c("UN", "UNK"))

problems()

There are two issues here:

  1. This example does not process the value "28 Jan 2021". When I change the function call to create_iso8601(date_val, .format = c("d-m-y", "dd MMM yyyy"), .na = c("UN", "UNK")), I get an error: ! Number of vectors in ... should match length of .format.

  2. Missing values are reported as problems.

Expected Behavior:

  1. I should be able to provide more than one format for a vector. As long as the date matches one of the formats, the function should process it.

  2. Missing values should not be reported as problems.

cc - @ramiromagno


ramiromagno commented 17 hours ago

Hi Ramm,

Missing values being reported as problems does need fixing. I believe this behavior was agreed on in one of our meetings, so it was a feature. :)

Regarding "28 Jan 2021", look carefully at the difference between my code and yours, and have another read of https://pharmaverse.github.io/sdtm.oak/articles/iso_8601.html#multiple-alternative-formats.

library(sdtm.oak)
date_val <- c(
  NA,
  "15-Sep-2022",
  "17-Feb-21",
  "4-Oct-20",
  "20-Jan-20",
  "UN-UNK-1995",
  NA,
  "UN-UNK-21",
  "26-Jan-20",
  "28 Jan 2021",
  "12-Feb-20",
  "10-UNK-20",
  NA,
  NA
)

create_iso8601(date_val,
               .format = list(c("d-m-y", "d m y")),
               .na = c("UN", "UNK"))
#>  [1] NA           "2022-09-15" "2021-02-17" "2020-10-04" "2020-01-20"
#>  [6] "1995"       NA           "2021"       "2020-01-26" "2021-01-28"
#> [11] "2020-02-12" "2020---10"  NA           NA
problems()

Created on 2024-10-25 with reprex v2.1.1

ramiromagno commented 14 hours ago

create_iso8601() accepts multiple vectors through its ... argument. So what should happen when an element of, say, the first vector (e.g. the date) is NA, while the corresponding element of the second vector (e.g. the time) is non-missing but fails to parse? Should that be reported as a problem?
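
To make the question concrete, here is a minimal sketch of that scenario. The input values are made up for illustration, and the "H:M" time format is an assumption about how the time would be collected. No output is shown, since how this case should be reported is exactly what is being asked.

```r
library(sdtm.oak)

# Hypothetical inputs: in position 2 the date is missing, while the time
# is present but does not match the "H:M" format.
date <- c("15-Sep-2022", NA)
time <- c("10:30", "half past ten")

# One format per input vector, so the length of .format matches the
# number of vectors passed in `...`.
dttm <- create_iso8601(date, time, .format = c("d-m-y", "H:M"))

# Inspect which elements, if any, were flagged as parsing problems.
problems(dttm)
```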