pharmaverse / sdtm.oak

An EDC and Data Standard agnostic SDTM data transformation engine that automates the transformation of raw clinical data in ODM format to SDTM based on standard mapping algorithms
https://pharmaverse.github.io/sdtm.oak/
Apache License 2.0
25 stars 7 forks source link

Add test case for `baseline_timepoints` in `derive_blfl` function #95

Closed ShiyuC closed 1 day ago

ShiyuC commented 2 weeks ago

This PR touches https://github.com/pharmaverse/sdtm.oak/issues/59, bullet point 3.

Please check off each taskbox as an acknowledgment that you completed the task or check off that it is not relevant to your Pull Request. This checklist is part of the Github Action workflows and the Pull Request will not be merged into the devel branch until you have checked off each task.

github-actions[bot] commented 2 weeks ago

Code Coverage

Package Line Rate Health
sdtm.oak 93%
Summary 93% (952 / 1019)
ShiyuC commented 1 week ago

@edgar-manukyan @ramiromagno May I request your review of this small MR? Thanks in advance

edgar-manukyan commented 3 days ago

Thanks team. Shall we add a new issue with Ramiro's nice findings/suggestions and refactor the derive_blfl in that issue?

ShiyuC commented 3 days ago

Hi @ShiyuC:

When it comes to this PR specifically, I think that the changes are working as expected.

I took a look at the function code itself and there are a few things I personally think should be improved (but this might be addressed on a separate issue or not... I leave it up to you, Edgar and Ramm to decide)

  • ISO 8601 formatted dates are being used as character and not as date-times on their own right. This means that date and time comparisons are being done lexicographically, which I find dangerous (even though I understand the measures taken at dtc_datepart() and dtc_timepart() to make sure the format is comparable).
  • Using dplyr verbs (dplyr::mutate(), dplyr::na_if(), tidyr::drop_na()) to make code more readable here:
    bad_orres_rows <- is.na(ds_mod[[domain_prefixed_names["orres"]]]) |
    trimws(ds_mod[[domain_prefixed_names["orres"]]]) %in% c("ND", "NOT DONE", "")
    ds_mod <- ds_mod[!bad_orres_rows, ]
  • I think there is a bug in line 355, var_tpt -> tpt
    con_col <- c(domain_prefixed_names[c("testcd", "dtc", "var_tpt")], "VISIT")
  • Not that I think it makes a difference in the resulting computations, but it would be more human-readable, I would change sdtm_in to ds_mod:
     con_col <- con_col[con_col %in% names(sdtm_in)]
  • Is the setting of <unspecified> something that is part of the SDTM standard?
  • Replace rbind() with dplyr's equivalent
  • Replace dplyr::arrange_at() with arrange() as the former is superseded.
  • Replace the pattern get(domain_prefixed_names["tpt"]) by idiomatic rlang alternatives, e.g. !!rlang::sym(...) or {{...}}.

We can probably add these comments to issue #59

ShiyuC commented 2 days ago

Ram's comment: add example for baseline_timepoints