pharmaverse / admiralpeds

Admiral Package Extension for Pediatric Clinical Trials
https://pharmaverse.github.io/admiralpeds/
Apache License 2.0
13 stars 3 forks source link

closes #17 creation of data-raw folder to handle metadata creation #21

Closed Pierre-Wallet closed 7 months ago

Pierre-Wallet commented 8 months ago

17 creation of metadata-creation.R to handle the building of metadatasets, their storage in data folder and their documentation skeleton in R/data.R. #17 creation of 10 metadatasets to reflect WHO website, with their documentation manually updated. #17 creation of snapshot tests for each metadatsets created

Thank you for your Pull Request! We have developed this task checklist from the Development Process Guide to help with the final steps of the process. Completing the below tasks helps to ensure our reviewers can maximize their time on your code as well as making sure the admiral codebase remains robust and consistent.

Please check off each taskbox as an acknowledgment that you completed the task or check off that it is not relevant to your Pull Request. This checklist is part of the Github Action workflows and the Pull Request will not be merged into the main branch until you have checked off each task.

github-actions[bot] commented 8 months ago

Code Coverage

Package Line Rate Health
admiralpeds 100%
Summary 100% (5 / 5)
zdz2101 commented 8 months ago

from #15 https://github.com/pharmaverse/admiralpeds/pull/16#discussion_r1517893616 we will want to adopt something of this nature

Pierre-Wallet commented 8 months ago

@zdz2101 , the update is done. Everything seems clear to me.

manciniedoardo commented 8 months ago

FYI that the data_raw folder is also being created in this PR

zdz2101 commented 8 months ago

@Pierre-Wallet can you take a look at: https://github.com/pharmaverse/admiralpeds/pull/16

What we'll do is instead of exporting each dataset, we write them into the sysdata.rda file to store them internally, as for what needs to be done here, can you:

  1. Modify the data.R file and turn it into something like get_who_data.R that creates a function called get_who_data() that essentially is a wrapped switch statement that fetch the respective dataset that the user may need
  2. remove each dataset .rda object
  3. we probably will have to add a separate script in data-raw such that it runs both the cdc_metadata_creation.R and who_metadata_creation.R and at the bottom has one line that saves all the objects to sysdata, it'll look something like this:
    source("cdc_metadata_creation.R")
    source("who_metadata_creation.R")
    usethis::use_data(cdc_wtage, cdc_htage, cdc_bmiage, {all your who_dataobjects}, overwrite = TRUE, internal = TRUE)
  4. modify testthat files to call get_who_data() instead of calling the object directly
rossfarrugia commented 7 months ago

No problem @zdz2101 - sounds a good plan, and thanks for @Pierre-Wallet here for thorough review and input to guide us to the ideal strategy. For now please review each others PRs and merge them once ready and when I'm back I'll put a reminder to do some additional checks.

Pierre-Wallet commented 7 months ago

In terms of house-keeping though, can you separate the who_metadata_creation.R into its individual 10 files in the data-raw folder? Each individual dataset can have a corresponding file-to-rda object

If I understood correctly, you want 10 .R files, each of them building an .rda object instead of having the building of the ten datasets in one .R file as it currently is? To be more precise, do you want to :

  1. keep who_metadata_creation which sources 10 sub programs
  2. or do you want to have directly 10 programs in data-raw/?

One question is: do we give up the idea of having all the .rda in one internal .rda?

zdz2101 commented 7 months ago

I like 1 the most but actually on second thought, that brings us back to the same problem with how to document all the objects since they won't exported to the namespace, can you just follow the structure pharmaversesdtm has it? So 1 dedicated program for each object and each dataset have its own .rda, so option 2 and abandoning the internal sysdata.rda object

Pierre-Wallet commented 7 months ago

I will test the options tomorrow. I think it is possible to get all datasets in one internal .rda and still have a documentation for each dataset. I keep you posted

Pierre-Wallet commented 7 months ago

Hi @zdz2101 , I just pushed my updates. Please have a look to it, I eventually created one pgm for each .rda with its own documentation. The only issue I have is a lintr one, which does not accept the usage of source().