rstudio / pointblank

Data quality assessment and metadata reporting for data frames and database tables
https://rstudio.github.io/pointblank/

clinically-oriented codebook/data dictionary #328

Open higgi13425 opened 2 years ago

higgi13425 commented 2 years ago

Proposal

Hi Rich - Just a particular viewpoint: it can be really helpful to be able to generate a codebook for clinical data in a semi-automated fashion, a lot like a REDCap codebook. This is a bit of a wish list, but here goes:

  1. make a tibble that can be exported to things like Excel
     a. option to make pretty PDF, HTML, Word, Rmd if desired
  2. include columns for
     a. variable_name
     b. variable_pretty (still short, but with spaces, title case for tables) - may require user to enter if no labels
     c. variable description (long & detailed) - likely to require user to enter
     d. for numeric - mean, median, range
     e. for factors - each level
     f. for all vars - % missing
  3. somehow handle labels coming in from SAS, SPSS, Stata
     a. one suggestion - link values to value labels, i.e. values 0-3, with labels "none", "mild", "moderate", "severe", as 0_none, 1_mild, 2_moderate, 3_severe
     b. pull in variable labels - possibly as variable_pretty

This can have more detail; see the data dictionary from REDCap. Details on an example of a standard REDCap data dictionary can be found here: https://www.utsouthwestern.edu/edumedia/edufiles/about_us/admin_offices/academic_information_services/redcap-database-creating-dictionary.pdf

I hope that this is helpful. Could be a function like make_codebook() or make_data_dictionary()
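
To make the wish list concrete, here is a minimal base-R sketch of what such a function might return. It is only an illustration under stated assumptions: `make_codebook()` as written here is hypothetical and is not an existing pointblank function, and the `pretty` and `description` arguments (named character vectors keyed by column name) are invented for the example.

```r
# Hypothetical sketch: make_codebook() is NOT an existing pointblank function.
# `pretty` and `description` are assumed to be named character vectors
# keyed by column name; both are optional.
make_codebook <- function(data, pretty = NULL, description = NULL) {
  vars <- names(data)

  # Look up user-supplied text per variable, falling back to a default
  lookup <- function(x, default) {
    default <- rep(default, length.out = length(vars))
    if (is.null(x)) return(default)
    out <- unname(x[vars])
    ifelse(is.na(out), default, out)
  }

  # One-line summary: mean/median/range for numerics, levels for factors
  summarise_var <- function(col) {
    if (is.numeric(col) && !all(is.na(col))) {
      sprintf(
        "mean %.2f; median %.2f; range %s to %s",
        as.numeric(mean(col, na.rm = TRUE)),
        as.numeric(median(col, na.rm = TRUE)),
        format(min(col, na.rm = TRUE)),
        format(max(col, na.rm = TRUE))
      )
    } else if (is.factor(col)) {
      paste(levels(col), collapse = ", ")
    } else {
      NA_character_
    }
  }

  data.frame(
    variable_name = vars,
    variable_pretty = lookup(pretty, vars),
    variable_description = lookup(description, NA_character_),
    summary = vapply(data, summarise_var, character(1)),
    pct_missing = vapply(data, function(col) 100 * mean(is.na(col)), numeric(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Made-up example data, for illustration only
df <- data.frame(
  age = c(34, 51, NA, 47),
  severity = factor(
    c("0_none", "2_moderate", "1_mild", "0_none"),
    levels = c("0_none", "1_mild", "2_moderate", "3_severe")
  )
)

codebook <- make_codebook(
  df,
  pretty = c(age = "Age"),
  description = c(age = "Age at enrollment, in years")
)
print(codebook)
```

The result is a plain data frame with one row per variable, so it could be written out with `write.csv()` or passed to a table-rendering package for PDF, HTML, or Word output.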

rich-iannone commented 2 years ago

Peter, this is very helpful, thank you for providing these detailed requirements! I may need your help in the near future with obtaining SAS-, SPSS-, Stata-based inputs.

higgi13425 commented 2 years ago

Sure - anytime via email or Twitter at @ibddoctor

higgi13425 commented 2 years ago

Lots of challenges with labels. I tend to link them (0_failed, 1_success) to make sure they don't get confused or lost/flipped, as happened in the JAMA asthma intervention trial: they swapped 0 for 1 in their response variable, got it backward, and interpreted the intervention as helpful when it was actually harmful. They had to withdraw the paper a year later when this was discovered by someone doing a secondary analysis. Peter

higgi13425 commented 2 years ago

Once they are linked, you can recover the numeric version with parse_number(), and you can recover the text label with str_sub(3, -1), though it would be nice to wrap this in a nicer function name. Peter
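
As a rough illustration of that recovery step, the sketch below splits linked values on the first underscore using base R only. It is an assumption-laden stand-in for the `parse_number` and `str_sub(3, -1)` calls named above (from readr and stringr), and the `10_unknown` value is made up to show one difference: a fixed start position of 3 assumes single-digit codes, whereas splitting on the underscore does not.

```r
# Linked values in the "<code>_<label>" style; "10_unknown" is a made-up
# value added to show a two-digit code
linked <- c("0_none", "1_mild", "2_moderate", "3_severe", "10_unknown")

# Numeric code: drop the first underscore and everything after it
codes <- as.integer(sub("_.*$", "", linked))

# Text label: drop everything up to and including the first underscore
labels <- sub("^[^_]*_", "", linked)

print(codes)   # 0 1 2 3 10
print(labels)  # "none" "mild" "moderate" "severe" "unknown"
```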

higgi13425 commented 2 years ago

In theory, 3 helper functions:

  1. link_label() - links the stored number to its label, as in (0_no, 1_yes), and also makes the variable into a factor, with levels ordered by the number.
  2. parse_number() - pulls the original integer from the linked label (1_female, 2_male), formatted as integer to allow doing math.
  3. pretty_label() - pulls the original label from the linked label (0_none, 1_mild, 2_moderate, 3_severe) and puts it into title case, as a factor with levels ordered by the original number. For making pretty tables and graph axis labels.
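
One possible shape for these three helpers, sketched in base R. None of them exist in pointblank; the signatures are assumptions made for illustration. The second helper is named `parse_linked_number()` here (a hypothetical name) so that it does not mask `readr::parse_number()`, and the `labels` argument is assumed to be a named character vector whose names are the numeric codes.

```r
# Hypothetical helpers; none of these exist in pointblank.

# Link numeric codes to their labels ("0_none") and return a factor whose
# levels are ordered by code. `labels` is a named character vector:
# names are the codes, values are the labels.
link_label <- function(x, labels) {
  codes <- sort(as.integer(names(labels)))
  linked_levels <- paste0(codes, "_", labels[as.character(codes)])
  # Codes with no matching label fall outside the levels and become NA
  factor(paste0(x, "_", labels[as.character(x)]), levels = linked_levels)
}

# Recover the original integer code from a linked value
parse_linked_number <- function(x) {
  as.integer(sub("_.*$", "", as.character(x)))
}

# Recover the label in title case, as a factor ordered by the original code
pretty_label <- function(x) {
  to_title <- function(s) {
    s <- gsub("_", " ", sub("^[^_]*_", "", s))
    gsub("\\b([a-z])", "\\U\\1", s, perl = TRUE)
  }
  keys <- if (is.factor(x)) levels(x) else unique(x[!is.na(x)])
  keys <- keys[order(parse_linked_number(keys))]
  factor(to_title(as.character(x)), levels = to_title(keys))
}

# Made-up example values
severity <- link_label(
  c(2, 0, 3, 1, 0),
  c("0" = "none", "1" = "mild", "2" = "moderate", "3" = "severe")
)

levels(severity)                # "0_none" "1_mild" "2_moderate" "3_severe"
parse_linked_number(severity)   # 2 0 3 1 0
levels(pretty_label(severity))  # "None" "Mild" "Moderate" "Severe"
```

Ordering the levels by the parsed integer rather than alphabetically keeps a code such as 10 after 2, which matters once a variable has more than ten categories.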