spsanderson / healthyR.data

Data sets for the healthyR package.
https://www.spsanderson.com/healthyR.data/

healthyR.data

Welcome

The goal of the { healthyR.data } package is to provide a simple yet feature-rich administrative data set for testing the functions in the { healthyR } package. You can use it to test those functions or any function you write yourself.

Installation

You can install the released version of healthyR.data from CRAN with:

install.packages("healthyR.data")

And the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("spsanderson/healthyR.data")

Example

This basic example loads the data set, inspects its structure with glimpse(), and counts visits by ip_op_flag and service_line:

library(healthyR.data)
library(dplyr)

df <- healthyR_data

glimpse(df)
#> Rows: 187,721
#> Columns: 17
#> $ mrn                      <chr> "86069614", "60856527", "80673110", "55897373…
#> $ visit_id                 <chr> "3519249247", "3602225015", "3125290892", "38…
#> $ visit_start_date_time    <dttm> 2010-01-04 05:00:00, 2010-01-04 05:00:00, 20…
#> $ visit_end_date_time      <dttm> 2010-01-04, 2010-01-04, 2010-01-04, 2010-01-…
#> $ total_charge_amount      <dbl> 25983.88, 22774.05, 10690.45, 8788.02, 7325.1…
#> $ total_amount_due         <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 201.52, 20.00, …
#> $ total_adjustment_amount  <dbl> -20799.61, -12978.37, -7596.09, -7663.57, -60…
#> $ payer_grouping           <chr> "Medicare B", "Medicare HMO", "HMO", "Medicar…
#> $ total_payment_amount     <dbl> -5184.27, -9795.68, -3094.36, -1124.45, -1269…
#> $ ip_op_flag               <chr> "O", "O", "O", "O", "O", "O", "O", "O", "O", …
#> $ service_line             <chr> "General Outpatient", "General Outpatient", "…
#> $ length_of_stay           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ expected_length_of_stay  <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ length_of_stay_threshold <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ los_outlier_flag         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ readmit_flag             <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ readmit_expectation      <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…

df %>% 
    count(ip_op_flag, service_line) %>%
    arrange(ip_op_flag, desc(n)) %>%
    rename(count = n)
#> # A tibble: 30 × 3
#>    ip_op_flag service_line                                 count
#>    <chr>      <chr>                                        <int>
#>  1 I          Medical                                      64435
#>  2 I          Surgical                                     14916
#>  3 I          COPD                                          4398
#>  4 I          CHF                                           3871
#>  5 I          Pneumonia                                     3323
#>  6 I          Cellulitis                                    3311
#>  7 I          Major Depression/Bipolar Affective Disorders  2866
#>  8 I          Chest Pain                                    2766
#>  9 I          GI Hemorrhage                                 2404
#> 10 I          MI                                            2253
#> # ℹ 20 more rows
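
Because the data set is meant for exercising functions you write yourself, the following is a minimal sketch of that workflow. The helper avg_charge_by() is hypothetical and is not part of { healthyR } or { healthyR.data }; it only relies on the columns shown in the glimpse() output above and on standard dplyr verbs.

```r
library(healthyR.data)
library(dplyr)

# Hypothetical helper (an assumption for illustration, not a package function):
# counts visits and averages total_charge_amount for each combination
# of the grouping columns passed through `...`.
avg_charge_by <- function(.data, ...) {
    .data %>%
        group_by(...) %>%
        summarise(
            visits = n(),
            avg_total_charge = mean(total_charge_amount, na.rm = TRUE),
            .groups = "drop"
        )
}

# Try the helper on the bundled data set, most frequent payers first
payer_summary <- healthyR_data %>%
    avg_charge_by(ip_op_flag, payer_grouping) %>%
    arrange(ip_op_flag, desc(visits))

payer_summary
```

Since every row falls into exactly one group, the visits column should sum to the number of rows in healthyR_data, which makes for a quick sanity check on any grouping function you test this way.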