jeff1evesque / fin-654

Syracuse FIN-654 Final Project

Standardize 'type' dataframe column #18

Closed jeff1evesque closed 5 years ago

jeff1evesque commented 5 years ago

Since we've combined two datasets into one dataframe, the type column is inconsistent: each dataset introduces its own vernacular. The values will need to be standardized within run.R.

jeff1evesque commented 5 years ago

The following logic from load_dataframe_fin654.R correctly prints each old/new pair:

  ## replacement lists: index lengths must match between 'old' and 'new'
  old = c(
    'hacked',
    'oops!',
    'unkn',
    'poor security',
    'lost device',
    'port',
    'disc',
    'phys',
    'insd',
    'stat'
  )

  new = c(
    'hack',
    'accidental-disclosed',
    'unknown',
    'hack',
    'lost-or-stolen',
    'lost-or-stolen',
    'accidental-disclosed',
    'accidental-disclosed',
    'insider',
    'accidental-disclosed'
  )

  ## seq_along() is safe for empty vectors, unlike 1:length(new)
  if (length(old) == length(new)) {
    for (i in seq_along(new)) {
      print(paste(old[i], new[i]))
    }
  }
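
The pairing above relies on the two vectors staying index-aligned. A minimal sketch of an alternative, assuming a single named vector is acceptable here: each name is an old label and each value is its replacement, so the lengths cannot drift apart. The name `type_map` is hypothetical, and only a subset of the pairs from this issue is shown.

```r
## hypothetical mapping: names are old labels, values are replacements
## (a subset of the pairs listed in this issue)
type_map <- c(
  'hacked' = 'hack',
  'oops!' = 'accidental-disclosed',
  'unkn' = 'unknown',
  'lost device' = 'lost-or-stolen'
)

## print each old/new pair, as the loop above does
for (old_label in names(type_map)) {
  print(paste(old_label, type_map[[old_label]]))
}
```

Because every replacement is attached to its key, a missing or extra entry shows up as a malformed pair rather than a silent off-by-one shift.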

However, the following does not return the adjusted dataframe:

  ## replacement lists: index lengths must match between 'old' and 'new'
  old = c(
    'hacked',
    'oops!',
    'unkn',
    'poor security',
    'lost device',
    'port',
    'disc',
    'phys',
    'insd',
    'stat'
  )

  new = c(
    'hack',
    'accidental-disclosed',
    'unknown',
    'hack',
    'lost-or-stolen',
    'lost-or-stolen',
    'accidental-disclosed',
    'accidental-disclosed',
    'insider',
    'accidental-disclosed'
  )

  ## seq_along() is safe for empty vectors, unlike 1:length(new)
  if (length(old) == length(new)) {
    for (i in seq_along(new)) {
      data1$replace_val('type', old[i], new[i])
      data2$replace_val('type', old[i], new[i])
    }
  }

  ## return dataset
  return(rbind(data1$get_df(), data2$get_df()))
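
For comparison, the same replace-then-combine step can be sketched on plain R data frames. This is not the project's `replace_val` method, whose implementation is not shown in this thread; `recode_column`, the toy data, and the shortened mapping are all hypothetical. Note that R data frames are copied on modification, so the result of the helper has to be assigned back for the change to persist.

```r
## hypothetical helper: recode one column of a plain data.frame
recode_column <- function(df, column, mapping) {
  values <- as.character(df[[column]])
  hit <- values %in% names(mapping)             # rows whose label has a replacement
  values[hit] <- unname(mapping[values[hit]])   # look up replacements by name
  df[[column]] <- values
  df                                            # return the modified copy
}

## shortened mapping taken from the pairs in this issue
mapping <- c('hacked' = 'hack', 'port' = 'lost-or-stolen', 'insd' = 'insider')

## toy stand-ins for the two datasets (not the project's data)
data1 <- data.frame(type = c('hacked', 'hack'), stringsAsFactors = FALSE)
data2 <- data.frame(type = c('port', 'insd', 'card'), stringsAsFactors = FALSE)

## assign the results back, then combine
data1 <- recode_column(data1, 'type', mapping)
data2 <- recode_column(data2, 'type', mapping)
combined <- rbind(data1, data2)
print(unique(combined$type))
```

Labels without an entry in the mapping (here 'card') pass through unchanged.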

jeff1evesque commented 5 years ago

After running our app.R manually, we confirm that the type column, renamed as breach, has the expected adjusted values:

> if (nzchar(Sys.getenv('RSTUDIO_USER_IDENTITY'))) {
+   cwd = dirname(rstudioapi::getSourceEditorContext()$path)
+   setwd(cwd)
+ }
> 
> ## utility functions
> devtools::install_local(paste0(cwd, '/packages/customUtility'))
√  checking for file 'C:\Users\jeff1\AppData\Local\Temp\RtmpAzu3aP\file16207d514df9\customUtility/DESCRIPTION' ... OK
-  preparing 'customUtility':
√  checking DESCRIPTION meta-information ... OK
-  checking for LF line-endings in source and make files and shell scripts
-  checking for empty or unneeded directories
-  building 'customUtility_0.1.0.tar.gz'

Installing package into ‘C:/Users/jeff1/Documents/R/win-library/3.5’
(as ‘lib’ is unspecified)
* installing *source* package 'customUtility' ...
** R
** byte-compile and prepare package for lazy loading
** help
No man pages found in package  'customUtility' 
*** installing help indices
** building package indices
** testing if installed package can be loaded
*** arch - i386
*** arch - x64
* DONE (customUtility)
In R CMD INSTALL
> devtools::install_local(paste0(cwd, '/packages/fin654'))
√  checking for file 'C:\Users\jeff1\AppData\Local\Temp\RtmpAzu3aP\file162044a87894\fin654/DESCRIPTION' (609ms)
-  preparing 'fin654':
√  checking DESCRIPTION meta-information ... OK
-  checking for LF line-endings in source and make files and shell scripts
-  checking for empty or unneeded directories
-  building 'fin654_0.1.0.tar.gz'

Installing package into ‘C:/Users/jeff1/Documents/R/win-library/3.5’
(as ‘lib’ is unspecified)
* installing *source* package 'fin654' ...
** R
** byte-compile and prepare package for lazy loading
** help
No man pages found in package  'fin654' 
*** installing help indices
** building package indices
** testing if installed package can be loaded
*** arch - i386
*** arch - x64
* DONE (fin654)
In R CMD INSTALL
> library('customUtility')
> 
> ## load packages
> load_package(c('reticulate', 'shiny', 'fin654'))
Loading required package: reticulate
Loading required package: shiny
Loading required package: fin654
reticulate      shiny     fin654 
      TRUE       TRUE       TRUE 
> py_install('pandas')
Solving environment: ...working... done

# All requested packages already installed.

Installation complete.

> df = load_data_fin654(
+     paste0(cwd, '/data/data-breaches.csv'),
+     paste0(cwd, '/data/Privacy_Rights_Clearinghouse-Data-Breaches-Export.csv'),
+     paste0('python/dataframe.py')
+   )
> unique(df$breach)
[1] "hack"                 "accidental-disclosed" "lost-or-stolen"      
[4] "inside job"           "insider"              "unknown"             
[7] "card"                
>
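
The printed values include "inside job" and "card", neither of which appears in the `new` vector earlier in this thread. Whether they are intended categories is not stated here. A minimal sketch for listing such values, assuming the target vocabulary is exactly the distinct entries of `new` (the variable names are hypothetical):

```r
## values printed by unique(df$breach) in the session above
observed <- c('hack', 'accidental-disclosed', 'lost-or-stolen',
              'inside job', 'insider', 'unknown', 'card')

## assumed target vocabulary: distinct entries of the 'new' vector
target <- c('hack', 'accidental-disclosed', 'unknown',
            'lost-or-stolen', 'insider')

## labels present in the data but absent from the target vocabulary
leftover <- setdiff(observed, target)
print(leftover)
```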