jeff1evesque / ist-687

Syracuse IST687 final project with Jesse Warren (team member)

10: Create 'analysis/basic.R' to split first column #13

Closed jeff1evesque closed 6 years ago

jeff1evesque commented 6 years ago

Resolves #10.

jeff1evesque commented 6 years ago

We manually ran basic.R without error:

> ##
> ## basic.R, analyze the following Wikipedia datasets:
> ##
> ##     - https://www.dropbox.com/s/x14f3bg8flej1n7/train_1.csv?dl=1
> ##     - https://www.dropbox.com/s/o2df10dnyt3bg02/train_2.csv?dl=1
> ##
> 
> ## set project cwd
> cwd <- dirname(dirname(rstudioapi::getSourceEditorContext()$path))
> setwd(cwd)
> 
> ## utility functions
> devtools::install_local(paste(cwd, sep='', '/packages/ist687utility'))
Skipping install of 'ist687utility' from a local remote, the SHA1 (0.1.0) has not changed since last install.
  Use `force = TRUE` to force installation
> library('ist687utility')
> 
> ## load packages
> load_package('reshape2')
> 
> ## local variables
> reshaped_page <- c('name', 'project', 'access', 'agent')
> 
> ## dataset directory
> dir.create(file.path(cwd, 'dataset'), showWarnings = FALSE)
> 
> ## download datasets
> download_source(
+     'https://www.dropbox.com/s/x14f3bg8flej1n7/train_1.csv?dl=1',
+     './dataset/train_1.csv'
+ )
> download_source(
+     'https://www.dropbox.com/s/o2df10dnyt3bg02/train_2.csv?dl=1',
+     './dataset/train_2.csv'
+ )
> 
> ## create dataframes
> df1 <- load_df('./dataset/train_1.csv')
> df2 <- load_df('./dataset/train_2.csv')
> 
> ## explode column: first column (i.e. 'Page') will become four columns
> df1 <- cbind(
+     colsplit(df1$Page, '_', reshaped_page),
+     df1[,-which(names(df1) == "Page")]
+ )
> df2 <- cbind(
+     colsplit(df2$Page, '_', reshaped_page),
+     df2[,-which(names(df2) == "Page")]
+ )
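
One caveat with the split above: as far as we understand `colsplit`, it cuts the string into as many pieces as there are names, counting from the left, so an article name that itself contains underscores would likely spill into the `project`, `access`, and `agent` columns. The following is a minimal base-R sketch of an alternative that anchors on the last three underscores instead. The helper `split_page` and the sample `pages` values are hypothetical, made up for illustration, and not part of `basic.R` or `ist687utility`.

```r
## Hypothetical sample values in the 'Page' layout (name_project_access_agent);
## these rows are illustrative and not taken from train_1.csv.
pages <- c(
    '2NE1_zh.wikipedia.org_all-access_spider',
    'The_Beatles_en.wikipedia.org_desktop_all-agents'
)

## split_page() is a hypothetical helper: the greedy '(.*)' keeps any
## underscores inside the article name, while the last three underscore-free
## segments become project, access, and agent. It assumes every input matches
## the four-part layout; non-matching rows are not handled here.
split_page <- function(page) {
    pattern <- '^(.*)_([^_]+)_([^_]+)_([^_]+)$'
    parts <- regmatches(page, regexec(pattern, page))

    ## drop the full match (first element) and keep the four capture groups
    out <- as.data.frame(
        do.call(rbind, lapply(parts, function(p) p[-1])),
        stringsAsFactors = FALSE
    )
    names(out) <- c('name', 'project', 'access', 'agent')
    out
}

result <- split_page(pages)
print(result)
```

If this behaviour is wanted, `split_page(df1$Page)` could stand in for the `colsplit` call inside `cbind`, under the same assumption about the four-part layout.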

jeff1evesque commented 6 years ago

Additionally, our dataframe appears to have the correct structure:

[Image: dataframe]
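
The same structure check could also be done programmatically rather than by inspection. This is only a sketch under assumptions: `df` is a toy stand-in for `df1` after the split, with made-up values and a hypothetical date column name, and the check simply confirms that the four split columns lead the dataframe and that `Page` is gone.

```r
## Toy stand-in for df1 after the split; values and the date column name
## are assumptions for illustration only.
df <- data.frame(
    name = '2NE1',
    project = 'zh.wikipedia.org',
    access = 'all-access',
    agent = 'spider',
    `2015-07-01` = 18,
    check.names = FALSE,
    stringsAsFactors = FALSE
)

reshaped_page <- c('name', 'project', 'access', 'agent')

## the four split columns should come first, in order
has_split_columns <- identical(
    names(df)[seq_along(reshaped_page)],
    reshaped_page
)

## the original 'Page' column should no longer be present
page_removed <- !('Page' %in% names(df))

stopifnot(has_split_columns, page_removed)
str(df)
```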