jeff1evesque closed 6 years ago
We manually ran basic.R without error:
> ##
> ## basic.R, analyze the following Wikipedia datasets:
> ##
> ## - https://www.dropbox.com/s/x14f3bg8flej1n7/train_1.csv?dl=1
> ## - https://www.dropbox.com/s/o2df10dnyt3bg02/train_2.csv?dl=1
> ##
>
> ## set project cwd
> cwd <- dirname(dirname(rstudioapi::getSourceEditorContext()$path))
> setwd(cwd)
>
> ## utility functions
> devtools::install_local(file.path(cwd, 'packages', 'ist687utility'))
Skipping install of 'ist687utility' from a local remote, the SHA1 (0.1.0) has not changed since last install.
Use `force = TRUE` to force installation
> library('ist687utility')
>
> ## load packages
> load_package('reshape2')
>
> ## local variables
> reshaped_page <- c('name', 'project', 'access', 'agent')
>
> ## dataset directory
> dir.create(file.path(cwd, 'dataset'), showWarnings = FALSE)
>
> ## download datasets
> download_source(
+ 'https://www.dropbox.com/s/x14f3bg8flej1n7/train_1.csv?dl=1',
+ './dataset/train_1.csv'
+ )
> download_source(
+ 'https://www.dropbox.com/s/o2df10dnyt3bg02/train_2.csv?dl=1',
+ './dataset/train_2.csv'
+ )
>
> ## create dataframes
> df1 <- load_df('./dataset/train_1.csv')
> df2 <- load_df('./dataset/train_2.csv')
>
> ## explode column: first column (i.e. 'Page') will become four columns
> df1 <- cbind(
+ colsplit(df1$Page, '_', reshaped_page),
+ df1[,-which(names(df1) == "Page")]
+ )
> df2 <- cbind(
+ colsplit(df2$Page, '_', reshaped_page),
+ df2[,-which(names(df2) == "Page")]
+ )
Additionally, our dataframe appears to have the correct structure.
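The structure check can also be reproduced without downloading the datasets. The following is only a minimal sketch, not part of basic.R: `sample_df`, its `visits` column, and the two `Page` values are hypothetical stand-ins for the real rows, assumed to follow the `name_project_access_agent` pattern. It applies the same `colsplit` explode step and asserts that the four new columns lead the dataframe and that `Page` is gone.

```r
## hypothetical sample rows; the real data comes from train_1.csv / train_2.csv
library('reshape2')

reshaped_page <- c('name', 'project', 'access', 'agent')
sample_df <- data.frame(
  Page = c(
    'Alpha_en.wikipedia.org_desktop_all-agents',
    'Beta_zh.wikipedia.org_all-access_spider'
  ),
  visits = c(10, 20),
  stringsAsFactors = FALSE
)

## same explode step as in basic.R; drop = FALSE keeps a dataframe
## even though only one non-Page column remains in this small sample
exploded <- cbind(
  colsplit(sample_df$Page, '_', reshaped_page),
  sample_df[, -which(names(sample_df) == 'Page'), drop = FALSE]
)

## structural checks: four new columns first, 'Page' removed, rows preserved
stopifnot(
  identical(names(exploded)[1:4], reshaped_page),
  !('Page' %in% names(exploded)),
  nrow(exploded) == nrow(sample_df)
)

## print the resulting column types for a visual check
str(exploded)
```

The same `stopifnot` checks could be run against `df1` and `df2` after the explode step, assuming both still have the columns produced above.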
Resolves #10.