109119b: like our `dataset/` directory, we do not need to version control anything within the `visualization/` directory, since its contents are generated at runtime and could vary between one execution and the next.
We've manually tested the following:
```r
df1temp <- df1[1,1:7]
df2temp <- df2[1,1:7]

## year range
df1temp_start_date <- as.Date(colnames(df1temp)[5], format='X%Y.%m.%d')
df1temp_end_date <- as.Date(colnames(df1temp)[length(colnames(df1temp))], format='X%Y.%m.%d')
df2temp_start_date <- as.Date(colnames(df2temp)[5], format='X%Y.%m.%d')
df2temp_end_date <- as.Date(colnames(df2temp)[length(colnames(df2temp))], format='X%Y.%m.%d')

## combine columns
while (df1temp_start_date <= df1temp_end_date) {
  Reduce(
    '+',
    df1temp[,grep(paste0('X',format(df1temp_start_date,"%Y.%m")),names(df1temp))]
  )
}
while (df2temp_start_date <= df2temp_end_date) {
  Reduce(
    '+',
    df2temp[,grep(paste0('X',format(df2temp_start_date,"%Y.%m")),names(df2temp))]
  )
}
```
However, the R console has been stuck for the last 5 minutes. These are the `df1temp` and `df2temp` values:
```
> df1temp
      Access  Agent Article Language X2015.07.01 X2015.07.02 X2015.07.03
1 all-access spider    2NE1       zh          18          11           5
> df2temp
      Access  Agent Article Language X2015.07.01 X2015.07.02 X2015.07.03
1 all-access spider    2NE1       zh          18          11           5
```
We tried to simplify our `while` loops to the following:
```r
df1temp <- df1[1,1:7]
df2temp <- df2[1,1:7]

## year range
df1temp_start_date <- as.Date(colnames(df1temp)[5], format='X%Y.%m.%d')
df1temp_end_date <- as.Date(colnames(df1temp)[length(colnames(df1temp))], format='X%Y.%m.%d')
df2temp_start_date <- as.Date(colnames(df2temp)[5], format='X%Y.%m.%d')
df2temp_end_date <- as.Date(colnames(df2temp)[length(colnames(df2temp))], format='X%Y.%m.%d')

## local variables
start_date1 <- df1temp_start_date
start_date2 <- df2temp_start_date

## combine columns
while (start_date1 <= df1temp_end_date) {
  paste('yes')
}
while (start_date2 <= df2temp_end_date) {
  paste('yes')
}
```
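However, after 10+ minutes, the loops are still running. Neither loop ever updates its loop variable (`start_date1` and `start_date2` here, `df1temp_start_date` and `df2temp_start_date` in the first attempt), so each `while` condition stays `TRUE` and the loop never exits. This means we need to adjust our loop structure, or find a different implementation.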
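A minimal sketch of a terminating version, not our committed implementation. It assumes a single-row frame shaped like the `df1temp` printout above (rebuilt by hand as a toy input), and the names `current` and `monthly` are hypothetical. Each iteration sums the columns for one month, stores the result under its `XYYYY.mm` prefix, and then advances `current` by one month with `seq()`. It matches columns with `startsWith()` rather than `grep()`, since in a regular expression the `.` in `X2015.07` matches any character.

```r
## toy input rebuilt from the df1temp printout (hypothetical reconstruction)
df1temp <- data.frame(
  Access = 'all-access', Agent = 'spider', Article = '2NE1', Language = 'zh',
  X2015.07.01 = 18, X2015.07.02 = 11, X2015.07.03 = 5,
  stringsAsFactors = FALSE
)

## year range, parsed from the first and last date column names
start_date <- as.Date(colnames(df1temp)[5], format = 'X%Y.%m.%d')
end_date <- as.Date(colnames(df1temp)[ncol(df1temp)], format = 'X%Y.%m.%d')

## begin on the first day of the starting month so each step lands on a month
current <- as.Date(format(start_date, '%Y-%m-01'))
monthly <- list()

while (current <= end_date) {
  prefix <- paste0('X', format(current, '%Y.%m'))
  cols <- startsWith(names(df1temp), prefix)
  ## Reduce('+', ...) adds the selected columns element-wise (row by row)
  monthly[[prefix]] <- Reduce('+', df1temp[, cols, drop = FALSE])
  ## advance the loop variable; without this the condition never changes
  current <- seq(current, by = 'month', length.out = 2)[2]
}
```

For the sample row this yields a single month, `X2015.07`, whose value is 18 + 11 + 5 = 34.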
Our committed changes produce a `df_aggregate2` dataframe.
Note: we manually verified that `df1_aggregate` is very similar in structure.
After the first column has been exploded into four columns (i.e. `Access`, `Agent`, `Article`, `Language`) in `basic.R`, we need to aggregate the successive date columns based on the `MYYYY.mm` pattern. This means the date columns that fall within the same month will need to be summed with one another.

Note: we may later choose to move this logic to a dedicated custom R package.
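The month-based aggregation can also be done without an explicit loop. A minimal sketch, assuming the four identifier columns (`Access`, `Agent`, `Article`, `Language`) come first and every remaining column is named `XYYYY.mm.dd`. The function `aggregate_by_month()` and the toy values are hypothetical. It strips the day component from each column name, groups the date columns by the resulting month key with `split.default()`, and sums each group row-wise with `rowSums()`.

```r
## hypothetical helper: sum date columns that share the same month
aggregate_by_month <- function(df, id_cols = 1:4) {
  date_cols <- names(df)[-id_cols]
  ## 'X2015.07.30' -> 'X2015.07': drop the trailing '.dd' day component
  month_keys <- sub('\\.[0-9]{2}$', '', date_cols)
  ## split.default() on a data.frame groups its columns by month key
  groups <- split.default(df[date_cols], month_keys)
  ## one row-wise sum per month group
  sums <- lapply(groups, function(g) unname(rowSums(g)))
  cbind(df[id_cols], as.data.frame(sums))
}

## toy input with hypothetical values spanning two months
df <- data.frame(
  Access = c('all-access', 'desktop'), Agent = c('spider', 'user'),
  Article = c('2NE1', '2NE1'), Language = c('zh', 'zh'),
  X2015.07.30 = c(18, 1), X2015.07.31 = c(11, 2), X2015.08.01 = c(5, 3),
  stringsAsFactors = FALSE
)

out <- aggregate_by_month(df)
```

For the toy frame, `out` keeps the four identifier columns and adds `X2015.07` (29 and 3) and `X2015.08` (5 and 3). Because `split.default()` orders groups by sorted key and the keys are zero-padded `YYYY.mm`, the monthly columns come out in chronological order.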