iangow closed this issue 3 years ago
@iangow OK, I spent last night looking into this, found the error, and fixed it. The problem was in `bind_rows`, when binding dataframes built from the scraped data to an empty dataframe, which was typically defined by code like this:

    column_names <- c(col1, col2, col3, ..., coln)
    df <- data.frame(matrix(nrow = 0, ncol = n), stringsAsFactors = FALSE)
    colnames(df) <- column_names

The problem was that R now implicitly takes the columns of `df` to be of type `logical`, whereas the corresponding columns of the dataframes holding scraped data are of type `character`. To fix this, I defined a new function, `make_empty_dataframe_w_colnames`:

    make_empty_dataframe_w_colnames <- function(column_names) {
      num_cols <- length(column_names)
      empty_df <- data.frame(matrix(nrow = 0, ncol = num_cols), stringsAsFactors = FALSE)
      colnames(empty_df) <- column_names
      for (column in column_names) {
        # Initialize the columns to be character, so that raw data can be written in
        empty_df[, column] <- as.character(empty_df[, column])
      }
      return(empty_df)
    }

which defines an empty dataframe with the given `column_names`, but with all columns converted to `character`. I then rewrote all the functions that define the dataframes written to the database tables in terms of this function, removing the snippets that defined the `logical` empty dataframes shown above. After that, I ran the program and it has worked well:

    (base) bdcallen@igow-z640:~/edgar$ forms345/update_forms_345_tables.sh

    Attaching package: ‘dplyr’

    The following objects are masked from ‘package:stats’:

        filter, lag

    The following objects are masked from ‘package:base’:

        intersect, setdiff, setequal, union

    Attaching package: ‘lubridate’

    The following objects are masked from ‘package:base’:

        date, intersect, setdiff, union

    Loading required package: xml2

    Attaching package: ‘tidyr’

    The following object is masked from ‘package:RCurl’:

        complete

    [1] 0
    [1] "Total time taken: \n"
        user   system  elapsed
    3845.981  561.281  457.426
    [1] "Number of full successes: \n"
    [1] 10000
    [1] "Number of filings processed: \n"
    [1] 10000
    Error in UseMethod("xpathApply") :
      no applicable method for 'xpathApply' applied to an object of class "logical"
    Error in UseMethod("xpathApply") :
      no applicable method for 'xpathApply' appl ....
    .
    .
    .
    [1] "Number of full successes: \n"
    [1] 159984
    [1] "Number of filings processed: \n"
    [1] 160000
    [1] "Total time taken: \n"
         user    system   elapsed
    63902.187  9132.209  8408.463
    [1] "Number of full successes: \n"
    [1] 165297
    [1] "Number of filings processed: \n"
    [1] 165313
    (base) bdcallen@igow-z640:~/edgar$

Don't worry too much about the error messages here; I'm pretty sure they come from bad cases.
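The `xpathApply` messages say that an object of class `logical` reached a function expecting a parsed document. One possible way to skip such cases without printing an error is sketched below. This is only an illustration: `extract_fields` and `safe_extract` are hypothetical names, not functions from this repo, and the guess that the logical value is something like `NA` left over from an earlier failed step is an assumption.

```r
# Hypothetical stand-in for a function that would call xpathApply() on `doc`.
extract_fields <- function(doc) {
  stop("placeholder: real extraction code would go here")
}

# Hypothetical wrapper: returns NULL instead of raising an error for bad cases.
safe_extract <- function(doc, extractor = extract_fields) {
  # A logical value (e.g. NA) is not a parsed document, so skip it
  if (is.logical(doc)) {
    return(NULL)
  }
  # Turn any other extraction error into NULL as well
  tryCatch(extractor(doc), error = function(e) NULL)
}
```

A caller could then count the `NULL` results as failures rather than letting each one print an error.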
I'll commit the new code and close shortly.
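For anyone who wants to see the type mismatch without running the scraper, here is a minimal base R sketch. The column names are made up for illustration, and `make_empty_character_df` is a hypothetical alternative, not the function committed to the repo: it starts from a character matrix, so no per-column conversion loop is needed.

```r
# Hypothetical column names, for illustration only
column_names <- c("file_name", "document", "form_type")

# The old pattern: matrix() fills with NA by default, and NA is logical
old_df <- data.frame(matrix(nrow = 0, ncol = length(column_names)),
                     stringsAsFactors = FALSE)
colnames(old_df) <- column_names
old_types <- vapply(old_df, class, character(1))
print(old_types)  # every column is "logical"

# Alternative sketch: build the empty dataframe from a character matrix
make_empty_character_df <- function(column_names) {
  empty_df <- data.frame(
    matrix(character(0), nrow = 0, ncol = length(column_names)),
    stringsAsFactors = FALSE
  )
  colnames(empty_df) <- column_names
  empty_df
}

new_df <- make_empty_character_df(column_names)
new_types <- vapply(new_df, class, character(1))
print(new_types)  # every column is "character"
```

With `character` columns on both sides, binding the empty dataframe to scraped data should no longer hit the type conflict described above.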
@bdcallen Seems not to be working.