mccgr / edgar

Code to manage data related to SEC EDGAR

Investigate issue in updating EDGAR #98

Closed iangow closed 3 years ago

iangow commented 3 years ago

For example, when I run ./update_edgar.sh, I see the messages below. This run was on my local server, but I suspect the same issues would appear if the code were run on the MCCGR server.

igow@igow-ubuntu-mate:~/git/edgar$ ./update_edgar.sh 
Running get_filings.R ...
Updating data for 2020Q4...
Running get_accession_nos.R ...
Running get_filer_ciks.R ...
Running get_item_nos.R ...
Processing batch 1 of 3 ... 14.043 seconds
Processing batch 2 of 3 ... 20.294 seconds
Processing batch 3 of 3 ... 47.285 seconds
Running get_item_no_desc.R ...
Running scrape_filing_docs.R ...
Loading required package: xml2
Processing batch 1 
Error: Argument 1 must have names.
Backtrace:
    █
 1. └─dplyr::bind_rows(temp)
In addition: Warning message:
In mclapply(file_names$file_name, filing_docs_df, mc.cores = 8) :
  all scheduled cores encountered errors in user code
Execution halted
Updating Form 3/4/5 data ...

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Attaching package: ‘lubridate’

The following objects are masked from ‘package:base’:

    date, intersect, setdiff, union

Loading required package: xml2

Attaching package: ‘tidyr’

The following object is masked from ‘package:RCurl’:

    complete

[1] 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
In addition: Warning message:
The `i` argument of ``[.tbl_df`()` must lie in [0, rows] if positive, as of tibble 3.0.0.
Use `NA_integer_` as row index to obtain a row full of `NA` values.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
In addition: Warning message:
The `i` argument of ``[.tbl_df`()` must lie in [0, rows] if positive, as of tibble 3.0.0.
Use `NA_integer_` as row index to obtain a row full of `NA` values.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
In addition: Warning message:
The `i` argument of ``[.tbl_df`()` must lie in [0, rows] if positive, as of tibble 3.0.0.
Use `NA_integer_` as row index to obtain a row full of `NA` values.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
In addition: Warning message:
The `i` argument of ``[.tbl_df`()` must lie in [0, rows] if positive, as of tibble 3.0.0.
Use `NA_integer_` as row index to obtain a row full of `NA` values.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
In addition: Warning message:
The `i` argument of ``[.tbl_df`()` must lie in [0, rows] if positive, as of tibble 3.0.0.
Use `NA_integer_` as row index to obtain a row full of `NA` values.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
In addition: In addition: Warning message:
Warning message:
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
The `i` argument of ``[.tbl_df`()` must lie in [0, rows] if positive, as of tibble 3.0.0.
Use `NA_integer_` as row index to obtain a row full of `NA` values.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
The `i` argument of ``[.tbl_df`()` must lie in [0, rows] if positive, as of tibble 3.0.0.
Use `NA_integer_` as row index to obtain a row full of `NA` values.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
In addition: Warning message:
The `i` argument of ``[.tbl_df`()` must lie in [0, rows] if positive, as of tibble 3.0.0.
Use `NA_integer_` as row index to obtain a row full of `NA` values.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
In addition: Warning message:
The `i` argument of ``[.tbl_df`()` must lie in [0, rows] if positive, as of tibble 3.0.0.
Use `NA_integer_` as row index to obtain a row full of `NA` values.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
In addition: Warning message:
The `i` argument of ``[.tbl_df`()` must lie in [0, rows] if positive, as of tibble 3.0.0.
Use `NA_integer_` as row index to obtain a row full of `NA` values.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
In addition: In addition: Warning message:
Warning message:
The `i` argument of ``[.tbl_df`()` must lie in [0, rows] if positive, as of tibble 3.0.0.
Use `NA_integer_` as row index to obtain a row full of `NA` values.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
The `i` argument of ``[.tbl_df`()` must lie in [0, rows] if positive, as of tibble 3.0.0.
Use `NA_integer_` as row index to obtain a row full of `NA` values.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
In addition: Warning message:
The `i` argument of ``[.tbl_df`()` must lie in [0, rows] if positive, as of tibble 3.0.0.
Use `NA_integer_` as row index to obtain a row full of `NA` values.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
In addition: Warning message:
The `i` argument of ``[.tbl_df`()` must lie in [0, rows] if positive, as of tibble 3.0.0.
Use `NA_integer_` as row index to obtain a row full of `NA` values.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
In addition: Warning message:
The `i` argument of ``[.tbl_df`()` must lie in [0, rows] if positive, as of tibble 3.0.0.
Use `NA_integer_` as row index to obtain a row full of `NA` values.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
In addition: Warning message:
The `i` argument of ``[.tbl_df`()` must lie in [0, rows] if positive, as of tibble 3.0.0.
Use `NA_integer_` as row index to obtain a row full of `NA` values.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
In addition: Warning message:
The `i` argument of ``[.tbl_df`()` must lie in [0, rows] if positive, as of tibble 3.0.0.
Use `NA_integer_` as row index to obtain a row full of `NA` values.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
In addition: Warning message:
The `i` argument of ``[.tbl_df`()` must lie in [0, rows] if positive, as of tibble 3.0.0.
Use `NA_integer_` as row index to obtain a row full of `NA` values.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
In addition: Warning message:
The `i` argument of ``[.tbl_df`()` must lie in [0, rows] if positive, as of tibble 3.0.0.
Use `NA_integer_` as row index to obtain a row full of `NA` values.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
In addition: Warning message:
The `i` argument of ``[.tbl_df`()` must lie in [0, rows] if positive, as of tibble 3.0.0.
Use `NA_integer_` as row index to obtain a row full of `NA` values.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
In addition: Warning message:
The `i` argument of ``[.tbl_df`()` must lie in [0, rows] if positive, as of tibble 3.0.0.
Use `NA_integer_` as row index to obtain a row full of `NA` values.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
In addition: Warning message:
The `i` argument of ``[.tbl_df`()` must lie in [0, rows] if positive, as of tibble 3.0.0.
Use `NA_integer_` as row index to obtain a row full of `NA` values.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
In addition: Warning message:
The `i` argument of ``[.tbl_df`()` must lie in [0, rows] if positive, as of tibble 3.0.0.
Use `NA_integer_` as row index to obtain a row full of `NA` values.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
In addition: Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Warning message:
The `i` argument of ``[.tbl_df`()` must lie in [0, rows] if positive, as of tibble 3.0.0.
Use `NA_integer_` as row index to obtain a row full of `NA` values.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
In addition: Warning messages:
1: The `i` argument of ``[.tbl_df`()` must lie in [0, rows] if positive, as of tibble 3.0.0.
Use `NA_integer_` as row index to obtain a row full of `NA` values.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
2: The `i` argument of ``[.tbl_df`()` must lie in [-rows, 0] if negative, as of tibble 3.0.0.
Use `NA_integer_` as row index to obtain a row full of `NA` values.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `$<-.data.frame`(`*tmp*`, file_name, value = NA_character_) : 
  replacement has 1 row, data has 0
Error in `$<-.data.frame`(`*tmp*`, remarks, value = NA) : 
  replacement has 1 row, data has 0
In addition: Warning messages:
1: The `i` argument of ``[.tbl_df`()` must lie in [0, rows] if positive, as of tibble 3.0.0.
Use `NA_integer_` as row index to obtain a row full of `NA` values.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
2: The `i` argument of ``[.tbl_df`()` must lie in [-rows, 0] if negative, as of tibble 3.0.0.
Use `NA_integer_` as row index to obtain a row full of `NA` values.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
Error : Can't combine `..1$footnote_variable` <logical> and `..2$footnote_variable` <character>.
Error in `[.data.frame`(df, , sig_cols) : undefined columns selected
Error in data.frame(file_name = file_name, document = document, form_type = form_type,  : 
  arguments imply differing number of rows: 0, 1
Error in data.frame(file_name = batch$file_name, document = batch$document,  : 
  arguments imply differing number of rows: 0, 2
In addition: Warning messages:
1: The `i` argument of ``[.tbl_df`()` must lie in [0, rows] if positive, as of tibble 3.0.0.
Use `NA_integer_` as row index to obtain a row full of `NA` values.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
2: The `i` argument of ``[.tbl_df`()` must lie in [-rows, 0] if negative, as of tibble 3.0.0.
Use `NA_integer_` as row index to obtain a row full of `NA` values.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
Execution halted
igow@igow-ubuntu-mate:~/git/edgar$ 
bdcallen commented 3 years ago

@iangow So it seems there are two problems here.

Let's look at each in isolation.

bdcallen commented 3 years ago

@iangow With regard to filing_docs/scrape_filing_docs.R, this seems to be the offending piece of code inside the function filing_docs_df

df <-
    file_tables %>%
    html_table_mod() %>%
    bind_rows() %>%
    fix_names() %>%
    mutate(file_name = file_name,
           type = as.character(type),
           description = as.character(description)) %>%
    separate(col = document,
             into = c("document", "document_note"),
             sep = "[:space:]+")

In particular, it seems to just be the last part of the command

separate(col = document,
         into = c("document", "document_note"),
         sep = "[:space:]+")

that's failing. I'm getting the error message

> filing_docs_df('edgar/data/1406815/0000899243-20-026684.txt')
Error in gregexpr(pattern, x, perl = TRUE) : 
  invalid regular expression '[:space:]+'
In addition: Warning message:
In gregexpr(pattern, x, perl = TRUE) :
bdcallen commented 3 years ago

So I replaced [:space:] with [\\s], and the code worked, though with some warnings

> file_tables %>%
+             html_table_mod() %>%
+             bind_rows() %>%
+             fix_names() %>%
+             mutate(file_name = file_name,
+                    type = as.character(type),
+                    description = as.character(description)) %>%
+             separate(col = document,
+                      into = c("document", "document_note"),
+                      sep = "[\\s]+")
   seq                                            description                 document document_note       type     size
1    1                                                   10-Q           ns1q2010-q.htm         iXBRL       10-Q  2313661
2    2                                          EXHIBIT 10.03     ns1q2010-qex1003.htm          <NA>   EX-10.03   115323
3    3                                          EXHIBIT 31.01     ns1q2010-qex3101.htm          <NA>   EX-31.01     8322
4    4                                          EXHIBIT 31.02     ns1q2010-qex3102.htm          <NA>   EX-31.02     8330
5    5                                          EXHIBIT 32.01     ns1q2010-qex3201.htm          <NA>   EX-32.01     5360
6    6                                          EXHIBIT 32.02     ns1q2010-qex3202.htm          <NA>   EX-32.02     5384
7   12                                                                   nslogoa04.jpg          <NA>    GRAPHIC   102220
8   NA                          Complete submission text file 0001110805-20-000051.txt          <NA>            10923484
9    7                XBRL TAXONOMY EXTENSION SCHEMA DOCUMENT          ns-20200331.xsd          <NA> EX-101.SCH    49941
10   8  XBRL TAXONOMY EXTENSION CALCULATION LINKBASE DOCUMENT      ns-20200331_cal.xml          <NA> EX-101.CAL   102954
11   9   XBRL TAXONOMY EXTENSION DEFINITION LINKBASE DOCUMENT      ns-20200331_def.xml          <NA> EX-101.DEF   365441
12  10        XBRL TAXONOMY EXTENSION LABEL LINKBASE DOCUMENT      ns-20200331_lab.xml          <NA> EX-101.LAB   630168
13  11 XBRL TAXONOMY EXTENSION PRESENTATION LINKBASE DOCUMENT      ns-20200331_pre.xml          <NA> EX-101.PRE   440711
14  30                       EXTRACTED XBRL INSTANCE DOCUMENT       ns1q2010-q_htm.xml          <NA>        XML  2514052
                                     file_name
1  edgar/data/1110805/0001110805-20-000051.txt
2  edgar/data/1110805/0001110805-20-000051.txt
3  edgar/data/1110805/0001110805-20-000051.txt
4  edgar/data/1110805/0001110805-20-000051.txt
5  edgar/data/1110805/0001110805-20-000051.txt
6  edgar/data/1110805/0001110805-20-000051.txt
7  edgar/data/1110805/0001110805-20-000051.txt
8  edgar/data/1110805/0001110805-20-000051.txt
9  edgar/data/1110805/0001110805-20-000051.txt
10 edgar/data/1110805/0001110805-20-000051.txt
11 edgar/data/1110805/0001110805-20-000051.txt
12 edgar/data/1110805/0001110805-20-000051.txt
13 edgar/data/1110805/0001110805-20-000051.txt
14 edgar/data/1110805/0001110805-20-000051.txt
Warning message:
Expected 2 pieces. Missing pieces filled with `NA` in 13 rows [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14].
bdcallen commented 3 years ago

@iangow Just started running the program

(base) bdcallen@igow-z640:~/edgar$ filing_docs/scrape_filing_docs.R
Loading required package: xml2
Processing batch 1 
Writing data ...
86.19162 seconds

Seems to be working now. I think what happened was that filing_docs_df returned a bunch of NAs in this snippet here

temp <- mclapply(file_names$file_name, filing_docs_df, mc.cores = 8)
if (length(temp) > 0) {
    df <- bind_rows(temp)

    if (nrow(df) > 0) {
        cat("Writing data ...\n")
        dbWriteTable(pg, "filing_docs",
                     df, append = TRUE, row.names = FALSE)
    } else {
        cat("No data ...\n")
    }
}

leading to the failure in bind_rows, as the list in bind_rows needs to be a list of actual dataframes. I'm going to leave it running till it finishes.
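
As a rough illustration of that failure mode, here is a minimal sketch of dropping non-data-frame results before combining. It is not the repository's code: `parse_one` is a hypothetical stand-in for filing_docs_df, `try()` inside `lapply()` is used to imitate the "try-error" objects that mclapply stores for elements whose function call failed, and base `rbind` is used in place of bind_rows so the example has no package dependencies.

```r
# Hypothetical stand-in for filing_docs_df(): fails on some inputs.
parse_one <- function(x) {
    if (x < 0) stop("bad input")
    data.frame(id = x, value = x^2)
}

# try() imitates what mclapply() stores for a failed element:
# an object of class "try-error" rather than a data frame.
results <- lapply(c(1, -1, 2),
                  function(x) try(parse_one(x), silent = TRUE))

# Keep only the elements that are actual data frames, then combine.
ok <- Filter(is.data.frame, results)
df <- do.call(rbind, ok)

n_failed <- length(results) - length(ok)
```

Counting the dropped elements (`n_failed` here) would make it possible to log how many filings failed instead of halting the whole batch.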

bdcallen commented 3 years ago

I am rather curious why [:space:] does not seem to work anymore. Has there been some change to regular expressions in R since we wrote this code?

iangow commented 3 years ago

So I replaced [:space:] with [\\s], and the code worked, though with some warnings

I think it's better to use the fix in the commit above. [:space:] is equivalent to \\s, so one needs [[:space:]] to get the equivalent of [\\s]. I'm not sure why I made the switch to [:space:], but it is perhaps better to use [[:space:]] in case there was a good reason.
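
A small base-R sketch of the difference, using a made-up input string modelled on the "document" column above. The POSIX class is only a complete pattern when it sits inside a bracket expression; the last call reuses the bare pattern that produced the "invalid regular expression" error reported earlier in this thread, wrapped in tryCatch so the script keeps running either way.

```r
# Example value: a file name followed by a note, separated by whitespace.
x <- "ns1q2010-q.htm   iXBRL"

# POSIX class inside a bracket expression.
posix_split <- strsplit(x, "[[:space:]]+")[[1]]

# Perl-style shorthand; perl = TRUE so that \\s is handled by PCRE.
perl_split <- strsplit(x, "\\s+", perl = TRUE)[[1]]

# Bare [:space:] outside brackets, as in the original separate() call.
# bare_ok records whether the call completed without an error.
bare_ok <- tryCatch({
    strsplit(x, "[:space:]+", perl = TRUE)
    TRUE
}, error = function(e) FALSE)
```

Both working patterns split the string into the file name and the note.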

iangow commented 3 years ago

So I replaced [:space:] with [\\s], and the code worked, though with some warnings

Warning message:
Expected 2 pieces. Missing pieces filled with `NA` in 13 rows [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14].

The warning about NA values can be suppressed by passing fill = "right" to separate. The NA values themselves should be innocuous, as rows without a "document_note" will be common.
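
A minimal sketch of that, assuming tidyr is installed and using a two-row toy data frame rather than real filing data. With `fill = "right"`, rows that have only one piece get NA in the right-hand column without the "Expected 2 pieces" warning.

```r
library(tidyr)

# Toy data: only the first row has a note after the file name.
docs <- data.frame(
    document = c("ns1q2010-q.htm iXBRL", "ns1q2010-qex1003.htm"),
    stringsAsFactors = FALSE
)

out <- separate(docs,
                col = document,
                into = c("document", "document_note"),
                sep = "[[:space:]]+",
                fill = "right")  # pad missing pieces with NA on the right
```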