mccgr / edgar

Code to manage data related to SEC EDGAR

Address multiple tables in `filing_docs`. #52

Closed iangow closed 5 years ago

iangow commented 5 years ago

I made a tweak here to address an issue in scraping filings like this. Basically the code was trying to scrape two tables that could not be combined. The simple "fix" I made was to only scrape the first table.

But this creates an issue with filings like this.

We need to somehow fix the code so that it doesn't choke on the first filing, but does handle multiple tables (as in the second filing). For now, let's limit this issue to constructing code that works; we can worry about actually running it later.

iangow commented 5 years ago

My guess is that we want to get all the HTML tables.

dfs <-
    table_nodes %>%      
    html_table()

Then check each element of dfs to make sure it is of the correct form before passing it to bind_rows, etc.
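A minimal sketch of that check, under some assumptions: the data frames are hand-built stand-ins for what html_table() might return (the second one and its contents are invented), the expected header names are assumed from the lower-cased columns shown later in this thread, and base R's do.call(rbind, ...) stands in for bind_rows so the example has no package dependencies.

```r
# Column names a filing-documents table is assumed to have (an assumption,
# inferred from the lower-cased column names printed later in this thread).
expected_cols <- c("Seq", "Description", "Document", "Type", "Size")

# TRUE only for data frames whose columns match the expected form exactly.
has_expected_form <- function(df) {
  is.data.frame(df) && identical(names(df), expected_cols)
}

# Hand-built stand-ins for the list returned by html_table().
dfs <- list(
  data.frame(Seq = 1:2,
             Description = c("FOR THE FISCAL YEAR ENDED SEPTEMBER 26, 2009",
                             "SUBSIDIARIES OF THE REGISTRANT"),
             Document = c("d10k.htm", "dex211.htm"),
             Type = c("10-K", "EX-21.1"),
             Size = c(1231750, 2792),
             stringsAsFactors = FALSE),
  # An invented table of a different shape that cannot be combined.
  data.frame(X1 = "Mailing Address", X2 = "Business Address",
             stringsAsFactors = FALSE),
  data.frame(Seq = 8L,
             Description = "XBRL INSTANCE DOCUMENT",
             Document = "aapl-20090926.xml",
             Type = "EX-101.INS",
             Size = 760344,
             stringsAsFactors = FALSE)
)

# Keep only the conforming tables, then stack them.
keep <- vapply(dfs, has_expected_form, logical(1))
combined <- do.call(rbind, dfs[keep])
print(combined)
```

Filtering before combining means one oddly shaped table no longer causes the whole filing to fail.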

bdcallen commented 5 years ago

@iangow It turns out there is an easy way to check whether the elements are of the correct form. Each table node has a class attribute, which equals tableFile when the table lists filing documents:

> head_url <- 'https://www.sec.gov/Archives/edgar/data/320193/000119312509214859/0001193125-09-214859-index.htm'
> table_nodes <-
+         read_html(head_url, encoding="Latin1") %>%
+         html_nodes("table")
> table_nodes
{xml_nodeset (2)}
[1] <table class="tableFile" summary="Document Format Files">\n<tr>\n<th scope="col" style="width: 5%;"><acronym title="Sequence Number">Seq</acronym></th>\n            <th scope="col" s ...
[2] <table class="tableFile" summary="Data Files">\n<tr>\n<th scope="col" style="width: 5%;"><acronym title="Sequence Number">Seq</acronym></th>\n            <th scope="col" style="width ...
> which(table_nodes %>% html_attr("class") == "tableFile")
[1] 1 2
> filing_doc_table_indices <- which(table_nodes %>% html_attr("class") == "tableFile")
> filing_doc_table_indices
[1] 1 2
> table_nodes[filing_doc_table_indices]
{xml_nodeset (2)}
[1] <table class="tableFile" summary="Document Format Files">\n<tr>\n<th scope="col" style="width: 5%;"><acronym title="Sequence Number">Seq</acronym></th>\n            <th scope="col" s ...
[2] <table class="tableFile" summary="Data Files">\n<tr>\n<th scope="col" style="width: 5%;"><acronym title="Sequence Number">Seq</acronym></th>\n            <th scope="col" style="width ...

There is also another useful attribute called summary, which contains the name of the table, i.e., Document Format Files, Data Files, etc.
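The class check can be tried offline on a small HTML snippet. The sketch below assumes rvest is installed; the HTML is invented for illustration and only mimics the class and summary attributes seen in the transcript above. It also shows that a CSS class selector, "table.tableFile", picks out the same nodes in one step.

```r
library(rvest)

# Invented HTML: two tableFile tables with an unrelated table between them.
page <- read_html('<html><body>
  <table class="tableFile" summary="Document Format Files">
    <tr><th>Seq</th><th>Document</th></tr>
    <tr><td>1</td><td>d10k.htm</td></tr>
  </table>
  <table><tr><td>layout only</td></tr></table>
  <table class="tableFile" summary="Data Files">
    <tr><th>Seq</th><th>Document</th></tr>
    <tr><td>8</td><td>aapl-20090926.xml</td></tr>
  </table>
</body></html>')

table_nodes <- html_nodes(page, "table")

# html_attr() gives NA for tables without a class; which() drops those.
idx <- which(html_attr(table_nodes, "class") == "tableFile")
file_tables <- table_nodes[idx]

# The summary attribute names each selected table.
summaries <- html_attr(file_tables, "summary")

# Equivalent selection using a CSS class selector.
css_tables <- html_nodes(page, "table.tableFile")

print(idx)
print(summaries)
```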

bdcallen commented 5 years ago

@iangow I wrote a new function, filing_docs_df, defined as the part of get_filing_docs that turns the relevant tables into a data frame to be written to `edgar.filing_docs`:

filing_docs_df <- function(file_name) {

    head_url <- get_index_url(file_name)

    table_nodes <-
        read_html(head_url, encoding="Latin1") %>%
        html_nodes("table")

    filing_doc_table_indices <- which(table_nodes %>% html_attr("class") == "tableFile")

    file_tables <- table_nodes[filing_doc_table_indices]

    if (length(file_tables) < 1) {
        df <- tibble(seq = NA, description = NA, document = NA, type = NA,
                     size = NA, file_name = file_name)
    } else {

        df <- file_tables %>% html_table() %>% bind_rows() %>% fix_names() %>% mutate(file_name = file_name, type = as.character(type))

        colnames(df) <- tolower(colnames(df))
    }

    return(df)

}

I then ran the following:

> f2 <- 'edgar/data/1046404/0000871839-18-000061-index.txt'
> get_index_url(f2)
[1] "https://www.sec.gov/Archives/edgar/data/1046404/000087183918000061/0000871839-18-000061-index.htm"
> filing_docs_df(f2)
   seq                   description                         document    type   size                                         file_name
1    1                               proxyadditionalmateri-201715.htm DEF 14A 399094 edgar/data/1046404/0000871839-18-000061-index.txt
2    2                       GRAPHIC            img_3fb1dcc13ad04.jpg GRAPHIC   2695 edgar/data/1046404/0000871839-18-000061-index.txt
3    3                       GRAPHIC            img_7c4a99f133244.jpg GRAPHIC  50688 edgar/data/1046404/0000871839-18-000061-index.txt
4    4                       GRAPHIC            img_39af8f5852b44.jpg GRAPHIC  41924 edgar/data/1046404/0000871839-18-000061-index.txt
5    5                       GRAPHIC            img_58e800b8f91b4.jpg GRAPHIC  46154 edgar/data/1046404/0000871839-18-000061-index.txt
6    6                       GRAPHIC            img_61e4e19e30d84.jpg GRAPHIC   3289 edgar/data/1046404/0000871839-18-000061-index.txt
7    7                       GRAPHIC            img_742ca67764644.jpg GRAPHIC   1952 edgar/data/1046404/0000871839-18-000061-index.txt
8    8                       GRAPHIC            img_18245b4143de4.jpg GRAPHIC   1957 edgar/data/1046404/0000871839-18-000061-index.txt
9    9                       GRAPHIC            img_204353f228fd4.jpg GRAPHIC   3289 edgar/data/1046404/0000871839-18-000061-index.txt
10  10                       GRAPHIC            img_286584bbcac34.jpg GRAPHIC  48487 edgar/data/1046404/0000871839-18-000061-index.txt
11  11                       GRAPHIC            img_a17ec2d0123f4.jpg GRAPHIC   3622 edgar/data/1046404/0000871839-18-000061-index.txt
12  12                       GRAPHIC            img_bda416a65f094.jpg GRAPHIC  63014 edgar/data/1046404/0000871839-18-000061-index.txt
13  13                       GRAPHIC            img_c147f393a5fe4.jpg GRAPHIC   1953 edgar/data/1046404/0000871839-18-000061-index.txt
14  14                       GRAPHIC            img_ddb30f382a384.jpg GRAPHIC   4610 edgar/data/1046404/0000871839-18-000061-index.txt
15  15                       GRAPHIC            img_e82bb2bb0af14.jpg GRAPHIC   1771 edgar/data/1046404/0000871839-18-000061-index.txt
16  16                       GRAPHIC            img_e89a234d074c4.jpg GRAPHIC  39813 edgar/data/1046404/0000871839-18-000061-index.txt
17  17                       GRAPHIC            img_ea1cfc908cbf4.jpg GRAPHIC   4606 edgar/data/1046404/0000871839-18-000061-index.txt
18  18                       GRAPHIC            img_ea5f423baea34.jpg GRAPHIC   2695 edgar/data/1046404/0000871839-18-000061-index.txt
19  19                       GRAPHIC            img_ed7f775d2ba74.jpg GRAPHIC   1953 edgar/data/1046404/0000871839-18-000061-index.txt
20  NA Complete submission text file         0000871839-18-000061.txt         985260 edgar/data/1046404/0000871839-18-000061-index.txt

which shows that this function handled the problematic case you put in the original post.

bdcallen commented 5 years ago

@iangow Here's the outcome of filing_docs_df acting on the second filing that you put in the original post, the one with two tables of filing documents:

> file_name <- 'edgar/data/320193/0001193125-09-214859.txt'
> get_index_url(file_name)
[1] "https://www.sec.gov/Archives/edgar/data/320193/000119312509214859/0001193125-09-214859-index.htm"
> filing_docs_df(file_name)
   seq                                                         description                 document       type    size                                  file_name
1    1                        FOR THE FISCAL YEAR ENDED SEPTEMBER 26, 2009                 d10k.htm       10-K 1231750 edgar/data/320193/0001193125-09-214859.txt
2    2                                      SUBSIDIARIES OF THE REGISTRANT               dex211.htm    EX-21.1    2792 edgar/data/320193/0001193125-09-214859.txt
3    3                                        CONSENT OF ERNST & YOUNG LLP               dex231.htm    EX-23.1    1634 edgar/data/320193/0001193125-09-214859.txt
4    4                                                 CONSENT OF KPMG LLP               dex232.htm    EX-23.2    2390 edgar/data/320193/0001193125-09-214859.txt
5    5 RULE 13A-14(A) / 15D-14(A) CERTIFICATION OF CHIEF EXECUTIVE OFFICER               dex311.htm    EX-31.1    9851 edgar/data/320193/0001193125-09-214859.txt
6    6 RULE 13A-14(A) / 15D-14(A) CERTIFICATION OF CHIEF FINANCIAL OFFICER               dex312.htm    EX-31.2   10112 edgar/data/320193/0001193125-09-214859.txt
7    7                          SECTION 1350 CERTIFICATIONS OF CEO AND CFO               dex321.htm    EX-32.1    5354 edgar/data/320193/0001193125-09-214859.txt
8   14                                                             GRAPHIC         g91485g21p46.jpg    GRAPHIC   53857 edgar/data/320193/0001193125-09-214859.txt
9   NA                                       Complete submission text file 0001193125-09-214859.txt            3638340 edgar/data/320193/0001193125-09-214859.txt
10   8                                              XBRL INSTANCE DOCUMENT        aapl-20090926.xml EX-101.INS  760344 edgar/data/320193/0001193125-09-214859.txt
11   9                                      XBRL TAXONOMY EXTENSION SCHEMA        aapl-20090926.xsd EX-101.SCH   13066 edgar/data/320193/0001193125-09-214859.txt
12  10                        XBRL TAXONOMY EXTENSION CALCULATION LINKBASE    aapl-20090926_cal.xml EX-101.CAL   30955 edgar/data/320193/0001193125-09-214859.txt
13  11                         XBRL TAXONOMY EXTENSION DEFINITION LINKBASE    aapl-20090926_def.xml EX-101.DEF   19450 edgar/data/320193/0001193125-09-214859.txt
14  12                              XBRL TAXONOMY EXTENSION LABEL LINKBASE    aapl-20090926_lab.xml EX-101.LAB  100641 edgar/data/320193/0001193125-09-214859.txt
15  13                       XBRL TAXONOMY EXTENSION PRESENTATION LINKBASE    aapl-20090926_pre.xml EX-101.PRE   80647 edgar/data/320193/0001193125-09-214859.txt

As you can see by comparing to the linked page for the filing, all the files from the Document Format Files table and those of the Data Files table are joined in the one dataframe.
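Both transcripts rely on get_index_url, whose definition is not shown in this thread. The function below is a hypothetical reimplementation (the name index_url_sketch and the regular expression are assumptions) that is merely consistent with the two input/output pairs printed above; the real function in the repository may differ.

```r
# Hypothetical stand-in for get_index_url(), inferred only from the two
# examples in this thread. Accepts names ending in ".txt" or "-index.txt".
index_url_sketch <- function(file_name) {
  pattern <- "^edgar/data/([0-9]+)/([0-9]{10}-[0-9]{2}-[0-9]{6})"
  m <- regmatches(file_name, regexec(pattern, file_name))[[1]]
  if (length(m) != 3) {
    stop("Unexpected file_name: ", file_name)
  }
  cik <- m[2]
  accession <- m[3]
  # The folder name is the accession number with its dashes removed.
  paste0("https://www.sec.gov/Archives/edgar/data/", cik, "/",
         gsub("-", "", accession), "/", accession, "-index.htm")
}

url_a <- index_url_sketch("edgar/data/320193/0001193125-09-214859.txt")
url_b <- index_url_sketch("edgar/data/1046404/0000871839-18-000061-index.txt")
print(url_a)
print(url_b)
```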

iangow commented 5 years ago

OK. Perfect. So get_filing_docs.R should just use this alternative function. Are you able to test and commit a version that does this?

There's probably a need to go back and try to identify the filings that have been "done wrong"; we should create a separate issue for this (I think the best approach might involve looking for breaks in seq).

iangow commented 5 years ago

Sorry. That should be scrape_filing_docs.R. I have stopped the code I have been running on 10.101.13.99 so that you are able to run it without interference. Note that I simply run source('filing_docs/scrape_filing_docs.R', echo=TRUE) from the edgar project in RStudio.

bdcallen commented 5 years ago

@iangow In my latest commit, I just committed filing_docs_df and changed get_filing_docs to

get_filing_docs <- function(file_name) {

    try({

    df <- filing_docs_df(file_name)
    pg <- dbConnect(PostgreSQL())
    dbWriteTable(pg, c("edgar", "filing_docs"),
                 df, append = TRUE, row.names = FALSE)
    dbDisconnect(pg)

    return(TRUE)}, {return(FALSE)})

}

inside the file get_filing_doc_functions.R. Splitting the functions this way is convenient for testing: we can use filing_docs_df to check that the data frame was scraped successfully and has the right form, make any needed changes there, and use get_filing_docs only to write to the database.
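One thing to note about get_filing_docs as posted: the second positional argument of try() is its silent parameter, so the FALSE return depends on how that argument gets evaluated, and the connection is never closed if dbWriteTable fails. A more explicit variant could use tryCatch with a finally clause. The sketch below is not the committed code; the function name and the injected scrape_fn, connect_fn, write_fn and disconnect_fn arguments are assumptions, introduced so the pattern can be run without a database.

```r
# Sketch only: returns TRUE on success, FALSE on any error, and always
# closes the connection if one was opened.
get_filing_docs_sketch <- function(file_name, scrape_fn, connect_fn,
                                   write_fn, disconnect_fn) {
  con <- NULL
  tryCatch({
    df <- scrape_fn(file_name)   # e.g. filing_docs_df
    con <- connect_fn()          # e.g. function() dbConnect(PostgreSQL())
    write_fn(con, df)            # e.g. a wrapper around dbWriteTable
    TRUE
  }, error = function(e) {
    message("Failed on ", file_name, ": ", conditionMessage(e))
    FALSE
  }, finally = {
    if (!is.null(con)) disconnect_fn(con)
  })
}

# Fake collaborators that count how often the connection is closed.
state <- new.env()
state$closed <- 0
fake_disconnect <- function(con) state$closed <- state$closed + 1

ok <- get_filing_docs_sketch(
  "a.txt",
  scrape_fn = function(f) data.frame(file_name = f),
  connect_fn = function() "fake connection",
  write_fn = function(con, df) invisible(NULL),
  disconnect_fn = fake_disconnect
)

failed <- get_filing_docs_sketch(
  "b.txt",
  scrape_fn = function(f) data.frame(file_name = f),
  connect_fn = function() "fake connection",
  write_fn = function(con, df) stop("write failed"),
  disconnect_fn = fake_disconnect
)
```

Logging the file name in the error handler also makes it easier to see which filing caused a failure during a batch run.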

I agree that catching breaks in seq is the way to find the filings that were handled incorrectly. My code now combines all the tables of filing documents, and we know seq runs across those tables, so errors of this type should be much rarer once we run this code over the whole set of filings.

iangow commented 5 years ago

OK. I pulled the latest code and I'm running it now ... no need for you to do so. I will let you know if any issues crop up. (I am seeing the code [before your commit] "hanging" at times. Not sure why; I'm just terminating it and running it again. I believe the code may grab filings somewhat at random, so it may be a small number of filings causing an issue.)

bdcallen commented 5 years ago

@iangow I just noticed get_filing_docs had to be changed in scrape_filing_docs.R. As the functions get_index_url, fix_names, and the updated get_filing_docs are in get_filing_doc_functions.R, I've done this in my latest commit using the source function. So if you now pull again and restart, scrape_filing_docs.R will use the updated get_filing_docs.

iangow commented 5 years ago

Here's some code for that seq issue once you have it set up:

library(DBI)
library(dplyr, warn.conflicts = FALSE)

pg <- dbConnect(RPostgreSQL::PostgreSQL()) #  bigint = "integer")
rs <- dbExecute(pg, "SET search_path TO edgar, public")
rs <- dbExecute(pg, "SET work_mem = '2GB'")

filings <- tbl(pg, "filings")
filing_docs <- tbl(pg, "filing_docs")

problems <-
    filing_docs %>% 
    filter(!is.na(seq)) %>%
    group_by(file_name) %>% 
    summarize(seqs = array_agg(seq), 
           seq_max = max(seq, na.rm = TRUE)) %>%
    mutate(seq_len = array_length(seqs, 1L)) %>% 
    filter(seq_max != seq_len) %>%
    ungroup()

problems
#> # Source:   lazy query [?? x 4]
#> # Database: postgres 9.6.11 [igow@10.101.13.99:5432/crsp]
#>    file_name                           seqs                 seq_max seq_len
#>    <chr>                               <chr>                  <int>   <int>
#>  1 edgar/data/1000015/0000912057-00-0… {1,2,3,4,5,6,1,2,3,…       6      12
#>  2 edgar/data/1000015/0000912057-01-5… {1,3}                      3       2
#>  3 edgar/data/1000015/0000912057-02-0… {1,3}                      3       2
#>  4 edgar/data/1000015/0000912057-02-0… {1,3,4,5,6,7,8,9,10…      12      11
#>  5 edgar/data/1000015/0000912057-02-0… {1,3}                      3       2
#>  6 edgar/data/1000015/0000912057-02-0… {1,3}                      3       2
#>  7 edgar/data/1000015/0001005477-02-0… {1,3}                      3       2
#>  8 edgar/data/1000015/0001047469-02-0… {1,3}                      3       2
#>  9 edgar/data/1000015/0001047469-03-0… {1,3,4,5,6,7,8,9,10…      11      10
#> 10 edgar/data/1000015/0001047469-03-0… {1,3}                      3       2
#> # … with more rows

Created on 2019-01-22 by the reprex package (v0.2.1)
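The seq_max != seq_len test can be reproduced on a small in-memory data frame to see what it flags. This is a sketch with made-up file names and seq values, using base R rather than the database query above. It also shows one limitation: a filing with both a duplicate and a gap, such as {1,1,3}, has a maximum equal to its row count and so passes the simple test. The stricter check at the end is an assumption about what "no breaks" should mean, not something from the thread.

```r
# Made-up data: a.txt is fine, b.txt has a gap, c.txt has duplicates,
# d.txt has a duplicate and a gap that cancel out in the simple test.
filing_docs <- data.frame(
  file_name = c(rep("a.txt", 4), rep("b.txt", 2),
                rep("c.txt", 4), rep("d.txt", 3)),
  seq = c(1, 2, 3, NA,  1, 3,  1, 2, 1, 2,  1, 1, 3),
  stringsAsFactors = FALSE
)

# Drop rows without a seq (e.g. the complete submission text file).
with_seq <- filing_docs[!is.na(filing_docs$seq), ]

# Simple test: the largest seq should equal the number of rows.
seq_max <- tapply(with_seq$seq, with_seq$file_name, max)
n_seq <- tapply(with_seq$seq, with_seq$file_name, length)
flagged <- names(seq_max)[seq_max != n_seq]

# Stricter test: the sorted seq values must be exactly 1, 2, ..., max.
is_complete <- function(s) identical(sort(as.integer(s)), seq_len(max(s)))
complete <- tapply(with_seq$seq, with_seq$file_name, is_complete)
flagged_strict <- names(complete)[!complete]

print(flagged)
print(flagged_strict)
```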

bdcallen commented 5 years ago

@iangow Some of these may be duplicates. I just checked the first filing in problems: it has just one table, so it is fine for seq to run only from 1 to 6, but seqs shows that there are two copies of each integer.

iangow commented 5 years ago

I noticed that. Let's make that other issue and we can address this issue there.

bdcallen commented 5 years ago

@iangow There are also some odd entries. The second filing in problems has only one table, and there is no seq = 2 entry.

iangow commented 5 years ago

Again, let's put the discussion in a new issue. Otherwise this will get complicated.

iangow commented 5 years ago

> @iangow I just noticed get_filing_docs had to be changed in scrape_filing_docs.R. As the functions get_index_url, fix_names, and the updated get_filing_docs are in get_filing_doc_functions.R, I've done this in my latest commit using the source function. So if you now pull again and restart, scrape_filing_docs.R will use the updated get_filing_docs.

Now the code doesn't work:

> source('~/git/edgar/filing_docs/scrape_filing_docs.R')

Attaching package: ‘readr’

The following object is masked from ‘package:rvest’:

    guess_encoding

Processing batch 1 
Error in bind_rows_(x, .id) : Argument 1 must have names

I think this is the error message I addressed by using [1] to grab the first table.

bdcallen commented 5 years ago

@iangow

> file_name
[1] "edgar/data/320193/0001193125-09-214859.txt"
> df1 <- filing_docs_df(file_name)
> df1
   seq                                                         description                 document       type    size                                  file_name
1    1                        FOR THE FISCAL YEAR ENDED SEPTEMBER 26, 2009                 d10k.htm       10-K 1231750 edgar/data/320193/0001193125-09-214859.txt
2    2                                      SUBSIDIARIES OF THE REGISTRANT               dex211.htm    EX-21.1    2792 edgar/data/320193/0001193125-09-214859.txt
3    3                                        CONSENT OF ERNST & YOUNG LLP               dex231.htm    EX-23.1    1634 edgar/data/320193/0001193125-09-214859.txt
4    4                                                 CONSENT OF KPMG LLP               dex232.htm    EX-23.2    2390 edgar/data/320193/0001193125-09-214859.txt
5    5 RULE 13A-14(A) / 15D-14(A) CERTIFICATION OF CHIEF EXECUTIVE OFFICER               dex311.htm    EX-31.1    9851 edgar/data/320193/0001193125-09-214859.txt
6    6 RULE 13A-14(A) / 15D-14(A) CERTIFICATION OF CHIEF FINANCIAL OFFICER               dex312.htm    EX-31.2   10112 edgar/data/320193/0001193125-09-214859.txt
7    7                          SECTION 1350 CERTIFICATIONS OF CEO AND CFO               dex321.htm    EX-32.1    5354 edgar/data/320193/0001193125-09-214859.txt
8   14                                                             GRAPHIC         g91485g21p46.jpg    GRAPHIC   53857 edgar/data/320193/0001193125-09-214859.txt
9   NA                                       Complete submission text file 0001193125-09-214859.txt            3638340 edgar/data/320193/0001193125-09-214859.txt
10   8                                              XBRL INSTANCE DOCUMENT        aapl-20090926.xml EX-101.INS  760344 edgar/data/320193/0001193125-09-214859.txt
11   9                                      XBRL TAXONOMY EXTENSION SCHEMA        aapl-20090926.xsd EX-101.SCH   13066 edgar/data/320193/0001193125-09-214859.txt
12  10                        XBRL TAXONOMY EXTENSION CALCULATION LINKBASE    aapl-20090926_cal.xml EX-101.CAL   30955 edgar/data/320193/0001193125-09-214859.txt
13  11                         XBRL TAXONOMY EXTENSION DEFINITION LINKBASE    aapl-20090926_def.xml EX-101.DEF   19450 edgar/data/320193/0001193125-09-214859.txt
14  12                              XBRL TAXONOMY EXTENSION LABEL LINKBASE    aapl-20090926_lab.xml EX-101.LAB  100641 edgar/data/320193/0001193125-09-214859.txt
15  13                       XBRL TAXONOMY EXTENSION PRESENTATION LINKBASE    aapl-20090926_pre.xml EX-101.PRE   80647 edgar/data/320193/0001193125-09-214859.txt
> f2
[1] "edgar/data/1046404/0000871839-18-000061-index.txt"
> df2 <- filing_docs_df(f2)
> df2
   seq                   description                         document    type   size                                         file_name
1    1                               proxyadditionalmateri-201715.htm DEF 14A 399094 edgar/data/1046404/0000871839-18-000061-index.txt
2    2                       GRAPHIC            img_3fb1dcc13ad04.jpg GRAPHIC   2695 edgar/data/1046404/0000871839-18-000061-index.txt
3    3                       GRAPHIC            img_7c4a99f133244.jpg GRAPHIC  50688 edgar/data/1046404/0000871839-18-000061-index.txt
4    4                       GRAPHIC            img_39af8f5852b44.jpg GRAPHIC  41924 edgar/data/1046404/0000871839-18-000061-index.txt
5    5                       GRAPHIC            img_58e800b8f91b4.jpg GRAPHIC  46154 edgar/data/1046404/0000871839-18-000061-index.txt
6    6                       GRAPHIC            img_61e4e19e30d84.jpg GRAPHIC   3289 edgar/data/1046404/0000871839-18-000061-index.txt
7    7                       GRAPHIC            img_742ca67764644.jpg GRAPHIC   1952 edgar/data/1046404/0000871839-18-000061-index.txt
8    8                       GRAPHIC            img_18245b4143de4.jpg GRAPHIC   1957 edgar/data/1046404/0000871839-18-000061-index.txt
9    9                       GRAPHIC            img_204353f228fd4.jpg GRAPHIC   3289 edgar/data/1046404/0000871839-18-000061-index.txt
10  10                       GRAPHIC            img_286584bbcac34.jpg GRAPHIC  48487 edgar/data/1046404/0000871839-18-000061-index.txt
11  11                       GRAPHIC            img_a17ec2d0123f4.jpg GRAPHIC   3622 edgar/data/1046404/0000871839-18-000061-index.txt
12  12                       GRAPHIC            img_bda416a65f094.jpg GRAPHIC  63014 edgar/data/1046404/0000871839-18-000061-index.txt
13  13                       GRAPHIC            img_c147f393a5fe4.jpg GRAPHIC   1953 edgar/data/1046404/0000871839-18-000061-index.txt
14  14                       GRAPHIC            img_ddb30f382a384.jpg GRAPHIC   4610 edgar/data/1046404/0000871839-18-000061-index.txt
15  15                       GRAPHIC            img_e82bb2bb0af14.jpg GRAPHIC   1771 edgar/data/1046404/0000871839-18-000061-index.txt
16  16                       GRAPHIC            img_e89a234d074c4.jpg GRAPHIC  39813 edgar/data/1046404/0000871839-18-000061-index.txt
17  17                       GRAPHIC            img_ea1cfc908cbf4.jpg GRAPHIC   4606 edgar/data/1046404/0000871839-18-000061-index.txt
18  18                       GRAPHIC            img_ea5f423baea34.jpg GRAPHIC   2695 edgar/data/1046404/0000871839-18-000061-index.txt
19  19                       GRAPHIC            img_ed7f775d2ba74.jpg GRAPHIC   1953 edgar/data/1046404/0000871839-18-000061-index.txt
20  NA Complete submission text file         0000871839-18-000061.txt         985260 edgar/data/1046404/0000871839-18-000061-index.txt
> bind_rows(df1, df2)
   seq                                                         description                         document       type    size                                         file_name
1    1                        FOR THE FISCAL YEAR ENDED SEPTEMBER 26, 2009                         d10k.htm       10-K 1231750        edgar/data/320193/0001193125-09-214859.txt
2    2                                      SUBSIDIARIES OF THE REGISTRANT                       dex211.htm    EX-21.1    2792        edgar/data/320193/0001193125-09-214859.txt
3    3                                        CONSENT OF ERNST & YOUNG LLP                       dex231.htm    EX-23.1    1634        edgar/data/320193/0001193125-09-214859.txt
4    4                                                 CONSENT OF KPMG LLP                       dex232.htm    EX-23.2    2390        edgar/data/320193/0001193125-09-214859.txt
5    5 RULE 13A-14(A) / 15D-14(A) CERTIFICATION OF CHIEF EXECUTIVE OFFICER                       dex311.htm    EX-31.1    9851        edgar/data/320193/0001193125-09-214859.txt
6    6 RULE 13A-14(A) / 15D-14(A) CERTIFICATION OF CHIEF FINANCIAL OFFICER                       dex312.htm    EX-31.2   10112        edgar/data/320193/0001193125-09-214859.txt
7    7                          SECTION 1350 CERTIFICATIONS OF CEO AND CFO                       dex321.htm    EX-32.1    5354        edgar/data/320193/0001193125-09-214859.txt
8   14                                                             GRAPHIC                 g91485g21p46.jpg    GRAPHIC   53857        edgar/data/320193/0001193125-09-214859.txt
9   NA                                       Complete submission text file         0001193125-09-214859.txt            3638340        edgar/data/320193/0001193125-09-214859.txt
10   8                                              XBRL INSTANCE DOCUMENT                aapl-20090926.xml EX-101.INS  760344        edgar/data/320193/0001193125-09-214859.txt
11   9                                      XBRL TAXONOMY EXTENSION SCHEMA                aapl-20090926.xsd EX-101.SCH   13066        edgar/data/320193/0001193125-09-214859.txt
12  10                        XBRL TAXONOMY EXTENSION CALCULATION LINKBASE            aapl-20090926_cal.xml EX-101.CAL   30955        edgar/data/320193/0001193125-09-214859.txt
13  11                         XBRL TAXONOMY EXTENSION DEFINITION LINKBASE            aapl-20090926_def.xml EX-101.DEF   19450        edgar/data/320193/0001193125-09-214859.txt
14  12                              XBRL TAXONOMY EXTENSION LABEL LINKBASE            aapl-20090926_lab.xml EX-101.LAB  100641        edgar/data/320193/0001193125-09-214859.txt
15  13                       XBRL TAXONOMY EXTENSION PRESENTATION LINKBASE            aapl-20090926_pre.xml EX-101.PRE   80647        edgar/data/320193/0001193125-09-214859.txt
16   1                                                                     proxyadditionalmateri-201715.htm    DEF 14A  399094 edgar/data/1046404/0000871839-18-000061-index.txt
17   2                                                             GRAPHIC            img_3fb1dcc13ad04.jpg    GRAPHIC    2695 edgar/data/1046404/0000871839-18-000061-index.txt
18   3                                                             GRAPHIC            img_7c4a99f133244.jpg    GRAPHIC   50688 edgar/data/1046404/0000871839-18-000061-index.txt
19   4                                                             GRAPHIC            img_39af8f5852b44.jpg    GRAPHIC   41924 edgar/data/1046404/0000871839-18-000061-index.txt
20   5                                                             GRAPHIC            img_58e800b8f91b4.jpg    GRAPHIC   46154 edgar/data/1046404/0000871839-18-000061-index.txt
21   6                                                             GRAPHIC            img_61e4e19e30d84.jpg    GRAPHIC    3289 edgar/data/1046404/0000871839-18-000061-index.txt
22   7                                                             GRAPHIC            img_742ca67764644.jpg    GRAPHIC    1952 edgar/data/1046404/0000871839-18-000061-index.txt
23   8                                                             GRAPHIC            img_18245b4143de4.jpg    GRAPHIC    1957 edgar/data/1046404/0000871839-18-000061-index.txt
24   9                                                             GRAPHIC            img_204353f228fd4.jpg    GRAPHIC    3289 edgar/data/1046404/0000871839-18-000061-index.txt
25  10                                                             GRAPHIC            img_286584bbcac34.jpg    GRAPHIC   48487 edgar/data/1046404/0000871839-18-000061-index.txt
26  11                                                             GRAPHIC            img_a17ec2d0123f4.jpg    GRAPHIC    3622 edgar/data/1046404/0000871839-18-000061-index.txt
27  12                                                             GRAPHIC            img_bda416a65f094.jpg    GRAPHIC   63014 edgar/data/1046404/0000871839-18-000061-index.txt
28  13                                                             GRAPHIC            img_c147f393a5fe4.jpg    GRAPHIC    1953 edgar/data/1046404/0000871839-18-000061-index.txt
29  14                                                             GRAPHIC            img_ddb30f382a384.jpg    GRAPHIC    4610 edgar/data/1046404/0000871839-18-000061-index.txt
30  15                                                             GRAPHIC            img_e82bb2bb0af14.jpg    GRAPHIC    1771 edgar/data/1046404/0000871839-18-000061-index.txt
31  16                                                             GRAPHIC            img_e89a234d074c4.jpg    GRAPHIC   39813 edgar/data/1046404/0000871839-18-000061-index.txt
32  17                                                             GRAPHIC            img_ea1cfc908cbf4.jpg    GRAPHIC    4606 edgar/data/1046404/0000871839-18-000061-index.txt
33  18                                                             GRAPHIC            img_ea5f423baea34.jpg    GRAPHIC    2695 edgar/data/1046404/0000871839-18-000061-index.txt
34  19                                                             GRAPHIC            img_ed7f775d2ba74.jpg    GRAPHIC    1953 edgar/data/1046404/0000871839-18-000061-index.txt
35  NA                                       Complete submission text file         0000871839-18-000061.txt             985260 edgar/data/1046404/0000871839-18-000061-index.txt
> df_list = list()
> df_list <- list()
> df_list[[1]] <- df1
> df_list[[2]] <- df2
> df_list
[[1]]
   seq                                                         description                 document       type    size                                  file_name
1    1                        FOR THE FISCAL YEAR ENDED SEPTEMBER 26, 2009                 d10k.htm       10-K 1231750 edgar/data/320193/0001193125-09-214859.txt
2    2                                      SUBSIDIARIES OF THE REGISTRANT               dex211.htm    EX-21.1    2792 edgar/data/320193/0001193125-09-214859.txt
3    3                                        CONSENT OF ERNST & YOUNG LLP               dex231.htm    EX-23.1    1634 edgar/data/320193/0001193125-09-214859.txt
4    4                                                 CONSENT OF KPMG LLP               dex232.htm    EX-23.2    2390 edgar/data/320193/0001193125-09-214859.txt
5    5 RULE 13A-14(A) / 15D-14(A) CERTIFICATION OF CHIEF EXECUTIVE OFFICER               dex311.htm    EX-31.1    9851 edgar/data/320193/0001193125-09-214859.txt
6    6 RULE 13A-14(A) / 15D-14(A) CERTIFICATION OF CHIEF FINANCIAL OFFICER               dex312.htm    EX-31.2   10112 edgar/data/320193/0001193125-09-214859.txt
7    7                          SECTION 1350 CERTIFICATIONS OF CEO AND CFO               dex321.htm    EX-32.1    5354 edgar/data/320193/0001193125-09-214859.txt
8   14                                                             GRAPHIC         g91485g21p46.jpg    GRAPHIC   53857 edgar/data/320193/0001193125-09-214859.txt
9   NA                                       Complete submission text file 0001193125-09-214859.txt            3638340 edgar/data/320193/0001193125-09-214859.txt
10   8                                              XBRL INSTANCE DOCUMENT        aapl-20090926.xml EX-101.INS  760344 edgar/data/320193/0001193125-09-214859.txt
11   9                                      XBRL TAXONOMY EXTENSION SCHEMA        aapl-20090926.xsd EX-101.SCH   13066 edgar/data/320193/0001193125-09-214859.txt
12  10                        XBRL TAXONOMY EXTENSION CALCULATION LINKBASE    aapl-20090926_cal.xml EX-101.CAL   30955 edgar/data/320193/0001193125-09-214859.txt
13  11                         XBRL TAXONOMY EXTENSION DEFINITION LINKBASE    aapl-20090926_def.xml EX-101.DEF   19450 edgar/data/320193/0001193125-09-214859.txt
14  12                              XBRL TAXONOMY EXTENSION LABEL LINKBASE    aapl-20090926_lab.xml EX-101.LAB  100641 edgar/data/320193/0001193125-09-214859.txt
15  13                       XBRL TAXONOMY EXTENSION PRESENTATION LINKBASE    aapl-20090926_pre.xml EX-101.PRE   80647 edgar/data/320193/0001193125-09-214859.txt

[[2]]
   seq                   description                         document    type   size                                         file_name
1    1                               proxyadditionalmateri-201715.htm DEF 14A 399094 edgar/data/1046404/0000871839-18-000061-index.txt
2    2                       GRAPHIC            img_3fb1dcc13ad04.jpg GRAPHIC   2695 edgar/data/1046404/0000871839-18-000061-index.txt
3    3                       GRAPHIC            img_7c4a99f133244.jpg GRAPHIC  50688 edgar/data/1046404/0000871839-18-000061-index.txt
4    4                       GRAPHIC            img_39af8f5852b44.jpg GRAPHIC  41924 edgar/data/1046404/0000871839-18-000061-index.txt
5    5                       GRAPHIC            img_58e800b8f91b4.jpg GRAPHIC  46154 edgar/data/1046404/0000871839-18-000061-index.txt
6    6                       GRAPHIC            img_61e4e19e30d84.jpg GRAPHIC   3289 edgar/data/1046404/0000871839-18-000061-index.txt
7    7                       GRAPHIC            img_742ca67764644.jpg GRAPHIC   1952 edgar/data/1046404/0000871839-18-000061-index.txt
8    8                       GRAPHIC            img_18245b4143de4.jpg GRAPHIC   1957 edgar/data/1046404/0000871839-18-000061-index.txt
9    9                       GRAPHIC            img_204353f228fd4.jpg GRAPHIC   3289 edgar/data/1046404/0000871839-18-000061-index.txt
10  10                       GRAPHIC            img_286584bbcac34.jpg GRAPHIC  48487 edgar/data/1046404/0000871839-18-000061-index.txt
11  11                       GRAPHIC            img_a17ec2d0123f4.jpg GRAPHIC   3622 edgar/data/1046404/0000871839-18-000061-index.txt
12  12                       GRAPHIC            img_bda416a65f094.jpg GRAPHIC  63014 edgar/data/1046404/0000871839-18-000061-index.txt
13  13                       GRAPHIC            img_c147f393a5fe4.jpg GRAPHIC   1953 edgar/data/1046404/0000871839-18-000061-index.txt
14  14                       GRAPHIC            img_ddb30f382a384.jpg GRAPHIC   4610 edgar/data/1046404/0000871839-18-000061-index.txt
15  15                       GRAPHIC            img_e82bb2bb0af14.jpg GRAPHIC   1771 edgar/data/1046404/0000871839-18-000061-index.txt
16  16                       GRAPHIC            img_e89a234d074c4.jpg GRAPHIC  39813 edgar/data/1046404/0000871839-18-000061-index.txt
17  17                       GRAPHIC            img_ea1cfc908cbf4.jpg GRAPHIC   4606 edgar/data/1046404/0000871839-18-000061-index.txt
18  18                       GRAPHIC            img_ea5f423baea34.jpg GRAPHIC   2695 edgar/data/1046404/0000871839-18-000061-index.txt
19  19                       GRAPHIC            img_ed7f775d2ba74.jpg GRAPHIC   1953 edgar/data/1046404/0000871839-18-000061-index.txt
20  NA Complete submission text file         0000871839-18-000061.txt         985260 edgar/data/1046404/0000871839-18-000061-index.txt

> bind_rows(df_list)
   seq                                                         description                         document       type    size                                         file_name
1    1                        FOR THE FISCAL YEAR ENDED SEPTEMBER 26, 2009                         d10k.htm       10-K 1231750        edgar/data/320193/0001193125-09-214859.txt
2    2                                      SUBSIDIARIES OF THE REGISTRANT                       dex211.htm    EX-21.1    2792        edgar/data/320193/0001193125-09-214859.txt
3    3                                        CONSENT OF ERNST & YOUNG LLP                       dex231.htm    EX-23.1    1634        edgar/data/320193/0001193125-09-214859.txt
4    4                                                 CONSENT OF KPMG LLP                       dex232.htm    EX-23.2    2390        edgar/data/320193/0001193125-09-214859.txt
5    5 RULE 13A-14(A) / 15D-14(A) CERTIFICATION OF CHIEF EXECUTIVE OFFICER                       dex311.htm    EX-31.1    9851        edgar/data/320193/0001193125-09-214859.txt
6    6 RULE 13A-14(A) / 15D-14(A) CERTIFICATION OF CHIEF FINANCIAL OFFICER                       dex312.htm    EX-31.2   10112        edgar/data/320193/0001193125-09-214859.txt
7    7                          SECTION 1350 CERTIFICATIONS OF CEO AND CFO                       dex321.htm    EX-32.1    5354        edgar/data/320193/0001193125-09-214859.txt
8   14                                                             GRAPHIC                 g91485g21p46.jpg    GRAPHIC   53857        edgar/data/320193/0001193125-09-214859.txt
9   NA                                       Complete submission text file         0001193125-09-214859.txt            3638340        edgar/data/320193/0001193125-09-214859.txt
10   8                                              XBRL INSTANCE DOCUMENT                aapl-20090926.xml EX-101.INS  760344        edgar/data/320193/0001193125-09-214859.txt
11   9                                      XBRL TAXONOMY EXTENSION SCHEMA                aapl-20090926.xsd EX-101.SCH   13066        edgar/data/320193/0001193125-09-214859.txt
12  10                        XBRL TAXONOMY EXTENSION CALCULATION LINKBASE            aapl-20090926_cal.xml EX-101.CAL   30955        edgar/data/320193/0001193125-09-214859.txt
13  11                         XBRL TAXONOMY EXTENSION DEFINITION LINKBASE            aapl-20090926_def.xml EX-101.DEF   19450        edgar/data/320193/0001193125-09-214859.txt
14  12                              XBRL TAXONOMY EXTENSION LABEL LINKBASE            aapl-20090926_lab.xml EX-101.LAB  100641        edgar/data/320193/0001193125-09-214859.txt
15  13                       XBRL TAXONOMY EXTENSION PRESENTATION LINKBASE            aapl-20090926_pre.xml EX-101.PRE   80647        edgar/data/320193/0001193125-09-214859.txt
16   1                                                                     proxyadditionalmateri-201715.htm    DEF 14A  399094 edgar/data/1046404/0000871839-18-000061-index.txt
17   2                                                             GRAPHIC            img_3fb1dcc13ad04.jpg    GRAPHIC    2695 edgar/data/1046404/0000871839-18-000061-index.txt
18   3                                                             GRAPHIC            img_7c4a99f133244.jpg    GRAPHIC   50688 edgar/data/1046404/0000871839-18-000061-index.txt
19   4                                                             GRAPHIC            img_39af8f5852b44.jpg    GRAPHIC   41924 edgar/data/1046404/0000871839-18-000061-index.txt
20   5                                                             GRAPHIC            img_58e800b8f91b4.jpg    GRAPHIC   46154 edgar/data/1046404/0000871839-18-000061-index.txt
21   6                                                             GRAPHIC            img_61e4e19e30d84.jpg    GRAPHIC    3289 edgar/data/1046404/0000871839-18-000061-index.txt
22   7                                                             GRAPHIC            img_742ca67764644.jpg    GRAPHIC    1952 edgar/data/1046404/0000871839-18-000061-index.txt
23   8                                                             GRAPHIC            img_18245b4143de4.jpg    GRAPHIC    1957 edgar/data/1046404/0000871839-18-000061-index.txt
24   9                                                             GRAPHIC            img_204353f228fd4.jpg    GRAPHIC    3289 edgar/data/1046404/0000871839-18-000061-index.txt
25  10                                                             GRAPHIC            img_286584bbcac34.jpg    GRAPHIC   48487 edgar/data/1046404/0000871839-18-000061-index.txt
26  11                                                             GRAPHIC            img_a17ec2d0123f4.jpg    GRAPHIC    3622 edgar/data/1046404/0000871839-18-000061-index.txt
27  12                                                             GRAPHIC            img_bda416a65f094.jpg    GRAPHIC   63014 edgar/data/1046404/0000871839-18-000061-index.txt
28  13                                                             GRAPHIC            img_c147f393a5fe4.jpg    GRAPHIC    1953 edgar/data/1046404/0000871839-18-000061-index.txt
29  14                                                             GRAPHIC            img_ddb30f382a384.jpg    GRAPHIC    4610 edgar/data/1046404/0000871839-18-000061-index.txt
30  15                                                             GRAPHIC            img_e82bb2bb0af14.jpg    GRAPHIC    1771 edgar/data/1046404/0000871839-18-000061-index.txt
31  16                                                             GRAPHIC            img_e89a234d074c4.jpg    GRAPHIC   39813 edgar/data/1046404/0000871839-18-000061-index.txt
32  17                                                             GRAPHIC            img_ea1cfc908cbf4.jpg    GRAPHIC    4606 edgar/data/1046404/0000871839-18-000061-index.txt
33  18                                                             GRAPHIC            img_ea5f423baea34.jpg    GRAPHIC    2695 edgar/data/1046404/0000871839-18-000061-index.txt
34  19                                                             GRAPHIC            img_ed7f775d2ba74.jpg    GRAPHIC    1953 edgar/data/1046404/0000871839-18-000061-index.txt
35  NA                                       Complete submission text file         0000871839-18-000061.txt             985260 edgar/data/1046404/0000871839-18-000061-index.txt

Here, I tried binding the rows of the dataframes produced by filing_docs_df on two different filings, and it worked. So the .[1] indexing can't be the cause.

bdcallen commented 5 years ago

@iangow Did the source('get_filing_doc_functions.R') line work? The other place where bind_rows is used is here:

while(nrow(file_names <- get_file_names()) > 0) {
    batch <- batch + 1
    cat("Processing batch", batch, "\n")
    temp <- mclapply(file_names$file_name, get_filing_docs, mc.cores = 6)
    if (length(temp) > 0) {
        df <- bind_rows(temp)

        if (nrow(df) > 0) {
            cat("Writing data ...\n")
            dbWriteTable(pg, "filing_docs",
                         df, append = TRUE, row.names = FALSE)

        } else {
            cat("No data ...\n")
        }
    }
}

If that line didn't work, then get_filing_docs would not be in the namespace, and the bind_rows line would return an error.
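One way to check whether the list passed to bind_rows is the problem: mclapply runs each job inside try(), so a job that errors comes back as a "try-error" object rather than a dataframe. The sketch below keeps only the dataframe elements before binding. It is an illustration under assumptions, not the repository's code: scrape_one and the file names are hypothetical, lapply() with try() stands in for mclapply, and base rbind stands in for bind_rows so the example runs without extra packages.

```r
# Hypothetical stand-in for the scraping function; it fails on one input.
scrape_one <- function(file_name) {
    if (grepl("bad", file_name)) stop("could not read index page")
    data.frame(seq = 1L, document = "doc.htm", file_name = file_name,
               stringsAsFactors = FALSE)
}

# Made-up file names, for illustration only.
file_names <- c("edgar/data/1/a.txt",
                "edgar/data/2/bad.txt",
                "edgar/data/3/c.txt")

# lapply() + try() mimics how mclapply() returns failed jobs as
# "try-error" objects instead of stopping.
temp <- lapply(file_names, function(x) try(scrape_one(x), silent = TRUE))

# Keep only the elements that really are data frames before binding.
ok <- vapply(temp, is.data.frame, logical(1))
df <- do.call(rbind, temp[ok])

# File names whose scrape failed, for later inspection or retry.
failed <- file_names[!ok]
```

Inspecting failed (or temp[!ok]) shows which filings broke and why, without losing the rows that were scraped successfully.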

iangow commented 5 years ago

The way get_filing_docs functions is very different in the new version (for example, the new one returns TRUE or FALSE, while the old one returned the scraped data). I think the easiest thing to do would be to adapt the "old" function to incorporate the error-handling, etc.
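A rough sketch of what that adaptation might look like, assuming the "old" behaviour of returning the scraped data is kept and a failure yields a zero-row dataframe instead of an error. The name safe_scrape and its scrape_fun argument are hypothetical; the column names are taken from the output shown in this thread.

```r
# Hypothetical wrapper: returns whatever scrape_fun returns, or a zero-row
# data frame with the expected columns if scrape_fun raises an error.
safe_scrape <- function(file_name, scrape_fun) {
    empty <- data.frame(seq = integer(0), description = character(0),
                        document = character(0), type = character(0),
                        size = integer(0), file_name = character(0),
                        stringsAsFactors = FALSE)
    tryCatch(scrape_fun(file_name),
             error = function(e) {
                 # Report the failure but keep the batch going.
                 message("Failed on ", file_name, ": ", conditionMessage(e))
                 empty
             })
}

# Usage: a scraper that always fails yields an empty data frame.
res <- safe_scrape("edgar/data/1/x.txt", function(f) stop("boom"))
```

Because the fallback has the same columns as a successful scrape, the results can still be combined row-wise, and the existing nrow(df) > 0 check decides whether anything gets written.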

iangow commented 5 years ago

The good news is that I think this is the only code using this function now, so you might be able to find some shortcuts.

bdcallen commented 5 years ago

@iangow As I just said, filing_docs_df essentially does what your version of get_filing_docs did: it likewise returns a dataframe rather than a Boolean, but now includes all tables with filing documents. I just ran:

> file_names <- get_file_names()
> temp <- mclapply(file_names$file_name, filing_docs_df, mc.cores = 6)
> df <- bind_rows(temp)
> df
    seq                                                                      description                   document      type    size                                   file_name
1     1                                                    AUTO-GENERATED PAPER DOCUMENT 9999999997-06-048085.paper  REGDEX/A     295 edgar/data/1132469/9999999997-06-048085.txt
2    NA                                                           Scanned paper document                scanned.pdf            243893 edgar/data/1132469/9999999997-06-048085.txt
3    NA                                                    Complete submission text file   9999999997-06-048085.txt              1459 edgar/data/1132469/9999999997-06-048085.txt
4     1                                                    AUTO-GENERATED PAPER DOCUMENT 9999999997-06-038939.paper    FOCUSN     293   edgar/data/78017/9999999997-06-038939.txt
5    NA                                                    Complete submission text file   9999999997-06-038939.txt              1847   edgar/data/78017/9999999997-06-038939.txt
6     1                                                    AUTO-GENERATED PAPER DOCUMENT 9999999997-06-028915.paper   X-17A-5     294   edgar/data/78017/9999999997-06-028915.txt
7    NA                                                           Scanned paper document                scanned.pdf            393873   edgar/data/78017/9999999997-06-028915.txt
8    NA                                                    Complete submission text file   9999999997-06-028915.txt              1850   edgar/data/78017/9999999997-06-028915.txt
9     1                                                    AUTO-GENERATED PAPER DOCUMENT 9999999997-06-013694.paper    TA-1/A     293  edgar/data/849542/9999999997-06-013694.txt
10   NA                                                    Complete submission text file   9999999997-06-013694.txt              1870  edgar/data/849542/9999999997-06-013694.txt
11    1                                                    AUTO-GENERATED PAPER DOCUMENT 9999999997-06-024352.paper X-17A-5/A     296  edgar/data/354497/9999999997-06-024352.txt
12   NA                                                           Scanned paper document                scanned.pdf            108794  edgar/data/354497/9999999997-06-024352.txt
13   NA                                                    Complete submission text file   9999999997-06-024352.txt              1980  edgar/data/354497/9999999997-06-024352.txt
14    1                                                                        FORM 10-Q            d34692e10vq.htm      10-Q  436342 edgar/data/1095315/0000950134-06-010004.txt
15    2                FOURTH AMENDED AND RESTATED NOTES PAYABLE SUBORDINATION AGREEMENT          d34692exv10w1.htm   EX-10.1   22673 edgar/data/1095315/0000950134-06-010004.txt
16    3                                 AMENDMENT 7 TO AGREEMENT FOR INVENTORY FINANCING          d34692exv10w2.htm   EX-10.2   57296 edgar/data/1095315/0000950134-06-010004.txt
17    4                      AMENDMENT 6 TO AMENDED AND RESTATED PLATINUM PLAN AGREEMENT          d34692exv10w3.htm   EX-10.3   89454 edgar/data/1095315/0000950134-06-010004.txt
18    5                                                                        AGREEMENT          d34692exv10w4.htm   EX-10.4   59473 edgar/data/1095315/0000950134-06-010004.txt
19    6                                  SECOND AMENDMENT TO LOAN AND SECURITY AGREEMENT          d34692exv10w5.htm   EX-10.5   30185 edgar/data/1095315/0000950134-06-010004.txt
20    7                                       AMENDMENT 4 TO LOAN AND SECURITY AGREEMENT          d34692exv10w6.htm   EX-10.6   40034 edgar/data/1095315/0000950134-06-010004.txt
21    8                                                                         GUARANTY          d34692exv10w7.htm   EX-10.7   38308 edgar/data/1095315/0000950134-06-010004.txt
22    9       SECOND AMENDMENT TO FIRST AMENDED AND RESTATED LOAN AND SECURITY AGREEMENT          d34692exv10w8.htm   EX-10.8   24409 edgar/data/1095315/0000950134-06-010004.txt
23   10                                     CERTIFICATION OF CEO PURSUANT TO SECTION 302          d34692exv31w1.htm   EX-31.1    6920 edgar/data/1095315/0000950134-06-010004.txt
24   11                                     CERTIFICATION OF CFO PURSUANT TO SECTION 302          d34692exv31w2.htm   EX-31.2    6407 edgar/data/1095315/0000950134-06-010004.txt
25   12                              CERTIFICATIONS OF CEO & CFO PURSUANT TO SECTION 906          d34692exv32w1.htm   EX-32.1    4014 edgar/data/1095315/0000950134-06-010004.txt
26   NA                                                    Complete submission text file   0000950134-06-010004.txt            816901 edgar/data/1095315/0000950134-06-010004.txt
27    1                                                      PRELIMINARY PROXY STATEMENT          d35110ppre14a.htm   PRE 14A  165604 edgar/data/1095315/0000950134-06-007451.txt
28    2                                                                          GRAPHIC        d35110pd3511001.gif   GRAPHIC    3049 edgar/data/1095315/0000950134-06-007451.txt
29    3                                                                          GRAPHIC        d35110pd3511002.gif   GRAPHIC    2942 edgar/data/1095315/0000950134-06-007451.txt
30    4                                                                          GRAPHIC        d35110pd3511003.gif   GRAPHIC   11006 edgar/data/1095315/0000950134-06-007451.txt
31   NA                                                    Complete submission text file   0000950134-06-007451.txt            190874 edgar/data/1095315/0000950134-06-007451.txt
32    1                                                    AUTO-GENERATED PAPER DOCUMENT 9999999997-06-026184.paper    REGDEX     293 edgar/data/1095315/9999999997-06-026184.txt
33   NA                                                           Scanned paper document                scanned.pdf            431768 edgar/data/1095315/9999999997-06-026184.txt
34   NA                                                    Complete submission text file   9999999997-06-026184.txt              1703 edgar/data/1095315/9999999997-06-026184.txt
35    1                                                           CORP Q1 2006 FORM 10-Q              pge10q_q1.htm      10-Q  794817 edgar/data/1004980/0001004980-06-000126.txt
36    2                                                                       EXHIBIT 10              q106_ex10.htm     EX-10  174779 edgar/data/1004980/0001004980-06-000126.txt
37    3                                                                       EXHIBIT 11              q106_ex11.htm     EX-11   29829 edgar/data/1004980/0001004980-06-000126.txt
38    4                                                                     EXHIBIT 12.1            q106_ex12-1.htm   EX-12.1   18293 edgar/data/1004980/0001004980-06-000126.txt
39    5                                                                     EXHIBIT 12.2            q106_ex12-2.htm   EX-12.2   25570 edgar/data/1004980/0001004980-06-000126.txt
40    6                                           CORP CEO/CFO SECTION 302 CERTIFICATION            q106_ex31-1.htm   EX-31.1    9904 edgar/data/1004980/0001004980-06-000126.txt
41    7                                        UTILITY CEO/CFO SECTION 302 CERTIFICATION            q106_ex31-2.htm   EX-31.2    9869 edgar/data/1004980/0001004980-06-000126.txt
42    8                                           CORP CEO/CFO SECTION 906 CERTIFICATION q106_ex32-1corp906cert.htm   EX-32.1    5037 edgar/data/1004980/0001004980-06-000126.txt
43    9                                        UTILITY CEO/CFO SECTION 906 CERTIFICATION     q106_ex32-2906cert.htm   EX-32.2    4774 edgar/data/1004980/0001004980-06-000126.txt
44   NA                                                    Complete submission text file   0001004980-06-000126.txt           1074549 edgar/data/1004980/0001004980-06-000126.txt
45    1                                                       PG&E CORPORATION 11-K 2005          corprsp11k_05.htm      11-K    4049 edgar/data/1004980/0001004980-06-000157.txt
46    2                             CORP MANAGEMENT & UNION RETIREMENT SAVINGS PLAN 2005  pgecorprspfsmerged_05.htm   EX-99.1  158053 edgar/data/1004980/0001004980-06-000157.txt
47    3                                             INDEPENDENT AUDITOR'S CONSENT LETTER      corprspconsent_05.htm   EX-99.2    1020 edgar/data/1004980/0001004980-06-000157.txt
48   NA                                                    Complete submission text file   0001004980-06-000157.txt            164664 edgar/data/1004980/0001004980-06-000157.txt
49    1                                                                                               filename1.htm   CORRESP   26848 edgar/data/1004980/0001004980-06-000139.txt
50   NA                                                    Complete submission text file   0001004980-06-000139.txt             28271 edgar/data/1004980/0001004980-06-000139.txt
51    1                                                                                               filename1.htm   CORRESP   46822 edgar/data/1004980/0001004980-06-000159.txt
52   NA                                                    Complete submission text file   0001004980-06-000159.txt             48241 edgar/data/1004980/0001004980-06-000159.txt
53    1                                                                                               filename1.txt    LETTER    6720 edgar/data/1004980/0000000000-06-018834.txt
54   NA                                                    Complete submission text file   0000000000-06-018834.txt              8200 edgar/data/1004980/0000000000-06-018834.txt
55    1                                                                                               filename1.pdf    LETTER   22516 edgar/data/1004980/0000000000-06-026668.txt
56   NA                                                    Complete submission text file   0000000000-06-026668.txt             32636 edgar/data/1004980/0000000000-06-026668.txt
57    1                                             PG&E ENERGY RECOVERY FUNDING LLC 10D                perf10d.htm      10-D   16759 edgar/data/1305629/0001305629-06-000014.txt
58    2                                    PG&E ENERGY RECOVERY FUNDING LLC EXHIBIT 99.1              q2erb1cer.htm     EX-99  151195 edgar/data/1305629/0001305629-06-000014.txt
59    3                                    PG&E ENERGY RECOVERY FUNDING LLC EXHIBIT 99.2              q2erb2cer.htm     EX-99  149464 edgar/data/1305629/0001305629-06-000014.txt
60   NA                                                    Complete submission text file   0001305629-06-000014.txt            318834 edgar/data/1305629/0001305629-06-000014.txt
61    1                                                        PG&E FUNDING LLC FORM 10Q      funding_form10qv2.htm      10-Q   96055 edgar/data/1041637/0001041637-06-000008.txt
62    2                                                      PG&E FUNDING LLC EXHIBIT 31           exhibit_31v2.htm     EX-31   17886 edgar/data/1041637/0001041637-06-000008.txt
63    3                                                      PG&E FUNDING LLC EXHIBIT 32           exhibit_32v2.htm     EX-32    3340 edgar/data/1041637/0001041637-06-000008.txt
64    4                                                      PG&E FUNDING LLC EXHIBIT 99           exhibit_99v2.htm     EX-99  104359 edgar/data/1041637/0001041637-06-000008.txt
65   NA                                                    Complete submission text file   0001041637-06-000008.txt            223062 edgar/data/1041637/0001041637-06-000008.txt
66    1                                                                                                  pgi10q.txt     10QSB   46105   edgar/data/81157/0001068800-06-000409.txt
67    2                                                                                                  ex31p1.txt   EX-31.1    3598   edgar/data/81157/0001068800-06-000409.txt
68    3                                                                                                  ex31p2.txt   EX-31.2    3598   edgar/data/81157/0001068800-06-000409.txt
69    4                                                                                                  ex32p1.txt   EX-32.1    1370   edgar/data/81157/0001068800-06-000409.txt
70    5                                                                                                  ex32p2.txt   EX-32.2    1370   edgar/data/81157/0001068800-06-000409.txt
71   NA                                                    Complete submission text file   0001068800-06-000409.txt             57552   edgar/data/81157/0001068800-06-000409.txt
72    1                                                    AUTO-GENERATED PAPER DOCUMENT 9999999997-06-024770.paper    REGDEX     293 edgar/data/1283956/9999999997-06-024770.txt
73   NA                                                    Complete submission text file   9999999997-06-024770.txt              1426 edgar/data/1283956/9999999997-06-024770.txt
74    1                                QUARTERLY REPORT PURSUANT TO SECTIONS 13 OR 15(D)       a06-12022_210qsb.htm     10QSB  635374 edgar/data/1127005/0001104659-06-036613.txt
75    2                                                                            EX-31      a06-12022_2ex31d1.htm   EX-31.1   13644 edgar/data/1127005/0001104659-06-036613.txt
76    3                                                                            EX-31      a06-12022_2ex31d2.htm   EX-31.2   13938 edgar/data/1127005/0001104659-06-036613.txt
77    4                                                                            EX-32        a06-12022_2ex32.htm     EX-32    8647 edgar/data/1127005/0001104659-06-036613.txt
78   NA                                                    Complete submission text file   0001104659-06-036613.txt            673324 edgar/data/1127005/0001104659-06-036613.txt
79    1                                   NOTICE OF INABILITY TO TIMELY FILE A FORM 10-Q       a06-12022_1nt10q.htm   NT 10-Q   61807 edgar/data/1127005/0001104659-06-035407.txt
80   NA                                                    Complete submission text file   0001104659-06-035407.txt             63562 edgar/data/1127005/0001104659-06-035407.txt
81    1                                                 FILED PURSUANT TO RULE 424(B)(4)         y18025b4e424b4.htm     424B4 1412125 edgar/data/1354327/0000950123-06-008297.txt
82    2                                                                          GRAPHIC       y18025b4y1802503.gif   GRAPHIC    1806 edgar/data/1354327/0000950123-06-008297.txt
83    3                                                                          GRAPHIC       y18025b4y1802504.gif   GRAPHIC  232981 edgar/data/1354327/0000950123-06-008297.txt
84    4                                                                          GRAPHIC       y18025b4y1802501.gif   GRAPHIC    5984 edgar/data/1354327/0000950123-06-008297.txt
85    5                                                                          GRAPHIC       y18025b4y1802502.gif   GRAPHIC    5768 edgar/data/1354327/0000950123-06-008297.txt
86    6                                                                          GRAPHIC       y18025b4y1802506.gif   GRAPHIC  255413 edgar/data/1354327/0000950123-06-008297.txt
87   NA                                                    Complete submission text file   0000950123-06-008297.txt           2105806 edgar/data/1354327/0000950123-06-008297.txt
88    1                                                                      FORM 8-A12G          y22393e8va12g.txt    8-A12G    5695 edgar/data/1354327/0000950123-06-007803.txt
89   NA                                                    Complete submission text file   0000950123-06-007803.txt              7040 edgar/data/1354327/0000950123-06-007803.txt
90    1                                                                                               filename1.htm   CORRESP    4051 edgar/data/1354327/0000950123-06-008050.txt
91   NA                                                    Complete submission text file   0000950123-06-008050.txt              5296 edgar/data/1354327/0000950123-06-008050.txt
92    1                                                                                               filename1.txt   CORRESP    2399 edgar/data/1354327/0000950123-06-008055.txt
93   NA                                                    Complete submission text file   0000950123-06-008055.txt              3646 edgar/data/1354327/0000950123-06-008055.txt
94    1                                                                                               filename1.txt   CORRESP    2108 edgar/data/1354327/0000950123-06-008104.txt
95   NA                                                    Complete submission text file   0000950123-06-008104.txt              3355 edgar/data/1354327/0000950123-06-008104.txt
96    1                                                                                            primary_doc.html    EFFECT      NA edgar/data/1354327/9999999995-06-000500.txt
97    1                                                                                             primary_doc.xml    EFFECT     505 edgar/data/1354327/9999999995-06-000500.txt
98   NA                                                    Complete submission text file   9999999995-06-000500.txt              1965 edgar/data/1354327/9999999995-06-000500.txt
99    1                                                                         FORM FWP            y18025fwfwp.htm       FWP    9011 edgar/data/1354327/0000950123-06-008214.txt
100  NA                                                    Complete submission text file   0000950123-06-008214.txt             10886 edgar/data/1354327/0000950123-06-008214.txt
101   1                                                      AMENDMENT NO. 1 TO FORM S-1          y18025a1sv1za.htm     S-1/A 1285240 edgar/data/1354327/0000950123-06-004939.txt
102   2                            EX-10.1: SECOND AMENDED AND RESTATED CREDIT AGREEMENT        y18025a1exv10w1.txt   EX-10.1  493595 edgar/data/1354327/0000950123-06-004939.txt
103   3                                            EX-10.2: SECOND LIEN CREDIT AGREEMENT        y18025a1exv10w2.txt   EX-10.2  412122 edgar/data/1354327/0000950123-06-004939.txt
104   4                      EX-10.3: AMENDED AND RESTATED PLEDGE AND SECURITY AGREEMENT        y18025a1exv10w3.txt   EX-10.3  155317 edgar/data/1354327/0000950123-06-004939.txt
105   5                               EX-10.4: SECOND LIEN PLEDGE AND SECURITY AGREEMENT        y18025a1exv10w4.txt   EX-10.4  152423 edgar/data/1354327/0000950123-06-004939.txt
106   6                                               EX-10.5: 2004 STOCK INCENTIVE PLAN        y18025a1exv10w5.txt   EX-10.5   14245 edgar/data/1354327/0000950123-06-004939.txt
107   7                EX-10.6: FORM OF 2004 STOCK INCENTIVE PLAN STOCK OPTION AGREEMENT        y18025a1exv10w6.txt   EX-10.6   22438 edgar/data/1354327/0000950123-06-004939.txt
108   8                                                   EX-10.10: EMPLOYMENT AGREEMENT       y18025a1exv10w10.txt  EX-10.10   40283 edgar/data/1354327/0000950123-06-004939.txt
109   9                                                   EX-10.11: EMPLOYMENT AGREEMENT       y18025a1exv10w11.txt  EX-10.11   40224 edgar/data/1354327/0000950123-06-004939.txt
110  10                                                   EX-10.12: EMPLOYMENT AGREEMENT       y18025a1exv10w12.txt  EX-10.12   41666 edgar/data/1354327/0000950123-06-004939.txt
111  11                                                   EX-10.13: EMPLOYMENT AGREEMENT       y18025a1exv10w13.txt  EX-10.13   41341 edgar/data/1354327/0000950123-06-004939.txt
112  12                                                   EX-10.14: EMPLOYMENT AGREEMENT       y18025a1exv10w14.txt  EX-10.14   41353 edgar/data/1354327/0000950123-06-004939.txt
113  13                                                   EX-10.15: EMPLOYMENT AGREEMENT       y18025a1exv10w15.txt  EX-10.15   40671 edgar/data/1354327/0000950123-06-004939.txt
114  14                                                   EX-10.16: EMPLOYMENT AGREEMENT       y18025a1exv10w16.txt  EX-10.16   41307 edgar/data/1354327/0000950123-06-004939.txt
115  15                                EX-10.18: FORM OF ROLLOVER STOCK OPTION AGREEMENT       y18025a1exv10w18.txt  EX-10.18   17263 edgar/data/1354327/0000950123-06-004939.txt
116  16                                            EX-23.1: CONSENT OF ERNST & YOUNG LLP        y18025a1exv23w1.txt   EX-23.1     768 edgar/data/1354327/0000950123-06-004939.txt
117  19                                                                          GRAPHIC       y18025a1y1802503.gif   GRAPHIC    3527 edgar/data/1354327/0000950123-06-004939.txt
118  20                                                                          GRAPHIC       y18025a1y1802501.gif   GRAPHIC    5979 edgar/data/1354327/0000950123-06-004939.txt
119  21                                                                          GRAPHIC       y18025a1y1802502.gif   GRAPHIC    6237 edgar/data/1354327/0000950123-06-004939.txt
120  22                                                                                              filename22.txt   CORRESP   96895 edgar/data/1354327/0000950123-06-004939.txt
121  23                                                                                              filename23.htm   CORRESP    3513 edgar/data/1354327/0000950123-06-004939.txt
122  NA                                                    Complete submission text file   0000950123-06-004939.txt           2964279 edgar/data/1354327/0000950123-06-004939.txt
123   1                                                      AMENDMENT NO. 2 TO FORM S-1          y18025a2sv1za.htm     S-1/A 1491491 edgar/data/1354327/0000950123-06-006981.txt
124   2                                             EX-4.1: FORM OF SPECIMEN CERTIFICATE         y18025a2exv4w1.htm    EX-4.1   13785 edgar/data/1354327/0000950123-06-006981.txt
125   3                                                   EX-10.19: EMPLOYMENT AGREEMENT       y18025a2exv10w19.txt  EX-10.19   39617 edgar/data/1354327/0000950123-06-006981.txt
126   4                                            EX-23.1: CONSENT OF ERNST & YOUNG LLP        y18025a2exv23w1.htm   EX-23.1    1311 edgar/data/1354327/0000950123-06-006981.txt
127   7                                                                          GRAPHIC       y18025a2y1802503.gif   GRAPHIC     597 edgar/data/1354327/0000950123-06-006981.txt
128   8                                                                          GRAPHIC       y18025a2y1802501.gif   GRAPHIC    5979 edgar/data/1354327/0000950123-06-006981.txt
129   9                                                                          GRAPHIC       y18025a2y1802502.gif   GRAPHIC    6237 edgar/data/1354327/0000950123-06-006981.txt
130  10                                                                          GRAPHIC       y18025a2e1802501.gif   GRAPHIC   86540 edgar/data/1354327/0000950123-06-006981.txt
131  11                                                                                              filename11.htm     COVER    4125 edgar/data/1354327/0000950123-06-006981.txt
132  12                                                                                              filename12.htm   CORRESP   86863 edgar/data/1354327/0000950123-06-006981.txt
133  NA                                                    Complete submission text file   0000950123-06-006981.txt           1776081 edgar/data/1354327/0000950123-06-006981.txt
134   1                                                      AMENDMENT NO. 3 TO FORM S-1          y18025a3sv1za.htm     S-1/A 1500984 edgar/data/1354327/0000950123-06-007472.txt
135   2                                           EX-1.1: FORM OF UNDERWRITING AGREEMENT         y18025a3exv1w1.txt    EX-1.1  114830 edgar/data/1354327/0000950123-06-007472.txt
136   3                EX-3.1: FORM OF AMENDED AND RESTATED CERTIFICATE OF INCORPORATION         y18025a3exv3w1.txt    EX-3.1   26315 edgar/data/1354327/0000950123-06-007472.txt
137   4                                     EX-3.2: FORM OF AMENDED AND RESTATED BY-LAWS         y18025a3exv3w2.txt    EX-3.2   85311 edgar/data/1354327/0000950123-06-007472.txt
138   5                 EX-4.2: FORM OF AMENDED AND RESTATED SECURITY HOLDERS' AGREEMENT         y18025a3exv4w2.txt    EX-4.2   72916 edgar/data/1354327/0000950123-06-007472.txt
139   6                            EX-10.7: FORM OF PGT, INC. 2006 EQUITY INCENTIVE PLAN        y18025a3exv10w7.txt   EX-10.7   46191 edgar/data/1354327/0000950123-06-007472.txt
140   7 EX-10.8: FORM OF PGT, INC. 2006 EQUITY INCENTIVE PLAN NON-QUALIFIED STOCK OPTION        y18025a3exv10w8.txt   EX-10.8   11945 edgar/data/1354327/0000950123-06-007472.txt
141   8                                       EX-10.9: EMPLOYMENT AGREEMENT: HERSHBERGER        y18025a3exv10w9.txt   EX-10.9   41163 edgar/data/1354327/0000950123-06-007472.txt
142   9                             EX-10.17: FORM OF DIRECTOR INDEMNIFICATION AGREEMENT       y18025a3exv10w17.txt  EX-10.17   31723 edgar/data/1354327/0000950123-06-007472.txt
143  10                                                       EX-10.20: SUPPLY AGREEMENT       y18025a3exv10w20.txt  EX-10.20   13830 edgar/data/1354327/0000950123-06-007472.txt
144  11                                                     EX-10.21: SUPPLIER AGREEMENT       y18025a3exv10w21.txt  EX-10.21   67130 edgar/data/1354327/0000950123-06-007472.txt
145  12                                                     EX-10.22: SUPPLIER AGREEMENT       y18025a3exv10w22.txt  EX-10.22   69431 edgar/data/1354327/0000950123-06-007472.txt
146  13                       EX-10.23: FORM OF PGT, INC. 2006 MANAGEMENT INCENTIVE PLAN       y18025a3exv10w23.txt  EX-10.23   13870 edgar/data/1354327/0000950123-06-007472.txt
147  14 EX-10.24: FORM OF PGT, INC. 2006 EQUITY INCENTIVE PLAN RESTRICTED STOCK AWARD AG       y18025a3exv10w24.txt  EX-10.24   10950 edgar/data/1354327/0000950123-06-007472.txt
148  15 EX-10.25: FORM OF PGT, INC. 2006 EQUITY INCENTIVE PLAN RESTIRCTED STOCK UNIT AWA       y18025a3exv10w25.txt  EX-10.25   10593 edgar/data/1354327/0000950123-06-007472.txt
149  16 EX-10.26: FORM OF PGT, INC. 2006 EQUITY INCENTIVE PLAN INCENTIVE STOCK OPTION AG       y18025a3exv10w26.txt  EX-10.26   12386 edgar/data/1354327/0000950123-06-007472.txt
150  17                                            EX-23.1: CONSENT OF ERNST & YOUNG LLP        y18025a3exv23w1.txt   EX-23.1     817 edgar/data/1354327/0000950123-06-007472.txt
151  18                                             EX-99.1: CONSENT OF DIRECTOR NOMINEE        y18025a3exv99w1.txt   EX-99.1    1068 edgar/data/1354327/0000950123-06-007472.txt
152  27                                                                          GRAPHIC       y18025a3y1802503.gif   GRAPHIC    1806 edgar/data/1354327/0000950123-06-007472.txt
153  28                                                                          GRAPHIC       y18025a3y1802504.gif   GRAPHIC  232981 edgar/data/1354327/0000950123-06-007472.txt
154  29                                                                          GRAPHIC       y18025a3y1802501.gif   GRAPHIC    5979 edgar/data/1354327/0000950123-06-007472.txt
155  30                                                                          GRAPHIC       y18025a3y1802502.gif   GRAPHIC    6237 edgar/data/1354327/0000950123-06-007472.txt
156  31                                                                          GRAPHIC       y18025a3y1802506.gif   GRAPHIC  255413 edgar/data/1354327/0000950123-06-007472.txt
157  32                                                                                              filename32.htm   CORRESP    4679 edgar/data/1354327/0000950123-06-007472.txt
158  33                                                                                              filename33.htm   CORRESP    6853 edgar/data/1354327/0000950123-06-007472.txt
159  34                                                                                              filename34.htm   CORRESP   10944 edgar/data/1354327/0000950123-06-007472.txt
160  35                                                                                              filename35.txt   CORRESP    1480 edgar/data/1354327/0000950123-06-007472.txt
161  36                                                                                              filename36.htm   CORRESP    4113 edgar/data/1354327/0000950123-06-007472.txt
162  37                                                                                              filename37.htm   CORRESP   13254 edgar/data/1354327/0000950123-06-007472.txt
163  38                                                                                              filename38.htm   CORRESP   31570 edgar/data/1354327/0000950123-06-007472.txt
164  39                                                                                              filename39.htm   CORRESP   17401 edgar/data/1354327/0000950123-06-007472.txt
165  NA                                                    Complete submission text file   0000950123-06-007472.txt           2916401 edgar/data/1354327/0000950123-06-007472.txt
166   1                                                      AMENDMENT NO. 4 TO FORM S-1          y18025a4sv1za.htm     S-1/A 1478836 edgar/data/1354327/0000950123-06-007562.txt
 [ reached 'max' / getOption("max.print") -- omitted 4623 rows ]
> temp[[1, 2, 3, 4, 5]]
Error in temp[[1, 2, 3, 4, 5]] : incorrect number of subscripts
> temp[c(1, 2, 3, 4, 5)]
[[1]]
  seq                   description                   document     type   size                                   file_name
1   1 AUTO-GENERATED PAPER DOCUMENT 9999999997-06-048085.paper REGDEX/A    295 edgar/data/1132469/9999999997-06-048085.txt
2  NA        Scanned paper document                scanned.pdf          243893 edgar/data/1132469/9999999997-06-048085.txt
3  NA Complete submission text file   9999999997-06-048085.txt            1459 edgar/data/1132469/9999999997-06-048085.txt

[[2]]
  seq                   description                   document   type size                                 file_name
1   1 AUTO-GENERATED PAPER DOCUMENT 9999999997-06-038939.paper FOCUSN  293 edgar/data/78017/9999999997-06-038939.txt
2  NA Complete submission text file   9999999997-06-038939.txt        1847 edgar/data/78017/9999999997-06-038939.txt

[[3]]
  seq                   description                   document    type   size                                 file_name
1   1 AUTO-GENERATED PAPER DOCUMENT 9999999997-06-028915.paper X-17A-5    294 edgar/data/78017/9999999997-06-028915.txt
2  NA        Scanned paper document                scanned.pdf         393873 edgar/data/78017/9999999997-06-028915.txt
3  NA Complete submission text file   9999999997-06-028915.txt           1850 edgar/data/78017/9999999997-06-028915.txt

[[4]]
  seq                   description                   document   type size                                  file_name
1   1 AUTO-GENERATED PAPER DOCUMENT 9999999997-06-013694.paper TA-1/A  293 edgar/data/849542/9999999997-06-013694.txt
2  NA Complete submission text file   9999999997-06-013694.txt        1870 edgar/data/849542/9999999997-06-013694.txt

[[5]]
  seq                   description                   document      type   size                                  file_name
1   1 AUTO-GENERATED PAPER DOCUMENT 9999999997-06-024352.paper X-17A-5/A    296 edgar/data/354497/9999999997-06-024352.txt
2  NA        Scanned paper document                scanned.pdf           108794 edgar/data/354497/9999999997-06-024352.txt
3  NA Complete submission text file   9999999997-06-024352.txt             1980 edgar/data/354497/9999999997-06-024352.txt

showing that the code works again if you replace get_filing_docs with filing_docs_df in the line containing the mclapply call (alternatively, you could rename my function to get_filing_docs, if that is preferable).
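For reference, the swap might look like the following minimal sketch. The `filing_docs_df` defined here is a hypothetical stand-in that returns a fixed data frame rather than scraping EDGAR, and `file_names` and `mc.cores = 1` are assumptions chosen so the example runs anywhere; the real function and the surrounding script may differ. The thread uses `bind_rows`; base `rbind` is swapped in here only to keep the sketch dependency-free.

```r
library(parallel)

# Hypothetical stand-in for the real filing_docs_df(): it returns a small
# data frame shaped like the output above, without doing any scraping.
filing_docs_df <- function(file_name) {
    data.frame(seq = c(1L, NA),
               description = c("AUTO-GENERATED PAPER DOCUMENT",
                               "Complete submission text file"),
               document = c(sub("\\.txt$", ".paper", basename(file_name)),
                            basename(file_name)),
               file_name = file_name,
               stringsAsFactors = FALSE)
}

# Assumed input: a character vector of filing paths.
file_names <- c("edgar/data/1132469/9999999997-06-048085.txt",
                "edgar/data/78017/9999999997-06-038939.txt")

# mc.cores = 1 keeps the sketch portable (mclapply cannot fork on Windows).
temp <- mclapply(file_names, filing_docs_df, mc.cores = 1)

# `[` with an index vector returns a sub-list; `[[` extracts a single
# element, which is why temp[[1, 2, 3, 4, 5]] raised an error above.
first_two <- temp[c(1, 2)]
first_df <- temp[[1]]

# Stack the per-filing data frames into one.
all_docs <- do.call(rbind, temp)
```

Because each call returns a data frame with the same columns, the list can be stacked directly; tables of a different shape would need to be filtered out first.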

iangow commented 5 years ago

@bdcallen it doesn't work for me. Would you mind checking that you can run source('~/git/edgar/filing_docs/scrape_filing_docs.R') after modifying the code?

iangow commented 5 years ago

@bdcallen Never mind. Typo. I've fixed it now.

iangow commented 5 years ago

The code seems to be running OK now (I incorporated the error handling from my version).
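The error handling itself is not shown in this thread. One common pattern, sketched here under the assumption that a failed scrape should be skipped rather than stop the whole run, is to wrap the per-filing function in `tryCatch`. The names `flaky_scraper` and `safe_filing_docs` are hypothetical and are not taken from the repository.

```r
# Hypothetical scraper that fails for some inputs, standing in for a real one.
flaky_scraper <- function(file_name) {
    if (grepl("bad", file_name)) stop("could not parse ", file_name)
    data.frame(file_name = file_name, size = 1L, stringsAsFactors = FALSE)
}

# Wrap the scraper so an error yields NULL (with a message) instead of
# aborting the whole batch.
safe_filing_docs <- function(file_name) {
    tryCatch(flaky_scraper(file_name),
             error = function(e) {
                 message("Skipping ", file_name, ": ", conditionMessage(e))
                 NULL
             })
}

results <- lapply(c("good-1.txt", "bad-2.txt", "good-3.txt"), safe_filing_docs)

# Drop the failed (NULL) entries, then stack the successful scrapes.
scraped <- do.call(rbind, Filter(Negate(is.null), results))
```

Returning `NULL` on failure keeps the successful results usable; logging the failed file names makes it possible to retry them later.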