iangow closed this issue 5 years ago.
My guess is that we want to get all the HTML tables.
dfs <-
    table_nodes %>%
    html_table()
Then check each element of `dfs` to make sure it is of the correct form before passing it to `bind_rows()`, etc.
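A minimal base-R sketch of that check. It assumes that each parsed filing-document table has the header row Seq, Description, Document, Type, Size. That is an assumption about the EDGAR index pages, although the lower-cased names do match the `filing_docs` output later in this thread. `is_filing_doc_table()` and the two data frames are made up for illustration:

```r
# Assumed header row of an EDGAR filing-document table
expected_cols <- c("Seq", "Description", "Document", "Type", "Size")

# Hypothetical helper: TRUE only for parsed tables with exactly these columns
is_filing_doc_table <- function(df) {
    is.data.frame(df) && identical(names(df), expected_cols)
}

# Two made-up parsed tables standing in for the output of html_table()
dfs <- list(
    data.frame(Seq = 1:2, Description = c("FORM 10-K", "SUBSIDIARIES"),
               Document = c("d10k.htm", "dex211.htm"),
               Type = c("10-K", "EX-21.1"), Size = c(1231750L, 2792L)),
    data.frame(X1 = "Period of Report", X2 = "2009-09-26")
)

good <- Filter(is_filing_doc_table, dfs)  # drops the second table
combined <- do.call(rbind, good)          # dplyr::bind_rows(good) would also work
```

Checking column names is cheap, but it only catches tables with the wrong layout. It will not catch a table that has the right columns and the wrong contents.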
@iangow It turns out there is an easy way to check whether the elements are of the correct form. Each table node has a `class` attribute, which is equal to `tableFile` if the table lists the filing's documents:
> head_url <- 'https://www.sec.gov/Archives/edgar/data/320193/000119312509214859/0001193125-09-214859-index.htm'
> table_nodes <-
+ read_html(head_url, encoding="Latin1") %>%
+ html_nodes("table")
> table_nodes
{xml_nodeset (2)}
[1] <table class="tableFile" summary="Document Format Files">\n<tr>\n<th scope="col" style="width: 5%;"><acronym title="Sequence Number">Seq</acronym></th>\n <th scope="col" s ...
[2] <table class="tableFile" summary="Data Files">\n<tr>\n<th scope="col" style="width: 5%;"><acronym title="Sequence Number">Seq</acronym></th>\n <th scope="col" style="width ...
> which(table_nodes %>% html_attr("class") == "tableFile")
[1] 1 2
> filing_doc_table_indices <- which(table_nodes %>% html_attr("class") == "tableFile")
> filing_doc_table_indices
[1] 1 2
> table_nodes[filing_doc_table_indices]
{xml_nodeset (2)}
[1] <table class="tableFile" summary="Document Format Files">\n<tr>\n<th scope="col" style="width: 5%;"><acronym title="Sequence Number">Seq</acronym></th>\n <th scope="col" s ...
[2] <table class="tableFile" summary="Data Files">\n<tr>\n<th scope="col" style="width: 5%;"><acronym title="Sequence Number">Seq</acronym></th>\n <th scope="col" style="width ...
There is also another useful attribute, `summary`, which contains the name of the table, i.e. `Document Format Files`, etc.
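Here is a self-contained sketch that selects tables by `class` and labels each row with the table's `summary`. It runs against a made-up HTML snippet so it works offline. The attribute values come from the console output above, and the rows are abbreviated from the Apple filing. It assumes rvest is installed:

```r
library(rvest)  # read_html(), html_nodes(), html_attr(), html_table(); re-exports %>%

# A made-up index page: two filing-document tables plus one unrelated table
html <- '<html><body>
<table class="tableFile" summary="Document Format Files">
<tr><th>Seq</th><th>Description</th><th>Document</th><th>Type</th><th>Size</th></tr>
<tr><td>1</td><td>ANNUAL REPORT</td><td>d10k.htm</td><td>10-K</td><td>1231750</td></tr>
</table>
<table class="tableFile" summary="Data Files">
<tr><th>Seq</th><th>Description</th><th>Document</th><th>Type</th><th>Size</th></tr>
<tr><td>8</td><td>XBRL INSTANCE DOCUMENT</td><td>aapl-20090926.xml</td><td>EX-101.INS</td><td>760344</td></tr>
</table>
<table summary="Filer"><tr><td>not a document table</td></tr></table>
</body></html>'

table_nodes <- read_html(html) %>% html_nodes("table")

# html_attr() gives NA for a table without a class; %in% turns that into FALSE
is_file_table <- html_attr(table_nodes, "class") %in% "tableFile"
file_tables <- table_nodes[is_file_table]

# Parse each document table and record which table each row came from
summaries <- html_attr(file_tables, "summary")
dfs <- html_table(file_tables)
for (i in seq_along(dfs)) dfs[[i]]$table <- summaries[i]
combined <- do.call(rbind, dfs)
```

`which(... == "tableFile")`, as used in the console session, also skips the `NA` classes. Using `%in%` simply gives a logical index directly.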
@iangow I wrote a new function, `filing_docs_df`, that takes the part of `get_filing_docs` that turns the relevant tables into a data frame to be written to `edgar.filing_docs`:
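library(rvest)   # read_html(), html_nodes(), html_attr(), html_table()
library(dplyr)   # tibble(), bind_rows(), mutate(), %>%
# get_index_url() and fix_names() are helpers defined elsewhere in this project
filing_docs_df <- function(file_name) {
    head_url <- get_index_url(file_name)
    table_nodes <-
        read_html(head_url, encoding = "Latin1") %>%
        html_nodes("table")
    # Keep only the tables that list the filing's documents (class="tableFile")
    filing_doc_table_indices <- which(table_nodes %>% html_attr("class") == "tableFile")
    file_tables <- table_nodes[filing_doc_table_indices]
    if (length(file_tables) < 1) {
        # No document tables found: return a placeholder row for this filing
        df <- tibble(seq = NA, description = NA, document = NA, type = NA,
                     size = NA, file_name = file_name)
    } else {
        # Parse every document table and stack them into one data frame
        df <-
            file_tables %>%
            html_table() %>%
            bind_rows() %>%
            fix_names() %>%
            mutate(file_name = file_name, type = as.character(type))
        colnames(df) <- tolower(colnames(df))
    }
    return(df)
}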
I did the following
> f2 <- 'edgar/data/1046404/0000871839-18-000061-index.txt'
> get_index_url(f2)
[1] "https://www.sec.gov/Archives/edgar/data/1046404/000087183918000061/0000871839-18-000061-index.htm"
> filing_docs_df(f2)
seq description document type size file_name
1 1 proxyadditionalmateri-201715.htm DEF 14A 399094 edgar/data/1046404/0000871839-18-000061-index.txt
2 2 GRAPHIC img_3fb1dcc13ad04.jpg GRAPHIC 2695 edgar/data/1046404/0000871839-18-000061-index.txt
3 3 GRAPHIC img_7c4a99f133244.jpg GRAPHIC 50688 edgar/data/1046404/0000871839-18-000061-index.txt
4 4 GRAPHIC img_39af8f5852b44.jpg GRAPHIC 41924 edgar/data/1046404/0000871839-18-000061-index.txt
5 5 GRAPHIC img_58e800b8f91b4.jpg GRAPHIC 46154 edgar/data/1046404/0000871839-18-000061-index.txt
6 6 GRAPHIC img_61e4e19e30d84.jpg GRAPHIC 3289 edgar/data/1046404/0000871839-18-000061-index.txt
7 7 GRAPHIC img_742ca67764644.jpg GRAPHIC 1952 edgar/data/1046404/0000871839-18-000061-index.txt
8 8 GRAPHIC img_18245b4143de4.jpg GRAPHIC 1957 edgar/data/1046404/0000871839-18-000061-index.txt
9 9 GRAPHIC img_204353f228fd4.jpg GRAPHIC 3289 edgar/data/1046404/0000871839-18-000061-index.txt
10 10 GRAPHIC img_286584bbcac34.jpg GRAPHIC 48487 edgar/data/1046404/0000871839-18-000061-index.txt
11 11 GRAPHIC img_a17ec2d0123f4.jpg GRAPHIC 3622 edgar/data/1046404/0000871839-18-000061-index.txt
12 12 GRAPHIC img_bda416a65f094.jpg GRAPHIC 63014 edgar/data/1046404/0000871839-18-000061-index.txt
13 13 GRAPHIC img_c147f393a5fe4.jpg GRAPHIC 1953 edgar/data/1046404/0000871839-18-000061-index.txt
14 14 GRAPHIC img_ddb30f382a384.jpg GRAPHIC 4610 edgar/data/1046404/0000871839-18-000061-index.txt
15 15 GRAPHIC img_e82bb2bb0af14.jpg GRAPHIC 1771 edgar/data/1046404/0000871839-18-000061-index.txt
16 16 GRAPHIC img_e89a234d074c4.jpg GRAPHIC 39813 edgar/data/1046404/0000871839-18-000061-index.txt
17 17 GRAPHIC img_ea1cfc908cbf4.jpg GRAPHIC 4606 edgar/data/1046404/0000871839-18-000061-index.txt
18 18 GRAPHIC img_ea5f423baea34.jpg GRAPHIC 2695 edgar/data/1046404/0000871839-18-000061-index.txt
19 19 GRAPHIC img_ed7f775d2ba74.jpg GRAPHIC 1953 edgar/data/1046404/0000871839-18-000061-index.txt
20 NA Complete submission text file 0000871839-18-000061.txt 985260 edgar/data/1046404/0000871839-18-000061-index.txt
which shows that this function handled the problematic case you put in the original post.
@iangow Here's the output of `filing_docs_df` for the second filing in the original post, the one with two tables of filing documents:
> file_name <- 'edgar/data/320193/0001193125-09-214859.txt'
> get_index_url(file_name)
[1] "https://www.sec.gov/Archives/edgar/data/320193/000119312509214859/0001193125-09-214859-index.htm"
> filing_docs_df(file_name)
seq description document type size file_name
1 1 FOR THE FISCAL YEAR ENDED SEPTEMBER 26, 2009 d10k.htm 10-K 1231750 edgar/data/320193/0001193125-09-214859.txt
2 2 SUBSIDIARIES OF THE REGISTRANT dex211.htm EX-21.1 2792 edgar/data/320193/0001193125-09-214859.txt
3 3 CONSENT OF ERNST & YOUNG LLP dex231.htm EX-23.1 1634 edgar/data/320193/0001193125-09-214859.txt
4 4 CONSENT OF KPMG LLP dex232.htm EX-23.2 2390 edgar/data/320193/0001193125-09-214859.txt
5 5 RULE 13A-14(A) / 15D-14(A) CERTIFICATION OF CHIEF EXECUTIVE OFFICER dex311.htm EX-31.1 9851 edgar/data/320193/0001193125-09-214859.txt
6 6 RULE 13A-14(A) / 15D-14(A) CERTIFICATION OF CHIEF FINANCIAL OFFICER dex312.htm EX-31.2 10112 edgar/data/320193/0001193125-09-214859.txt
7 7 SECTION 1350 CERTIFICATIONS OF CEO AND CFO dex321.htm EX-32.1 5354 edgar/data/320193/0001193125-09-214859.txt
8 14 GRAPHIC g91485g21p46.jpg GRAPHIC 53857 edgar/data/320193/0001193125-09-214859.txt
9 NA Complete submission text file 0001193125-09-214859.txt 3638340 edgar/data/320193/0001193125-09-214859.txt
10 8 XBRL INSTANCE DOCUMENT aapl-20090926.xml EX-101.INS 760344 edgar/data/320193/0001193125-09-214859.txt
11 9 XBRL TAXONOMY EXTENSION SCHEMA aapl-20090926.xsd EX-101.SCH 13066 edgar/data/320193/0001193125-09-214859.txt
12 10 XBRL TAXONOMY EXTENSION CALCULATION LINKBASE aapl-20090926_cal.xml EX-101.CAL 30955 edgar/data/320193/0001193125-09-214859.txt
13 11 XBRL TAXONOMY EXTENSION DEFINITION LINKBASE aapl-20090926_def.xml EX-101.DEF 19450 edgar/data/320193/0001193125-09-214859.txt
14 12 XBRL TAXONOMY EXTENSION LABEL LINKBASE aapl-20090926_lab.xml EX-101.LAB 100641 edgar/data/320193/0001193125-09-214859.txt
15 13 XBRL TAXONOMY EXTENSION PRESENTATION LINKBASE aapl-20090926_pre.xml EX-101.PRE 80647 edgar/data/320193/0001193125-09-214859.txt
As you can see by comparing with the linked index page for the filing, the files from the Document Format Files table and those from the Data Files table are all combined in one data frame.
OK. Perfect. So `get_filing_docs.R` should just use this alternative function. Are you able to test and commit a version that does this?
There's probably a need to go back and try to identify the filings that have been "done wrong"; we should create a separate issue for this (I think the best approach might involve looking for breaks in `seq`).
Sorry. That should be `scrape_filing_docs.R`. I have stopped the code I was running on 10.101.13.99 so that you can run it without interference. Note that I simply run `source('filing_docs/scrape_filing_docs.R', echo=TRUE)` from the `edgar` project in RStudio.
@iangow In my latest commit, I committed `filing_docs_df` and changed `get_filing_docs` to
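library(DBI)          # dbConnect(), dbWriteTable(), dbDisconnect()
library(RPostgreSQL)  # PostgreSQL()
get_filing_docs <- function(file_name) {
    pg <- NULL
    # Scrape the filing and append its documents to edgar.filing_docs;
    # return TRUE on success and FALSE if any step throws an error
    tryCatch({
        df <- filing_docs_df(file_name)
        pg <- dbConnect(PostgreSQL())
        dbWriteTable(pg, c("edgar", "filing_docs"),
                     df, append = TRUE, row.names = FALSE)
        TRUE
    }, error = function(e) FALSE,
    finally = if (!is.null(pg)) dbDisconnect(pg))
}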
inside the file `get_filing_doc_functions.R`. Splitting the functions this way is convenient for testing. We can use `filing_docs_df` to check that the data frame has the right form (i.e., was scraped successfully) and change whatever we need there, and use `get_filing_docs` to write the data.
I agree that catching breaks in `seq` is the way to find the filings that have been handled wrongly. My code now combines all the tables of filing documents, and we know `seq` runs across those tables. So these errors should be much rarer when we run the code over the whole set of filings.
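Here is a minimal local sketch of the "breaks in `seq`" check. It assumes the scraped rows sit in a data frame with `file_name` and `seq` columns, like the `filing_docs` output above. `has_seq_break()` is a hypothetical helper. It ignores `NA` values (such as the complete submission text file) and requires the remaining values to be exactly 1, 2, ..., n:

```r
# Hypothetical helper: TRUE if the non-missing seq values are not exactly 1..n
has_seq_break <- function(seq) {
    s <- sort(seq[!is.na(seq)])
    !identical(as.integer(s), seq_along(s))
}

# Made-up rows for two filings: "a" is complete, "b" has no seq = 2
docs <- data.frame(file_name = c("a", "a", "a", "b", "b"),
                   seq = c(1, 2, NA, 1, 3),
                   stringsAsFactors = FALSE)

broken <- tapply(docs$seq, docs$file_name, has_seq_break)
names(broken)[broken]  # "b"
```

Comparing `max(seq)` with the number of `seq` values is cheaper and easy to push into SQL. However, it misses a case such as {1, 1, 3}, where a duplicate and a gap cancel out. Sorting and comparing with 1..n catches both.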
OK. I pulled the latest code and I'm running it now ... no need for you to do so. I will let you know if any issues crop up. (I am seeing the code [before your commit] "hanging" at times. Not sure why; I'm just terminating it and running it again. I believe the code may grab filings somewhat at random, so it may be a small number of filings causing an issue.)
@iangow I just noticed that `get_filing_docs` also had to be changed in `scrape_filing_docs.R`. Since `get_index_url`, `fix_names`, and the updated `get_filing_docs` are in `get_filing_doc_functions.R`, I've done this in my latest commit using the `source` function. So if you pull again and restart, `scrape_filing_docs.R` will use the updated `get_filing_docs`.
Here's some code for that `seq` issue once you have it set up:
library(DBI)
library(dplyr, warn.conflicts = FALSE)
pg <- dbConnect(RPostgreSQL::PostgreSQL()) # bigint = "integer")
rs <- dbExecute(pg, "SET search_path TO edgar, public")
rs <- dbExecute(pg, "SET work_mem = '2GB'")
filings <- tbl(pg, "filings")
filing_docs <- tbl(pg, "filing_docs")
# Flag filings whose largest seq differs from the number of non-missing seq
# values. array_agg() and array_length() are PostgreSQL functions that dbplyr
# passes through to the database unchanged.
problems <-
    filing_docs %>%
    filter(!is.na(seq)) %>%
    group_by(file_name) %>%
    summarize(seqs = array_agg(seq),
              seq_max = max(seq, na.rm = TRUE)) %>%
    mutate(seq_len = array_length(seqs, 1L)) %>%
    filter(seq_max != seq_len) %>%
    ungroup()
problems
#> # Source: lazy query [?? x 4]
#> # Database: postgres 9.6.11 [igow@10.101.13.99:5432/crsp]
#> file_name seqs seq_max seq_len
#> <chr> <chr> <int> <int>
#> 1 edgar/data/1000015/0000912057-00-0… {1,2,3,4,5,6,1,2,3,… 6 12
#> 2 edgar/data/1000015/0000912057-01-5… {1,3} 3 2
#> 3 edgar/data/1000015/0000912057-02-0… {1,3} 3 2
#> 4 edgar/data/1000015/0000912057-02-0… {1,3,4,5,6,7,8,9,10… 12 11
#> 5 edgar/data/1000015/0000912057-02-0… {1,3} 3 2
#> 6 edgar/data/1000015/0000912057-02-0… {1,3} 3 2
#> 7 edgar/data/1000015/0001005477-02-0… {1,3} 3 2
#> 8 edgar/data/1000015/0001047469-02-0… {1,3} 3 2
#> 9 edgar/data/1000015/0001047469-03-0… {1,3,4,5,6,7,8,9,10… 11 10
#> 10 edgar/data/1000015/0001047469-03-0… {1,3} 3 2
#> # … with more rows
Created on 2019-01-22 by the reprex package (v0.2.1)
@iangow Some of these may be duplicates. I just checked the first filing in `problems`. It's OK for `seq` to run from 1 to 6, as there is just one table, but `seqs` shows two copies of each integer.
I noticed that. Let's make that other issue and we can address this issue there.
@iangow There are also some odd entries. The second filing in `problems` has only one table, and there is no `seq = 2` entry.
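These problem filings seem to fall into at least two groups. Some have repeated `seq` values, like the first filing, and some have missing values, like the second. A small classifier could help sort them once the new issue is opened. This is a minimal sketch; `classify_seq()` is a hypothetical helper, not part of the project, and the example inputs are the `seqs` values shown above:

```r
# Hypothetical helper: label the problem with one filing's seq values
classify_seq <- function(seqs) {
    s <- seqs[!is.na(seqs)]                            # ignore rows without a seq
    n_max <- if (length(s) > 0) max(s) else 0
    has_dup <- anyDuplicated(s) > 0                    # some seq appears twice
    has_gap <- length(setdiff(seq_len(n_max), s)) > 0  # some of 1..max(seq) missing
    if (has_dup && has_gap) return("duplicates and gaps")
    if (has_dup) return("duplicates")
    if (has_gap) return("gaps")
    "ok"
}

classify_seq(c(1:6, 1:6))  # first filing in problems: "duplicates"
classify_seq(c(1, 3))      # second filing in problems: "gaps"
```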
Again, let's put the discussion in a new issue. Otherwise this will get complicated.
Now the code doesn't work:
> source('~/git/edgar/filing_docs/scrape_filing_docs.R')
Attaching package: ‘readr’
The following object is masked from ‘package:rvest’:
guess_encoding
Processing batch 1
Error in bind_rows_(x, .id) : Argument 1 must have names
I think this is the error message I addressed by using `[1]` to grab the first table.
@iangow
> file_name
[1] "edgar/data/320193/0001193125-09-214859.txt"
> df1 <- filing_docs_df(file_name)
> df1
seq description document type size file_name
1 1 FOR THE FISCAL YEAR ENDED SEPTEMBER 26, 2009 d10k.htm 10-K 1231750 edgar/data/320193/0001193125-09-214859.txt
2 2 SUBSIDIARIES OF THE REGISTRANT dex211.htm EX-21.1 2792 edgar/data/320193/0001193125-09-214859.txt
3 3 CONSENT OF ERNST & YOUNG LLP dex231.htm EX-23.1 1634 edgar/data/320193/0001193125-09-214859.txt
4 4 CONSENT OF KPMG LLP dex232.htm EX-23.2 2390 edgar/data/320193/0001193125-09-214859.txt
5 5 RULE 13A-14(A) / 15D-14(A) CERTIFICATION OF CHIEF EXECUTIVE OFFICER dex311.htm EX-31.1 9851 edgar/data/320193/0001193125-09-214859.txt
6 6 RULE 13A-14(A) / 15D-14(A) CERTIFICATION OF CHIEF FINANCIAL OFFICER dex312.htm EX-31.2 10112 edgar/data/320193/0001193125-09-214859.txt
7 7 SECTION 1350 CERTIFICATIONS OF CEO AND CFO dex321.htm EX-32.1 5354 edgar/data/320193/0001193125-09-214859.txt
8 14 GRAPHIC g91485g21p46.jpg GRAPHIC 53857 edgar/data/320193/0001193125-09-214859.txt
9 NA Complete submission text file 0001193125-09-214859.txt 3638340 edgar/data/320193/0001193125-09-214859.txt
10 8 XBRL INSTANCE DOCUMENT aapl-20090926.xml EX-101.INS 760344 edgar/data/320193/0001193125-09-214859.txt
11 9 XBRL TAXONOMY EXTENSION SCHEMA aapl-20090926.xsd EX-101.SCH 13066 edgar/data/320193/0001193125-09-214859.txt
12 10 XBRL TAXONOMY EXTENSION CALCULATION LINKBASE aapl-20090926_cal.xml EX-101.CAL 30955 edgar/data/320193/0001193125-09-214859.txt
13 11 XBRL TAXONOMY EXTENSION DEFINITION LINKBASE aapl-20090926_def.xml EX-101.DEF 19450 edgar/data/320193/0001193125-09-214859.txt
14 12 XBRL TAXONOMY EXTENSION LABEL LINKBASE aapl-20090926_lab.xml EX-101.LAB 100641 edgar/data/320193/0001193125-09-214859.txt
15 13 XBRL TAXONOMY EXTENSION PRESENTATION LINKBASE aapl-20090926_pre.xml EX-101.PRE 80647 edgar/data/320193/0001193125-09-214859.txt
> f2
[1] "edgar/data/1046404/0000871839-18-000061-index.txt"
> df2 <- filing_docs_df(f2)
> df2
seq description document type size file_name
1 1 proxyadditionalmateri-201715.htm DEF 14A 399094 edgar/data/1046404/0000871839-18-000061-index.txt
2 2 GRAPHIC img_3fb1dcc13ad04.jpg GRAPHIC 2695 edgar/data/1046404/0000871839-18-000061-index.txt
3 3 GRAPHIC img_7c4a99f133244.jpg GRAPHIC 50688 edgar/data/1046404/0000871839-18-000061-index.txt
4 4 GRAPHIC img_39af8f5852b44.jpg GRAPHIC 41924 edgar/data/1046404/0000871839-18-000061-index.txt
5 5 GRAPHIC img_58e800b8f91b4.jpg GRAPHIC 46154 edgar/data/1046404/0000871839-18-000061-index.txt
6 6 GRAPHIC img_61e4e19e30d84.jpg GRAPHIC 3289 edgar/data/1046404/0000871839-18-000061-index.txt
7 7 GRAPHIC img_742ca67764644.jpg GRAPHIC 1952 edgar/data/1046404/0000871839-18-000061-index.txt
8 8 GRAPHIC img_18245b4143de4.jpg GRAPHIC 1957 edgar/data/1046404/0000871839-18-000061-index.txt
9 9 GRAPHIC img_204353f228fd4.jpg GRAPHIC 3289 edgar/data/1046404/0000871839-18-000061-index.txt
10 10 GRAPHIC img_286584bbcac34.jpg GRAPHIC 48487 edgar/data/1046404/0000871839-18-000061-index.txt
11 11 GRAPHIC img_a17ec2d0123f4.jpg GRAPHIC 3622 edgar/data/1046404/0000871839-18-000061-index.txt
12 12 GRAPHIC img_bda416a65f094.jpg GRAPHIC 63014 edgar/data/1046404/0000871839-18-000061-index.txt
13 13 GRAPHIC img_c147f393a5fe4.jpg GRAPHIC 1953 edgar/data/1046404/0000871839-18-000061-index.txt
14 14 GRAPHIC img_ddb30f382a384.jpg GRAPHIC 4610 edgar/data/1046404/0000871839-18-000061-index.txt
15 15 GRAPHIC img_e82bb2bb0af14.jpg GRAPHIC 1771 edgar/data/1046404/0000871839-18-000061-index.txt
16 16 GRAPHIC img_e89a234d074c4.jpg GRAPHIC 39813 edgar/data/1046404/0000871839-18-000061-index.txt
17 17 GRAPHIC img_ea1cfc908cbf4.jpg GRAPHIC 4606 edgar/data/1046404/0000871839-18-000061-index.txt
18 18 GRAPHIC img_ea5f423baea34.jpg GRAPHIC 2695 edgar/data/1046404/0000871839-18-000061-index.txt
19 19 GRAPHIC img_ed7f775d2ba74.jpg GRAPHIC 1953 edgar/data/1046404/0000871839-18-000061-index.txt
20 NA Complete submission text file 0000871839-18-000061.txt 985260 edgar/data/1046404/0000871839-18-000061-index.txt
> bind_rows(df1, df2)
seq description document type size file_name
1 1 FOR THE FISCAL YEAR ENDED SEPTEMBER 26, 2009 d10k.htm 10-K 1231750 edgar/data/320193/0001193125-09-214859.txt
2 2 SUBSIDIARIES OF THE REGISTRANT dex211.htm EX-21.1 2792 edgar/data/320193/0001193125-09-214859.txt
3 3 CONSENT OF ERNST & YOUNG LLP dex231.htm EX-23.1 1634 edgar/data/320193/0001193125-09-214859.txt
4 4 CONSENT OF KPMG LLP dex232.htm EX-23.2 2390 edgar/data/320193/0001193125-09-214859.txt
5 5 RULE 13A-14(A) / 15D-14(A) CERTIFICATION OF CHIEF EXECUTIVE OFFICER dex311.htm EX-31.1 9851 edgar/data/320193/0001193125-09-214859.txt
6 6 RULE 13A-14(A) / 15D-14(A) CERTIFICATION OF CHIEF FINANCIAL OFFICER dex312.htm EX-31.2 10112 edgar/data/320193/0001193125-09-214859.txt
7 7 SECTION 1350 CERTIFICATIONS OF CEO AND CFO dex321.htm EX-32.1 5354 edgar/data/320193/0001193125-09-214859.txt
8 14 GRAPHIC g91485g21p46.jpg GRAPHIC 53857 edgar/data/320193/0001193125-09-214859.txt
9 NA Complete submission text file 0001193125-09-214859.txt 3638340 edgar/data/320193/0001193125-09-214859.txt
10 8 XBRL INSTANCE DOCUMENT aapl-20090926.xml EX-101.INS 760344 edgar/data/320193/0001193125-09-214859.txt
11 9 XBRL TAXONOMY EXTENSION SCHEMA aapl-20090926.xsd EX-101.SCH 13066 edgar/data/320193/0001193125-09-214859.txt
12 10 XBRL TAXONOMY EXTENSION CALCULATION LINKBASE aapl-20090926_cal.xml EX-101.CAL 30955 edgar/data/320193/0001193125-09-214859.txt
13 11 XBRL TAXONOMY EXTENSION DEFINITION LINKBASE aapl-20090926_def.xml EX-101.DEF 19450 edgar/data/320193/0001193125-09-214859.txt
14 12 XBRL TAXONOMY EXTENSION LABEL LINKBASE aapl-20090926_lab.xml EX-101.LAB 100641 edgar/data/320193/0001193125-09-214859.txt
15 13 XBRL TAXONOMY EXTENSION PRESENTATION LINKBASE aapl-20090926_pre.xml EX-101.PRE 80647 edgar/data/320193/0001193125-09-214859.txt
16 1 proxyadditionalmateri-201715.htm DEF 14A 399094 edgar/data/1046404/0000871839-18-000061-index.txt
17 2 GRAPHIC img_3fb1dcc13ad04.jpg GRAPHIC 2695 edgar/data/1046404/0000871839-18-000061-index.txt
18 3 GRAPHIC img_7c4a99f133244.jpg GRAPHIC 50688 edgar/data/1046404/0000871839-18-000061-index.txt
19 4 GRAPHIC img_39af8f5852b44.jpg GRAPHIC 41924 edgar/data/1046404/0000871839-18-000061-index.txt
20 5 GRAPHIC img_58e800b8f91b4.jpg GRAPHIC 46154 edgar/data/1046404/0000871839-18-000061-index.txt
21 6 GRAPHIC img_61e4e19e30d84.jpg GRAPHIC 3289 edgar/data/1046404/0000871839-18-000061-index.txt
22 7 GRAPHIC img_742ca67764644.jpg GRAPHIC 1952 edgar/data/1046404/0000871839-18-000061-index.txt
23 8 GRAPHIC img_18245b4143de4.jpg GRAPHIC 1957 edgar/data/1046404/0000871839-18-000061-index.txt
24 9 GRAPHIC img_204353f228fd4.jpg GRAPHIC 3289 edgar/data/1046404/0000871839-18-000061-index.txt
25 10 GRAPHIC img_286584bbcac34.jpg GRAPHIC 48487 edgar/data/1046404/0000871839-18-000061-index.txt
26 11 GRAPHIC img_a17ec2d0123f4.jpg GRAPHIC 3622 edgar/data/1046404/0000871839-18-000061-index.txt
27 12 GRAPHIC img_bda416a65f094.jpg GRAPHIC 63014 edgar/data/1046404/0000871839-18-000061-index.txt
28 13 GRAPHIC img_c147f393a5fe4.jpg GRAPHIC 1953 edgar/data/1046404/0000871839-18-000061-index.txt
29 14 GRAPHIC img_ddb30f382a384.jpg GRAPHIC 4610 edgar/data/1046404/0000871839-18-000061-index.txt
30 15 GRAPHIC img_e82bb2bb0af14.jpg GRAPHIC 1771 edgar/data/1046404/0000871839-18-000061-index.txt
31 16 GRAPHIC img_e89a234d074c4.jpg GRAPHIC 39813 edgar/data/1046404/0000871839-18-000061-index.txt
32 17 GRAPHIC img_ea1cfc908cbf4.jpg GRAPHIC 4606 edgar/data/1046404/0000871839-18-000061-index.txt
33 18 GRAPHIC img_ea5f423baea34.jpg GRAPHIC 2695 edgar/data/1046404/0000871839-18-000061-index.txt
34 19 GRAPHIC img_ed7f775d2ba74.jpg GRAPHIC 1953 edgar/data/1046404/0000871839-18-000061-index.txt
35 NA Complete submission text file 0000871839-18-000061.txt 985260 edgar/data/1046404/0000871839-18-000061-index.txt
> df_list = list()
> df_list <- list()
> df_list[[1]] <- df1
> df_list[[2]] <- df2
> bind_rows(df_list)
seq description document type size file_name
1 1 FOR THE FISCAL YEAR ENDED SEPTEMBER 26, 2009 d10k.htm 10-K 1231750 edgar/data/320193/0001193125-09-214859.txt
2 2 SUBSIDIARIES OF THE REGISTRANT dex211.htm EX-21.1 2792 edgar/data/320193/0001193125-09-214859.txt
3 3 CONSENT OF ERNST & YOUNG LLP dex231.htm EX-23.1 1634 edgar/data/320193/0001193125-09-214859.txt
4 4 CONSENT OF KPMG LLP dex232.htm EX-23.2 2390 edgar/data/320193/0001193125-09-214859.txt
5 5 RULE 13A-14(A) / 15D-14(A) CERTIFICATION OF CHIEF EXECUTIVE OFFICER dex311.htm EX-31.1 9851 edgar/data/320193/0001193125-09-214859.txt
6 6 RULE 13A-14(A) / 15D-14(A) CERTIFICATION OF CHIEF FINANCIAL OFFICER dex312.htm EX-31.2 10112 edgar/data/320193/0001193125-09-214859.txt
7 7 SECTION 1350 CERTIFICATIONS OF CEO AND CFO dex321.htm EX-32.1 5354 edgar/data/320193/0001193125-09-214859.txt
8 14 GRAPHIC g91485g21p46.jpg GRAPHIC 53857 edgar/data/320193/0001193125-09-214859.txt
9 NA Complete submission text file 0001193125-09-214859.txt 3638340 edgar/data/320193/0001193125-09-214859.txt
10 8 XBRL INSTANCE DOCUMENT aapl-20090926.xml EX-101.INS 760344 edgar/data/320193/0001193125-09-214859.txt
11 9 XBRL TAXONOMY EXTENSION SCHEMA aapl-20090926.xsd EX-101.SCH 13066 edgar/data/320193/0001193125-09-214859.txt
12 10 XBRL TAXONOMY EXTENSION CALCULATION LINKBASE aapl-20090926_cal.xml EX-101.CAL 30955 edgar/data/320193/0001193125-09-214859.txt
13 11 XBRL TAXONOMY EXTENSION DEFINITION LINKBASE aapl-20090926_def.xml EX-101.DEF 19450 edgar/data/320193/0001193125-09-214859.txt
14 12 XBRL TAXONOMY EXTENSION LABEL LINKBASE aapl-20090926_lab.xml EX-101.LAB 100641 edgar/data/320193/0001193125-09-214859.txt
15 13 XBRL TAXONOMY EXTENSION PRESENTATION LINKBASE aapl-20090926_pre.xml EX-101.PRE 80647 edgar/data/320193/0001193125-09-214859.txt
16 1 proxyadditionalmateri-201715.htm DEF 14A 399094 edgar/data/1046404/0000871839-18-000061-index.txt
17 2 GRAPHIC img_3fb1dcc13ad04.jpg GRAPHIC 2695 edgar/data/1046404/0000871839-18-000061-index.txt
18 3 GRAPHIC img_7c4a99f133244.jpg GRAPHIC 50688 edgar/data/1046404/0000871839-18-000061-index.txt
19 4 GRAPHIC img_39af8f5852b44.jpg GRAPHIC 41924 edgar/data/1046404/0000871839-18-000061-index.txt
20 5 GRAPHIC img_58e800b8f91b4.jpg GRAPHIC 46154 edgar/data/1046404/0000871839-18-000061-index.txt
21 6 GRAPHIC img_61e4e19e30d84.jpg GRAPHIC 3289 edgar/data/1046404/0000871839-18-000061-index.txt
22 7 GRAPHIC img_742ca67764644.jpg GRAPHIC 1952 edgar/data/1046404/0000871839-18-000061-index.txt
23 8 GRAPHIC img_18245b4143de4.jpg GRAPHIC 1957 edgar/data/1046404/0000871839-18-000061-index.txt
24 9 GRAPHIC img_204353f228fd4.jpg GRAPHIC 3289 edgar/data/1046404/0000871839-18-000061-index.txt
25 10 GRAPHIC img_286584bbcac34.jpg GRAPHIC 48487 edgar/data/1046404/0000871839-18-000061-index.txt
26 11 GRAPHIC img_a17ec2d0123f4.jpg GRAPHIC 3622 edgar/data/1046404/0000871839-18-000061-index.txt
27 12 GRAPHIC img_bda416a65f094.jpg GRAPHIC 63014 edgar/data/1046404/0000871839-18-000061-index.txt
28 13 GRAPHIC img_c147f393a5fe4.jpg GRAPHIC 1953 edgar/data/1046404/0000871839-18-000061-index.txt
29 14 GRAPHIC img_ddb30f382a384.jpg GRAPHIC 4610 edgar/data/1046404/0000871839-18-000061-index.txt
30 15 GRAPHIC img_e82bb2bb0af14.jpg GRAPHIC 1771 edgar/data/1046404/0000871839-18-000061-index.txt
31 16 GRAPHIC img_e89a234d074c4.jpg GRAPHIC 39813 edgar/data/1046404/0000871839-18-000061-index.txt
32 17 GRAPHIC img_ea1cfc908cbf4.jpg GRAPHIC 4606 edgar/data/1046404/0000871839-18-000061-index.txt
33 18 GRAPHIC img_ea5f423baea34.jpg GRAPHIC 2695 edgar/data/1046404/0000871839-18-000061-index.txt
34 19 GRAPHIC img_ed7f775d2ba74.jpg GRAPHIC 1953 edgar/data/1046404/0000871839-18-000061-index.txt
35 NA Complete submission text file 0000871839-18-000061.txt 985260 edgar/data/1046404/0000871839-18-000061-index.txt
Here, I bound the rows of the data frames produced by `filing_docs_df` for two different filings, and it worked. So the problem can't be the `[1]` indexing.
@iangow Did the `source('get_filing_doc_functions.R')` line work? The other place where `bind_rows` is used is here:
library(parallel)  # mclapply()
while (nrow(file_names <- get_file_names()) > 0) {
    batch <- batch + 1
    cat("Processing batch", batch, "\n")
    # Each element of temp is whatever get_filing_docs() returns for one filing
    temp <- mclapply(file_names$file_name, get_filing_docs, mc.cores = 6)
    if (length(temp) > 0) {
        df <- bind_rows(temp)
        if (nrow(df) > 0) {
            cat("Writing data ...\n")
            dbWriteTable(pg, "filing_docs",
                         df, append = TRUE, row.names = FALSE)
        } else {
            cat("No data ...\n")
        }
    }
}
If that line didn't work, then `get_filing_docs` would not be in the namespace, and the `bind_rows` line would return an error.
The way `get_filing_docs` works is very different in the new version (for example, the new one returns `TRUE` or `FALSE`, while the old one returned the scraped data). I think the easiest thing to do would be to adapt the "old" function to incorporate the error handling, etc.
The good news is that I think this is the only code using this function now, so you might be able to find some shortcuts.
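The `bind_rows_()` error above probably comes from this mismatch. With the new `get_filing_docs`, `temp` is a list of `TRUE`/`FALSE` values rather than data frames. Whichever function ends up in the loop, one defensive option is to turn errors into `NULL` inside each job and keep only the data frames before binding. The sketch below makes a few stand-in choices:

- `fake_scrape()` and `safe_scrape()` are hypothetical stand-ins for `filing_docs_df()`.
- Base R's `rbind` replaces `bind_rows` so the example runs without dplyr.
- `mc.cores = 1` makes it run serially.

```r
library(parallel)  # mclapply()

# Hypothetical stand-in for filing_docs_df(): a data frame on success,
# an error for a filing whose index page cannot be parsed
fake_scrape <- function(file_name) {
    if (file_name == "bad") stop("could not parse index page")
    data.frame(seq = 1:2, file_name = file_name, stringsAsFactors = FALSE)
}

# Turn errors into NULL so one bad filing cannot break the whole batch
safe_scrape <- function(file_name) {
    tryCatch(fake_scrape(file_name), error = function(e) NULL)
}

file_names <- c("a", "bad", "b")
# mc.cores = 1 runs serially (and works on Windows); the script above uses 6
temp <- mclapply(file_names, safe_scrape, mc.cores = 1)

# Keep only data frames: this drops NULLs, try-error objects and any
# TRUE/FALSE values before binding
is_df <- vapply(temp, is.data.frame, logical(1))
df <- do.call(rbind, temp[is_df])
failed <- file_names[!is_df]  # filings to retry or log
```

With `mc.cores` greater than 1, `mclapply()` itself stores an error raised in a job as a `try-error` object in the result list. The same `is.data.frame` filter drops those as well.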
@iangow As I just said, `filing_docs_df` essentially does what your version of `get_filing_docs` did, but it also includes all tables of filing documents. Like your version, it returns a data frame rather than a Boolean. I just ran
> file_names <- get_file_names()
> temp <- mclapply(file_names$file_name, filing_docs_df, mc.cores = 6)
> df <- bind_rows(temp)
> df
seq description document type size file_name
1 1 AUTO-GENERATED PAPER DOCUMENT 9999999997-06-048085.paper REGDEX/A 295 edgar/data/1132469/9999999997-06-048085.txt
2 NA Scanned paper document scanned.pdf 243893 edgar/data/1132469/9999999997-06-048085.txt
3 NA Complete submission text file 9999999997-06-048085.txt 1459 edgar/data/1132469/9999999997-06-048085.txt
4 1 AUTO-GENERATED PAPER DOCUMENT 9999999997-06-038939.paper FOCUSN 293 edgar/data/78017/9999999997-06-038939.txt
5 NA Complete submission text file 9999999997-06-038939.txt 1847 edgar/data/78017/9999999997-06-038939.txt
6 1 AUTO-GENERATED PAPER DOCUMENT 9999999997-06-028915.paper X-17A-5 294 edgar/data/78017/9999999997-06-028915.txt
7 NA Scanned paper document scanned.pdf 393873 edgar/data/78017/9999999997-06-028915.txt
8 NA Complete submission text file 9999999997-06-028915.txt 1850 edgar/data/78017/9999999997-06-028915.txt
9 1 AUTO-GENERATED PAPER DOCUMENT 9999999997-06-013694.paper TA-1/A 293 edgar/data/849542/9999999997-06-013694.txt
10 NA Complete submission text file 9999999997-06-013694.txt 1870 edgar/data/849542/9999999997-06-013694.txt
11 1 AUTO-GENERATED PAPER DOCUMENT 9999999997-06-024352.paper X-17A-5/A 296 edgar/data/354497/9999999997-06-024352.txt
12 NA Scanned paper document scanned.pdf 108794 edgar/data/354497/9999999997-06-024352.txt
13 NA Complete submission text file 9999999997-06-024352.txt 1980 edgar/data/354497/9999999997-06-024352.txt
14 1 FORM 10-Q d34692e10vq.htm 10-Q 436342 edgar/data/1095315/0000950134-06-010004.txt
15 2 FOURTH AMENDED AND RESTATED NOTES PAYABLE SUBORDINATION AGREEMENT d34692exv10w1.htm EX-10.1 22673 edgar/data/1095315/0000950134-06-010004.txt
16 3 AMENDMENT 7 TO AGREEMENT FOR INVENTORY FINANCING d34692exv10w2.htm EX-10.2 57296 edgar/data/1095315/0000950134-06-010004.txt
17 4 AMENDMENT 6 TO AMENDED AND RESTATED PLATINUM PLAN AGREEMENT d34692exv10w3.htm EX-10.3 89454 edgar/data/1095315/0000950134-06-010004.txt
18 5 AGREEMENT d34692exv10w4.htm EX-10.4 59473 edgar/data/1095315/0000950134-06-010004.txt
19 6 SECOND AMENDMENT TO LOAN AND SECURITY AGREEMENT d34692exv10w5.htm EX-10.5 30185 edgar/data/1095315/0000950134-06-010004.txt
20 7 AMENDMENT 4 TO LOAN AND SECURITY AGREEMENT d34692exv10w6.htm EX-10.6 40034 edgar/data/1095315/0000950134-06-010004.txt
21 8 GUARANTY d34692exv10w7.htm EX-10.7 38308 edgar/data/1095315/0000950134-06-010004.txt
22 9 SECOND AMENDMENT TO FIRST AMENDED AND RESTATED LOAN AND SECURITY AGREEMENT d34692exv10w8.htm EX-10.8 24409 edgar/data/1095315/0000950134-06-010004.txt
23 10 CERTIFICATION OF CEO PURSUANT TO SECTION 302 d34692exv31w1.htm EX-31.1 6920 edgar/data/1095315/0000950134-06-010004.txt
24 11 CERTIFICATION OF CFO PURSUANT TO SECTION 302 d34692exv31w2.htm EX-31.2 6407 edgar/data/1095315/0000950134-06-010004.txt
25 12 CERTIFICATIONS OF CEO & CFO PURSUANT TO SECTION 906 d34692exv32w1.htm EX-32.1 4014 edgar/data/1095315/0000950134-06-010004.txt
26 NA Complete submission text file 0000950134-06-010004.txt 816901 edgar/data/1095315/0000950134-06-010004.txt
27 1 PRELIMINARY PROXY STATEMENT d35110ppre14a.htm PRE 14A 165604 edgar/data/1095315/0000950134-06-007451.txt
28 2 GRAPHIC d35110pd3511001.gif GRAPHIC 3049 edgar/data/1095315/0000950134-06-007451.txt
29 3 GRAPHIC d35110pd3511002.gif GRAPHIC 2942 edgar/data/1095315/0000950134-06-007451.txt
30 4 GRAPHIC d35110pd3511003.gif GRAPHIC 11006 edgar/data/1095315/0000950134-06-007451.txt
31 NA Complete submission text file 0000950134-06-007451.txt 190874 edgar/data/1095315/0000950134-06-007451.txt
32 1 AUTO-GENERATED PAPER DOCUMENT 9999999997-06-026184.paper REGDEX 293 edgar/data/1095315/9999999997-06-026184.txt
33 NA Scanned paper document scanned.pdf 431768 edgar/data/1095315/9999999997-06-026184.txt
34 NA Complete submission text file 9999999997-06-026184.txt 1703 edgar/data/1095315/9999999997-06-026184.txt
35 1 CORP Q1 2006 FORM 10-Q pge10q_q1.htm 10-Q 794817 edgar/data/1004980/0001004980-06-000126.txt
36 2 EXHIBIT 10 q106_ex10.htm EX-10 174779 edgar/data/1004980/0001004980-06-000126.txt
37 3 EXHIBIT 11 q106_ex11.htm EX-11 29829 edgar/data/1004980/0001004980-06-000126.txt
38 4 EXHIBIT 12.1 q106_ex12-1.htm EX-12.1 18293 edgar/data/1004980/0001004980-06-000126.txt
39 5 EXHIBIT 12.2 q106_ex12-2.htm EX-12.2 25570 edgar/data/1004980/0001004980-06-000126.txt
40 6 CORP CEO/CFO SECTION 302 CERTIFICATION q106_ex31-1.htm EX-31.1 9904 edgar/data/1004980/0001004980-06-000126.txt
41 7 UTILITY CEO/CFO SECTION 302 CERTIFICATION q106_ex31-2.htm EX-31.2 9869 edgar/data/1004980/0001004980-06-000126.txt
42 8 CORP CEO/CFO SECTION 906 CERTIFICATION q106_ex32-1corp906cert.htm EX-32.1 5037 edgar/data/1004980/0001004980-06-000126.txt
43 9 UTILITY CEO/CFO SECTION 906 CERTIFICATION q106_ex32-2906cert.htm EX-32.2 4774 edgar/data/1004980/0001004980-06-000126.txt
44 NA Complete submission text file 0001004980-06-000126.txt 1074549 edgar/data/1004980/0001004980-06-000126.txt
45 1 PG&E CORPORATION 11-K 2005 corprsp11k_05.htm 11-K 4049 edgar/data/1004980/0001004980-06-000157.txt
46 2 CORP MANAGEMENT & UNION RETIREMENT SAVINGS PLAN 2005 pgecorprspfsmerged_05.htm EX-99.1 158053 edgar/data/1004980/0001004980-06-000157.txt
47 3 INDEPENDENT AUDITOR'S CONSENT LETTER corprspconsent_05.htm EX-99.2 1020 edgar/data/1004980/0001004980-06-000157.txt
48 NA Complete submission text file 0001004980-06-000157.txt 164664 edgar/data/1004980/0001004980-06-000157.txt
49 1 filename1.htm CORRESP 26848 edgar/data/1004980/0001004980-06-000139.txt
50 NA Complete submission text file 0001004980-06-000139.txt 28271 edgar/data/1004980/0001004980-06-000139.txt
51 1 filename1.htm CORRESP 46822 edgar/data/1004980/0001004980-06-000159.txt
52 NA Complete submission text file 0001004980-06-000159.txt 48241 edgar/data/1004980/0001004980-06-000159.txt
53 1 filename1.txt LETTER 6720 edgar/data/1004980/0000000000-06-018834.txt
54 NA Complete submission text file 0000000000-06-018834.txt 8200 edgar/data/1004980/0000000000-06-018834.txt
55 1 filename1.pdf LETTER 22516 edgar/data/1004980/0000000000-06-026668.txt
56 NA Complete submission text file 0000000000-06-026668.txt 32636 edgar/data/1004980/0000000000-06-026668.txt
57 1 PG&E ENERGY RECOVERY FUNDING LLC 10D perf10d.htm 10-D 16759 edgar/data/1305629/0001305629-06-000014.txt
58 2 PG&E ENERGY RECOVERY FUNDING LLC EXHIBIT 99.1 q2erb1cer.htm EX-99 151195 edgar/data/1305629/0001305629-06-000014.txt
59 3 PG&E ENERGY RECOVERY FUNDING LLC EXHIBIT 99.2 q2erb2cer.htm EX-99 149464 edgar/data/1305629/0001305629-06-000014.txt
60 NA Complete submission text file 0001305629-06-000014.txt 318834 edgar/data/1305629/0001305629-06-000014.txt
61 1 PG&E FUNDING LLC FORM 10Q funding_form10qv2.htm 10-Q 96055 edgar/data/1041637/0001041637-06-000008.txt
62 2 PG&E FUNDING LLC EXHIBIT 31 exhibit_31v2.htm EX-31 17886 edgar/data/1041637/0001041637-06-000008.txt
63 3 PG&E FUNDING LLC EXHIBIT 32 exhibit_32v2.htm EX-32 3340 edgar/data/1041637/0001041637-06-000008.txt
64 4 PG&E FUNDING LLC EXHIBIT 99 exhibit_99v2.htm EX-99 104359 edgar/data/1041637/0001041637-06-000008.txt
65 NA Complete submission text file 0001041637-06-000008.txt 223062 edgar/data/1041637/0001041637-06-000008.txt
66 1 pgi10q.txt 10QSB 46105 edgar/data/81157/0001068800-06-000409.txt
67 2 ex31p1.txt EX-31.1 3598 edgar/data/81157/0001068800-06-000409.txt
68 3 ex31p2.txt EX-31.2 3598 edgar/data/81157/0001068800-06-000409.txt
69 4 ex32p1.txt EX-32.1 1370 edgar/data/81157/0001068800-06-000409.txt
70 5 ex32p2.txt EX-32.2 1370 edgar/data/81157/0001068800-06-000409.txt
71 NA Complete submission text file 0001068800-06-000409.txt 57552 edgar/data/81157/0001068800-06-000409.txt
72 1 AUTO-GENERATED PAPER DOCUMENT 9999999997-06-024770.paper REGDEX 293 edgar/data/1283956/9999999997-06-024770.txt
73 NA Complete submission text file 9999999997-06-024770.txt 1426 edgar/data/1283956/9999999997-06-024770.txt
74 1 QUARTERLY REPORT PURSUANT TO SECTIONS 13 OR 15(D) a06-12022_210qsb.htm 10QSB 635374 edgar/data/1127005/0001104659-06-036613.txt
75 2 EX-31 a06-12022_2ex31d1.htm EX-31.1 13644 edgar/data/1127005/0001104659-06-036613.txt
76 3 EX-31 a06-12022_2ex31d2.htm EX-31.2 13938 edgar/data/1127005/0001104659-06-036613.txt
77 4 EX-32 a06-12022_2ex32.htm EX-32 8647 edgar/data/1127005/0001104659-06-036613.txt
78 NA Complete submission text file 0001104659-06-036613.txt 673324 edgar/data/1127005/0001104659-06-036613.txt
79 1 NOTICE OF INABILITY TO TIMELY FILE A FORM 10-Q a06-12022_1nt10q.htm NT 10-Q 61807 edgar/data/1127005/0001104659-06-035407.txt
80 NA Complete submission text file 0001104659-06-035407.txt 63562 edgar/data/1127005/0001104659-06-035407.txt
81 1 FILED PURSUANT TO RULE 424(B)(4) y18025b4e424b4.htm 424B4 1412125 edgar/data/1354327/0000950123-06-008297.txt
82 2 GRAPHIC y18025b4y1802503.gif GRAPHIC 1806 edgar/data/1354327/0000950123-06-008297.txt
83 3 GRAPHIC y18025b4y1802504.gif GRAPHIC 232981 edgar/data/1354327/0000950123-06-008297.txt
84 4 GRAPHIC y18025b4y1802501.gif GRAPHIC 5984 edgar/data/1354327/0000950123-06-008297.txt
85 5 GRAPHIC y18025b4y1802502.gif GRAPHIC 5768 edgar/data/1354327/0000950123-06-008297.txt
86 6 GRAPHIC y18025b4y1802506.gif GRAPHIC 255413 edgar/data/1354327/0000950123-06-008297.txt
87 NA Complete submission text file 0000950123-06-008297.txt 2105806 edgar/data/1354327/0000950123-06-008297.txt
88 1 FORM 8-A12G y22393e8va12g.txt 8-A12G 5695 edgar/data/1354327/0000950123-06-007803.txt
89 NA Complete submission text file 0000950123-06-007803.txt 7040 edgar/data/1354327/0000950123-06-007803.txt
90 1 filename1.htm CORRESP 4051 edgar/data/1354327/0000950123-06-008050.txt
91 NA Complete submission text file 0000950123-06-008050.txt 5296 edgar/data/1354327/0000950123-06-008050.txt
92 1 filename1.txt CORRESP 2399 edgar/data/1354327/0000950123-06-008055.txt
93 NA Complete submission text file 0000950123-06-008055.txt 3646 edgar/data/1354327/0000950123-06-008055.txt
94 1 filename1.txt CORRESP 2108 edgar/data/1354327/0000950123-06-008104.txt
95 NA Complete submission text file 0000950123-06-008104.txt 3355 edgar/data/1354327/0000950123-06-008104.txt
96 1 primary_doc.html EFFECT NA edgar/data/1354327/9999999995-06-000500.txt
97 1 primary_doc.xml EFFECT 505 edgar/data/1354327/9999999995-06-000500.txt
98 NA Complete submission text file 9999999995-06-000500.txt 1965 edgar/data/1354327/9999999995-06-000500.txt
99 1 FORM FWP y18025fwfwp.htm FWP 9011 edgar/data/1354327/0000950123-06-008214.txt
100 NA Complete submission text file 0000950123-06-008214.txt 10886 edgar/data/1354327/0000950123-06-008214.txt
101 1 AMENDMENT NO. 1 TO FORM S-1 y18025a1sv1za.htm S-1/A 1285240 edgar/data/1354327/0000950123-06-004939.txt
102 2 EX-10.1: SECOND AMENDED AND RESTATED CREDIT AGREEMENT y18025a1exv10w1.txt EX-10.1 493595 edgar/data/1354327/0000950123-06-004939.txt
103 3 EX-10.2: SECOND LIEN CREDIT AGREEMENT y18025a1exv10w2.txt EX-10.2 412122 edgar/data/1354327/0000950123-06-004939.txt
104 4 EX-10.3: AMENDED AND RESTATED PLEDGE AND SECURITY AGREEMENT y18025a1exv10w3.txt EX-10.3 155317 edgar/data/1354327/0000950123-06-004939.txt
105 5 EX-10.4: SECOND LIEN PLEDGE AND SECURITY AGREEMENT y18025a1exv10w4.txt EX-10.4 152423 edgar/data/1354327/0000950123-06-004939.txt
106 6 EX-10.5: 2004 STOCK INCENTIVE PLAN y18025a1exv10w5.txt EX-10.5 14245 edgar/data/1354327/0000950123-06-004939.txt
107 7 EX-10.6: FORM OF 2004 STOCK INCENTIVE PLAN STOCK OPTION AGREEMENT y18025a1exv10w6.txt EX-10.6 22438 edgar/data/1354327/0000950123-06-004939.txt
108 8 EX-10.10: EMPLOYMENT AGREEMENT y18025a1exv10w10.txt EX-10.10 40283 edgar/data/1354327/0000950123-06-004939.txt
109 9 EX-10.11: EMPLOYMENT AGREEMENT y18025a1exv10w11.txt EX-10.11 40224 edgar/data/1354327/0000950123-06-004939.txt
110 10 EX-10.12: EMPLOYMENT AGREEMENT y18025a1exv10w12.txt EX-10.12 41666 edgar/data/1354327/0000950123-06-004939.txt
111 11 EX-10.13: EMPLOYMENT AGREEMENT y18025a1exv10w13.txt EX-10.13 41341 edgar/data/1354327/0000950123-06-004939.txt
112 12 EX-10.14: EMPLOYMENT AGREEMENT y18025a1exv10w14.txt EX-10.14 41353 edgar/data/1354327/0000950123-06-004939.txt
113 13 EX-10.15: EMPLOYMENT AGREEMENT y18025a1exv10w15.txt EX-10.15 40671 edgar/data/1354327/0000950123-06-004939.txt
114 14 EX-10.16: EMPLOYMENT AGREEMENT y18025a1exv10w16.txt EX-10.16 41307 edgar/data/1354327/0000950123-06-004939.txt
115 15 EX-10.18: FORM OF ROLLOVER STOCK OPTION AGREEMENT y18025a1exv10w18.txt EX-10.18 17263 edgar/data/1354327/0000950123-06-004939.txt
116 16 EX-23.1: CONSENT OF ERNST & YOUNG LLP y18025a1exv23w1.txt EX-23.1 768 edgar/data/1354327/0000950123-06-004939.txt
117 19 GRAPHIC y18025a1y1802503.gif GRAPHIC 3527 edgar/data/1354327/0000950123-06-004939.txt
118 20 GRAPHIC y18025a1y1802501.gif GRAPHIC 5979 edgar/data/1354327/0000950123-06-004939.txt
119 21 GRAPHIC y18025a1y1802502.gif GRAPHIC 6237 edgar/data/1354327/0000950123-06-004939.txt
120 22 filename22.txt CORRESP 96895 edgar/data/1354327/0000950123-06-004939.txt
121 23 filename23.htm CORRESP 3513 edgar/data/1354327/0000950123-06-004939.txt
122 NA Complete submission text file 0000950123-06-004939.txt 2964279 edgar/data/1354327/0000950123-06-004939.txt
123 1 AMENDMENT NO. 2 TO FORM S-1 y18025a2sv1za.htm S-1/A 1491491 edgar/data/1354327/0000950123-06-006981.txt
124 2 EX-4.1: FORM OF SPECIMEN CERTIFICATE y18025a2exv4w1.htm EX-4.1 13785 edgar/data/1354327/0000950123-06-006981.txt
125 3 EX-10.19: EMPLOYMENT AGREEMENT y18025a2exv10w19.txt EX-10.19 39617 edgar/data/1354327/0000950123-06-006981.txt
126 4 EX-23.1: CONSENT OF ERNST & YOUNG LLP y18025a2exv23w1.htm EX-23.1 1311 edgar/data/1354327/0000950123-06-006981.txt
127 7 GRAPHIC y18025a2y1802503.gif GRAPHIC 597 edgar/data/1354327/0000950123-06-006981.txt
128 8 GRAPHIC y18025a2y1802501.gif GRAPHIC 5979 edgar/data/1354327/0000950123-06-006981.txt
129 9 GRAPHIC y18025a2y1802502.gif GRAPHIC 6237 edgar/data/1354327/0000950123-06-006981.txt
130 10 GRAPHIC y18025a2e1802501.gif GRAPHIC 86540 edgar/data/1354327/0000950123-06-006981.txt
131 11 filename11.htm COVER 4125 edgar/data/1354327/0000950123-06-006981.txt
132 12 filename12.htm CORRESP 86863 edgar/data/1354327/0000950123-06-006981.txt
133 NA Complete submission text file 0000950123-06-006981.txt 1776081 edgar/data/1354327/0000950123-06-006981.txt
134 1 AMENDMENT NO. 3 TO FORM S-1 y18025a3sv1za.htm S-1/A 1500984 edgar/data/1354327/0000950123-06-007472.txt
135 2 EX-1.1: FORM OF UNDERWRITING AGREEMENT y18025a3exv1w1.txt EX-1.1 114830 edgar/data/1354327/0000950123-06-007472.txt
136 3 EX-3.1: FORM OF AMENDED AND RESTATED CERTIFICATE OF INCORPORATION y18025a3exv3w1.txt EX-3.1 26315 edgar/data/1354327/0000950123-06-007472.txt
137 4 EX-3.2: FORM OF AMENDED AND RESTATED BY-LAWS y18025a3exv3w2.txt EX-3.2 85311 edgar/data/1354327/0000950123-06-007472.txt
138 5 EX-4.2: FORM OF AMENDED AND RESTATED SECURITY HOLDERS' AGREEMENT y18025a3exv4w2.txt EX-4.2 72916 edgar/data/1354327/0000950123-06-007472.txt
139 6 EX-10.7: FORM OF PGT, INC. 2006 EQUITY INCENTIVE PLAN y18025a3exv10w7.txt EX-10.7 46191 edgar/data/1354327/0000950123-06-007472.txt
140 7 EX-10.8: FORM OF PGT, INC. 2006 EQUITY INCENTIVE PLAN NON-QUALIFIED STOCK OPTION y18025a3exv10w8.txt EX-10.8 11945 edgar/data/1354327/0000950123-06-007472.txt
141 8 EX-10.9: EMPLOYMENT AGREEMENT: HERSHBERGER y18025a3exv10w9.txt EX-10.9 41163 edgar/data/1354327/0000950123-06-007472.txt
142 9 EX-10.17: FORM OF DIRECTOR INDEMNIFICATION AGREEMENT y18025a3exv10w17.txt EX-10.17 31723 edgar/data/1354327/0000950123-06-007472.txt
143 10 EX-10.20: SUPPLY AGREEMENT y18025a3exv10w20.txt EX-10.20 13830 edgar/data/1354327/0000950123-06-007472.txt
144 11 EX-10.21: SUPPLIER AGREEMENT y18025a3exv10w21.txt EX-10.21 67130 edgar/data/1354327/0000950123-06-007472.txt
145 12 EX-10.22: SUPPLIER AGREEMENT y18025a3exv10w22.txt EX-10.22 69431 edgar/data/1354327/0000950123-06-007472.txt
146 13 EX-10.23: FORM OF PGT, INC. 2006 MANAGEMENT INCENTIVE PLAN y18025a3exv10w23.txt EX-10.23 13870 edgar/data/1354327/0000950123-06-007472.txt
147 14 EX-10.24: FORM OF PGT, INC. 2006 EQUITY INCENTIVE PLAN RESTRICTED STOCK AWARD AG y18025a3exv10w24.txt EX-10.24 10950 edgar/data/1354327/0000950123-06-007472.txt
148 15 EX-10.25: FORM OF PGT, INC. 2006 EQUITY INCENTIVE PLAN RESTIRCTED STOCK UNIT AWA y18025a3exv10w25.txt EX-10.25 10593 edgar/data/1354327/0000950123-06-007472.txt
149 16 EX-10.26: FORM OF PGT, INC. 2006 EQUITY INCENTIVE PLAN INCENTIVE STOCK OPTION AG y18025a3exv10w26.txt EX-10.26 12386 edgar/data/1354327/0000950123-06-007472.txt
150 17 EX-23.1: CONSENT OF ERNST & YOUNG LLP y18025a3exv23w1.txt EX-23.1 817 edgar/data/1354327/0000950123-06-007472.txt
151 18 EX-99.1: CONSENT OF DIRECTOR NOMINEE y18025a3exv99w1.txt EX-99.1 1068 edgar/data/1354327/0000950123-06-007472.txt
152 27 GRAPHIC y18025a3y1802503.gif GRAPHIC 1806 edgar/data/1354327/0000950123-06-007472.txt
153 28 GRAPHIC y18025a3y1802504.gif GRAPHIC 232981 edgar/data/1354327/0000950123-06-007472.txt
154 29 GRAPHIC y18025a3y1802501.gif GRAPHIC 5979 edgar/data/1354327/0000950123-06-007472.txt
155 30 GRAPHIC y18025a3y1802502.gif GRAPHIC 6237 edgar/data/1354327/0000950123-06-007472.txt
156 31 GRAPHIC y18025a3y1802506.gif GRAPHIC 255413 edgar/data/1354327/0000950123-06-007472.txt
157 32 filename32.htm CORRESP 4679 edgar/data/1354327/0000950123-06-007472.txt
158 33 filename33.htm CORRESP 6853 edgar/data/1354327/0000950123-06-007472.txt
159 34 filename34.htm CORRESP 10944 edgar/data/1354327/0000950123-06-007472.txt
160 35 filename35.txt CORRESP 1480 edgar/data/1354327/0000950123-06-007472.txt
161 36 filename36.htm CORRESP 4113 edgar/data/1354327/0000950123-06-007472.txt
162 37 filename37.htm CORRESP 13254 edgar/data/1354327/0000950123-06-007472.txt
163 38 filename38.htm CORRESP 31570 edgar/data/1354327/0000950123-06-007472.txt
164 39 filename39.htm CORRESP 17401 edgar/data/1354327/0000950123-06-007472.txt
165 NA Complete submission text file 0000950123-06-007472.txt 2916401 edgar/data/1354327/0000950123-06-007472.txt
166 1 AMENDMENT NO. 4 TO FORM S-1 y18025a4sv1za.htm S-1/A 1478836 edgar/data/1354327/0000950123-06-007562.txt
[ reached 'max' / getOption("max.print") -- omitted 4623 rows ]
> temp[[1, 2, 3, 4, 5]]
Error in temp[[1, 2, 3, 4, 5]] : incorrect number of subscripts
> temp[c(1, 2, 3, 4, 5)]
[[1]]
seq description document type size file_name
1 1 AUTO-GENERATED PAPER DOCUMENT 9999999997-06-048085.paper REGDEX/A 295 edgar/data/1132469/9999999997-06-048085.txt
2 NA Scanned paper document scanned.pdf 243893 edgar/data/1132469/9999999997-06-048085.txt
3 NA Complete submission text file 9999999997-06-048085.txt 1459 edgar/data/1132469/9999999997-06-048085.txt
[[2]]
seq description document type size file_name
1 1 AUTO-GENERATED PAPER DOCUMENT 9999999997-06-038939.paper FOCUSN 293 edgar/data/78017/9999999997-06-038939.txt
2 NA Complete submission text file 9999999997-06-038939.txt 1847 edgar/data/78017/9999999997-06-038939.txt
[[3]]
seq description document type size file_name
1 1 AUTO-GENERATED PAPER DOCUMENT 9999999997-06-028915.paper X-17A-5 294 edgar/data/78017/9999999997-06-028915.txt
2 NA Scanned paper document scanned.pdf 393873 edgar/data/78017/9999999997-06-028915.txt
3 NA Complete submission text file 9999999997-06-028915.txt 1850 edgar/data/78017/9999999997-06-028915.txt
[[4]]
seq description document type size file_name
1 1 AUTO-GENERATED PAPER DOCUMENT 9999999997-06-013694.paper TA-1/A 293 edgar/data/849542/9999999997-06-013694.txt
2 NA Complete submission text file 9999999997-06-013694.txt 1870 edgar/data/849542/9999999997-06-013694.txt
[[5]]
seq description document type size file_name
1 1 AUTO-GENERATED PAPER DOCUMENT 9999999997-06-024352.paper X-17A-5/A 296 edgar/data/354497/9999999997-06-024352.txt
2 NA Scanned paper document scanned.pdf 108794 edgar/data/354497/9999999997-06-024352.txt
3 NA Complete submission text file 9999999997-06-024352.txt 1980 edgar/data/354497/9999999997-06-024352.txt
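The error above comes from R's list-indexing rules. `[[` extracts a single element, so it takes one index and not a comma-separated list of them. `[` returns a sub-list, so it accepts a vector of indices. The small self-contained illustration below uses a made-up `temp` (two toy columns) rather than the scraped one. Only the list structure matters here.

```r
# A stand-in for `temp` above: a list of small data frames
# (hypothetical contents; only the list-of-data-frames structure matters).
temp <- lapply(1:6, function(i) data.frame(seq = i, size = i * 100))

# `[[` extracts ONE element (here, a data frame).
first <- temp[[1]]

# `[` returns a sub-list, so it accepts a vector of indices.
first_five <- temp[1:5]          # same as temp[c(1, 2, 3, 4, 5)]

# Careful: temp[[c(1, 2)]] does NOT return two elements. On a list it
# indexes recursively, i.e. it is temp[[1]][[2]] (the `size` column of
# the first data frame).
nested <- temp[[c(1, 2)]]

# To stack the selected data frames into one, bind them row-wise.
stacked <- do.call(rbind, first_five)
```

In the scraping code, the `[` form is what you want when inspecting several per-filing results at once.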
showing that the code works again if you replace get_filing_docs with filing_docs_df in the line containing the mclapply call (or, alternatively, you could rename my function to get_filing_docs, if that is preferable).
@bdcallen it doesn't work for me. Would you mind checking that you can run source('~/git/edgar/filing_docs/scrape_filing_docs.R') after modifying the code?
@bdcallen Never mind. Typo. I've fixed it now.
The code seems to be running OK now (I incorporated the error handling from my version).
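The error handling itself isn't shown in this thread. As a hedged sketch of the general pattern, and not the actual code in scrape_filing_docs.R, each filing can be wrapped in tryCatch() so that one bad index page returns NULL instead of stopping the whole mclapply() run. Here scrape_one() is a hypothetical stand-in for filing_docs_df(), and scrape_batch() and safe_scrape() are illustrative names.

```r
library(parallel)

# Hypothetical stand-in for filing_docs_df(): returns a data frame for a
# filing, or fails. The real function scrapes the EDGAR index page.
scrape_one <- function(file_name) {
  if (grepl("bad", file_name)) stop("could not parse index page")
  data.frame(seq = 1L, file_name = file_name, stringsAsFactors = FALSE)
}

# Wrap the scraper so that one failing filing yields NULL (plus a message)
# instead of aborting the whole batch.
safe_scrape <- function(file_name) {
  tryCatch(
    scrape_one(file_name),
    error = function(e) {
      message("Skipping ", file_name, ": ", conditionMessage(e))
      NULL
    }
  )
}

# cores = 1 keeps this portable: mclapply() does not support
# mc.cores > 1 on Windows.
scrape_batch <- function(file_names, cores = 1L) {
  results <- mclapply(file_names, safe_scrape, mc.cores = cores)
  # rbind() on data frames ignores NULL entries, so failed filings drop out.
  do.call(rbind, results)
}
```

Returning NULL rather than an empty data frame keeps the binding step simple, at the cost of losing a record of which filings failed (beyond the messages).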
I made a tweak here to address an issue in scraping filings like this. Basically, the code was trying to scrape two tables that could not be combined. The simple "fix" I made was to scrape only the first table.
But this creates an issue with filings like this.
We need to fix the code somehow so that it doesn't choke on the first filing but still handles multiple tables (as in the second filing). For now, let's limit this issue to writing code that works; we can worry about actually running it later.
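One way to keep multiple tables without choking on incompatible ones is to filter and normalise the tables before binding them. The sketch below is illustrative only. It assumes `dfs` is the list produced by html_table() on the "tableFile" nodes, as discussed earlier in this thread. It also assumes that a filing-documents table contains at least the columns Seq, Description, Document, Type and Size. That header set, the function name combine_doc_tables() and the table_num column are assumptions, not part of the existing code.

```r
# Combine the document tables scraped from one EDGAR index page.
# ASSUMPTION: `dfs` is the list returned by html_table() on the
# "tableFile" nodes, and `expected` matches their header row.
combine_doc_tables <- function(dfs,
                               expected = c("Seq", "Description", "Document",
                                            "Type", "Size")) {
  # Keep only tables that contain every expected column; others are skipped.
  keep <- vapply(dfs, function(df) all(expected %in% names(df)), logical(1))
  if (!any(keep)) return(NULL)

  pieces <- lapply(which(keep), function(i) {
    df <- dfs[[i]][expected]
    # Coerce every column to character so tables whose columns were parsed
    # with different types (e.g. Seq as integer in one, text in another)
    # can still be stacked.
    df[] <- lapply(df, as.character)
    df$table_num <- rep(i, nrow(df))  # which table on the page each row came from
    df
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}
```

Because every column is coerced to character, any type conversion (for example, Size to numeric) would happen after binding. The table_num column could be swapped for the table's summary attribute if the table name is more useful downstream.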