mccgr / edgar

Code to manage data related to SEC EDGAR

Dump filing_docs from iangow.me #13

Closed: iangow closed this issue 6 years ago

iangow commented 6 years ago

Code is currently updating the table edgar.filing_docs on iangow.me.

The R code at the bottom of this comment shows that the update is still running: the number of rows in the table increases from 12,009,628 to 12,010,274 between two counts taken 10 seconds apart. Once two consecutive counts return the same number, the update is done and we are ready to pull the data. This is most easily accomplished by piping pg_dump (available on 10.101.13.99) into pg_restore:

pg_dump -h iangow.me -d crsp --format custom --table edgar.filing_docs | \
    pg_restore -d crsp -h 10.101.13.99 --clean

Row-count check (R):

Sys.setenv(PGHOST="iangow.me", PGDATABASE="crsp")
library(dplyr, warn.conflicts = FALSE)
library(RPostgreSQL)
#> Loading required package: DBI

pg <- dbConnect(PostgreSQL())

rs <- dbExecute(pg, "SET search_path TO edgar")

filing_docs <- tbl(pg, "filing_docs")

filing_docs %>% count()
#> # Source:   lazy query [?? x 1]
#> # Database: postgres 9.6.7 [igow@iangow.me:5432/crsp]
#>           n
#>       <dbl>
#> 1 12009628.

Sys.sleep(10)

filing_docs %>% count()
#> # Source:   lazy query [?? x 1]
#> # Database: postgres 9.6.7 [igow@iangow.me:5432/crsp]
#>           n
#>       <dbl>
#> 1 12010274.

rs <- dbDisconnect(pg)

Created on 2018-04-30 by the reprex package (v0.2.0).
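
The "same number twice" check above could also be scripted from the shell, so that the dump only starts once the count has stopped changing. This is a minimal sketch, not what was actually run: `wait_until_stable` and `count_rows` are hypothetical helper names, and it assumes `psql` is installed and can connect to iangow.me with the same credentials as the R session.

```shell
#!/bin/sh
# Poll a command until it prints the same value on two consecutive calls.
# Usage: wait_until_stable SECONDS COMMAND [ARGS...]
wait_until_stable() {
    interval="$1"
    shift
    prev=""
    while :; do
        cur="$("$@")" || return 1          # give up if the command itself fails
        if [ -n "$prev" ] && [ "$cur" = "$prev" ]; then
            printf '%s\n' "$cur"           # report the stable value
            return 0
        fi
        prev="$cur"
        sleep "$interval"
    done
}

# Hypothetical helper: current row count of edgar.filing_docs.
# -A (unaligned) and -t (tuples only) make psql print just the number.
count_rows() {
    psql -h iangow.me -d crsp -At -c "SELECT count(*) FROM edgar.filing_docs"
}

# Example (not run here): wait, then dump and restore as in this comment.
# wait_until_stable 10 count_rows && \
#     pg_dump -h iangow.me -d crsp --format custom --table edgar.filing_docs | \
#     pg_restore -d crsp -h 10.101.13.99 --clean
```

Two equal counts only suggest that the update has finished. A long pause between inserts would look the same, so a longer interval may be safer than 10 seconds.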

iangow commented 6 years ago

@bdcallen

It looks like the table is now up to date on iangow.me: the row count below is the same (13,660,967) on both checks. So you can run the dump (pg_dump) now.

Sys.setenv(PGHOST="iangow.me", PGDATABASE="crsp")
library(dplyr, warn.conflicts = FALSE)
library(RPostgreSQL)
#> Loading required package: DBI

pg <- dbConnect(PostgreSQL())

rs <- dbExecute(pg, "SET search_path TO edgar")

filing_docs <- tbl(pg, "filing_docs")

filing_docs %>% count()
#> # Source:   lazy query [?? x 1]
#> # Database: postgres 9.6.7 [igow@iangow.me:5432/crsp]
#>           n
#>       <dbl>
#> 1 13660967.

Sys.sleep(10)

filing_docs %>% count()
#> # Source:   lazy query [?? x 1]
#> # Database: postgres 9.6.7 [igow@iangow.me:5432/crsp]
#>           n
#>       <dbl>
#> 1 13660967.

rs <- dbDisconnect(pg)

Created on 2018-05-01 by the reprex package (v0.2.0).

iangow commented 6 years ago

OK. The specifics of the error message helped here:

igow@igow-z640:~$ export PGUSER=bdcallen
igow@igow-z640:~$ export PGPASSWORD=Xxxxxxx
igow@igow-z640:~$ pg_dump -h iangow.me -d crsp --format custom --table edgar.filing_docs | \
>     pg_restore -d crsp -h 10.101.13.99 --clean
pg_dump: [archiver (db)] connection to database "crsp" failed: FATAL:  role "bdcallen" is not permitted to log in
pg_restore: [archiver] input file is too short (read 0, expected 5)

The message 'role "bdcallen" is not permitted to log in' told me I needed to add the LOGIN attribute to your role:

crsp=# ALTER ROLE bdcallen LOGIN;
ALTER ROLE
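
To confirm that the attribute took effect, or to check a role before attempting a dump, one option is to query the `rolcanlogin` column of the `pg_roles` catalog view. This is only a sketch: `role_can_login` is a hypothetical helper name, and it assumes you connect as some other role that can already log in.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named role has the LOGIN attribute.
# Usage: role_can_login ROLE_NAME
role_can_login() {
    # The query is sent on stdin (not via -c) so that psql expands :'role',
    # which quotes the role name as an SQL literal.
    answer="$(printf '%s\n' \
        "SELECT rolcanlogin FROM pg_roles WHERE rolname = :'role'" |
        psql -h iangow.me -d crsp -At -v role="$1")" || return 2
    [ "$answer" = "t" ]    # psql prints booleans as t or f
}

# Example (not run here):
# role_can_login bdcallen && echo "bdcallen can log in"
```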

After that I could run the code:

igow@igow-z640:~$ pg_dump -h iangow.me -d crsp --format custom --table edgar.filing_docs |  \
    pg_restore -d crsp -h 10.101.13.99 --clean

So all done.
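
A side note on the failed attempt above: because pg_dump was piped straight into pg_restore, pg_restore still ran after pg_dump could not connect and reported its own "input file is too short" error. One way to avoid that is to dump to a temporary file and restore only if the dump succeeded. The following is a sketch under that assumption; `dump_cmd`, `restore_cmd` and `dump_then_restore` are hypothetical names, and a table this size needs enough free disk space for the temporary file.

```shell
#!/bin/sh
# Hypothetical wrappers around the two commands used in this thread.
dump_cmd() {
    pg_dump -h iangow.me -d crsp --format custom --table edgar.filing_docs
}

restore_cmd() {
    # $1 is the path of the custom-format dump file to restore from.
    pg_restore -d crsp -h 10.101.13.99 --clean "$1"
}

# Dump to a temporary file first; run the restore only if the dump succeeded.
dump_then_restore() {
    dump_file="$(mktemp)" || return 1
    if ! dump_cmd > "$dump_file"; then
        echo "dump failed; skipping restore" >&2
        rm -f "$dump_file"
        return 1
    fi
    restore_cmd "$dump_file"
    status=$?
    rm -f "$dump_file"
    return "$status"
}

# Example (not run here):
# dump_then_restore
```

The direct pipe avoids the intermediate file, which matters for a table of this size, so this is a trade-off rather than a strict improvement.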