mccgr / edgar

Code to manage data related to SEC EDGAR

Make cronjob for the fundamental edgar tables #66

Closed bdcallen closed 4 years ago

bdcallen commented 4 years ago

@iangow This issue covers setting up a cronjob that runs update_edgar.sh, the script that updates the fundamental tables (filings, accession_nos, filing_docs and so on), as we discussed previously.

bdcallen commented 4 years ago

@iangow Here is my new crontab, with the new lines marked:

CODE_DIR=/home/bdcallen/asxlisting
EDGAR_CODE_DIR=/home/bdcallen/edgar    # new line in crontab
ASXLISTING_DIR=/media/igow/6TB/data
ABN_LOOKUP_DIR=/home/bdcallen/abn_lookup
ASIC_DIR=/home/bdcallen/asic

26 17 * * * $CODE_DIR/./asx_prev_day_cronjob.sh
00 19 * * * $EDGAR_CODE_DIR/update_edgar.sh   # new line in crontab
00 6 * * 5 $ABN_LOOKUP_DIR/./abn_lookup_cronjob.sh
00 3 * * 4 $ASIC_DIR/./asic_bulk_extract_cronjob.sh

I have also changed update_edgar.sh in my directory to:

#!/usr/bin/env bash
echo "Running get_filings.R ..."
./$EDGAR_CODE_DIR/get_filings.R
echo "Running get_accession_nos.R ..."
./$EDGAR_CODE_DIR/get_accession_nos.R
echo "Running get_filer_ciks.R ..."
./$EDGAR_CODE_DIR/get_filer_ciks.R
echo "Running get_item_nos.R ..."
./$EDGAR_CODE_DIR/item_nos/get_item_nos.R
echo "Running get_item_no_desc.R ..."
./$EDGAR_CODE_DIR/item_nos/get_item_no_desc.R
# ./get_server_logs.R
echo "Running scrape_filing_docs.R ..."
./$EDGAR_CODE_DIR/filing_docs/scrape_filing_docs.R

Note that I've introduced a new environment variable, EDGAR_CODE_DIR, so that it does not clash with the similar variable CODE_DIR used for asxlisting.

I've scheduled this cronjob for 7pm each night. I think this is a good time, as it corresponds to around 2am on the east coast of the US. I will close this issue once the cronjob has run and its output to dead.letter shows that it worked well.
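Since the job's output currently ends up in dead.letter, one option would be to collect it in a dedicated log file instead. A minimal sketch, assuming a hypothetical wrapper function `log_run` and a log path chosen by the caller (neither exists in the repository):

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: append a command's stdout and stderr to a log file,
# bracketed by timestamps, and pass the command's exit status through.
# Usage: log_run LOG_FILE COMMAND [ARGS ...]
log_run() {
    local log_file=$1
    shift
    local status=0
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') start: $*"
        "$@" || status=$?
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') end: exit status $status"
    } >> "$log_file" 2>&1
    return "$status"
}
```

Inside update_edgar.sh, a step could then be called as `log_run "$HOME/edgar_cron.log" "$EDGAR_CODE_DIR/get_filings.R"`; the log file name here is made up for illustration.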

bdcallen commented 4 years ago

@iangow I fixed some errors in my bash script above; it now reads:

#!/usr/bin/env bash
echo "Running get_filings.R ..."
$EDGAR_CODE_DIR/./get_filings.R
echo "Running get_accession_nos.R ..."
$EDGAR_CODE_DIR/./get_accession_nos.R
echo "Running get_filer_ciks.R ..."
$EDGAR_CODE_DIR/./get_filer_ciks.R
echo "Running get_item_nos.R ..."
$EDGAR_CODE_DIR/./item_nos/get_item_nos.R
echo "Running get_item_no_desc.R ..."
$EDGAR_CODE_DIR/./item_nos/get_item_no_desc.R
# $EDGAR_CODE_DIR/./get_server_logs.R
echo "Running scrape_filing_docs.R ..."
$EDGAR_CODE_DIR/./filing_docs/scrape_filing_docs.R

I tested the script successfully through cron this afternoon. This was cron's output to dead.letter:

Running get_filings.R ...
Updating data for 2019Q4...
Running get_accession_nos.R ...
Running get_filer_ciks.R ...
Running get_item_nos.R ...
Processing batch 1 of 3 ... 41.876 seconds
Processing batch 2 of 3 ... 84.587 seconds
Processing batch 3 of 3 ... 38.914 seconds
Running get_item_no_desc.R ...
Running scrape_filing_docs.R ...
Processing batch 1 
Writing data ...
458.4041 seconds
Processing batch 2 
Writing data ...
91.52212 seconds
Processing batch 3 
Writing data ...
84.79389 seconds
Processing batch 4 
Writing data ...
93.54987 seconds
Processing batch 5 
Writing data ...
95.79431 seconds
Processing batch 6 
Writing data ...
404.7744 seconds
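The difference between the two versions of update_edgar.sh above appears to come down to where `$EDGAR_CODE_DIR` sits in each path. A minimal sketch of how the two styles expand, reusing the directory from the crontab (the variable names `old_style` and `new_style` are only for illustration):

```shell
#!/usr/bin/env bash
# Illustration only: compare how the two path styles expand.
EDGAR_CODE_DIR=/home/bdcallen/edgar   # value taken from the crontab above

old_style="./$EDGAR_CODE_DIR/get_filings.R"   # first version of update_edgar.sh
new_style="$EDGAR_CODE_DIR/./get_filings.R"   # corrected version

# Relative path: resolved against whatever directory the job starts in.
echo "old: $old_style"   # prints "old: .//home/bdcallen/edgar/get_filings.R"

# Absolute path: points at the same file wherever the job starts.
echo "new: $new_style"   # prints "new: /home/bdcallen/edgar/./get_filings.R"
```

With the leading `./`, the path is relative, so the shell looks for it under the directory cron starts the job in (typically the user's home directory), which is presumably why the first version did not work. The `/./` in the corrected form is harmless, and `$EDGAR_CODE_DIR/get_filings.R` would resolve to the same file.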

I have set the main edgar cronjob to run at 9pm every night:

26 17 * * * $CODE_DIR/./asx_prev_day_cronjob.sh
00 21 * * * $EDGAR_CODE_DIR/./update_edgar.sh    # main edgar cronjob
00 0 * * * $EDGAR_CODE_DIR/./update_forms_345_tables.sh
00 6 * * 5 $ABN_LOOKUP_DIR/./abn_lookup_cronjob.sh
00 3 * * 4 $ASIC_DIR/./asic_bulk_extract_cronjob.sh

I also had to make a few minor tweaks to some of the programs and files used by update_edgar.sh. I'll commit these shortly.

bdcallen commented 4 years ago

@iangow I am going to close this for now, as the cronjob has been working well. Perhaps we could open a follow-up issue to detail anything we should add to the cronjob bash script.
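As a starting point for such a follow-up, one possible addition is to stop at the first failing step, which would matter if later steps depend on the tables produced by earlier ones. A minimal sketch, assuming a hypothetical helper named `run_steps` (not part of the repository):

```shell
#!/usr/bin/env bash
# Hypothetical helper: run each script in turn, stopping at the first failure.
# Usage: run_steps BASE_DIR SCRIPT [SCRIPT ...]
run_steps() {
    local base_dir=$1
    shift
    local script
    for script in "$@"; do
        # Same progress message format as the current update_edgar.sh.
        echo "Running $(basename "$script") ..."
        if ! "$base_dir/$script"; then
            echo "FAILED: $script" >&2
            return 1
        fi
    done
}
```

The body of update_edgar.sh could then become a single call such as `run_steps "$EDGAR_CODE_DIR" get_filings.R get_accession_nos.R get_filer_ciks.R item_nos/get_item_nos.R item_nos/get_item_no_desc.R filing_docs/scrape_filing_docs.R`, which keeps the order of the existing script.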