Closed bdcallen closed 4 years ago
@iangow Just changed the title of this, as I think making a cronjob is a natural part of this issue.
@iangow I have just made an initial bash script to use as part of a cronjob. So, at least for this, we need a $CODE_DIR, and we need to choose a time for the cronjob.
@iangow Looking at page 7 of the ABN Bulk Extract readme, it says that the bulk extract is updated weekly. The bulk extract was last updated on 02/10/2019, which was a Wednesday. So perhaps a weekly cronjob on the day after, Thursday, would be best.
@iangow I have just been successful in getting the cronjob to run a bash script including a command to use get_abn_lookup_data.py. I modified my local abn_lookup bash script so that it included my variables (using export) for PGHOST, PGDATABASE, PGUSER, and PGPASSWORD, along with the lines
export ABN_LOOKUP_DIR=/home/bdcallen/abn_lookup
python3 $ABN_LOOKUP_DIR/get_abn_lookup_data.py
I ran into two main issues in getting cron to run the program:
(1) - My program was initially written to be run from within the abn_lookup directory. So when it called other programs in that folder, cron could not find them, because I had not written the full paths to those programs. This was an easy fix, addressed in the above commit.
(2) - After fixing (1), I got an error from cron that xsltproc couldn't be found. It turned out this was because xsltproc wasn't in cron's path, so I added a line setting the PATH variable in the crontab below, which is the one I eventually ran successfully today
# m h dom mon dow command
PATH=/home/bdcallen/anaconda3/bin:/home/bdcallen/perl5/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
20 16 * * * /home/bdcallen/abn_lookup/./abn_lookup_cronjob.sh
After including the PATH, cron was able to run xsltproc and thus the whole program. Everything worked as expected, and before the cronjob, the count from the abns table was
crsp=> SELECT COUNT(*) FROM abn_lookup.abns_old;
count
----------
14498162
(1 row)
and after it, it is now
crsp=> SELECT COUNT(*) FROM abn_lookup.abns;
count
----------
14608789
(1 row)
I didn't realise how fussy cron is with respect to knowing the paths for all programs used. As an aside, I suspect (1) or (2) could be issues with the asxlisting cronjob.
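Both issues above come down to cron starting jobs with a minimal environment and a working directory that is not the script's own folder. A minimal sketch of a defensive preamble for a script like abn_lookup_cronjob.sh follows. The names SCRIPT_DIR and require_cmd are illustrative and are not taken from the repository.

```shell
#!/usr/bin/env bash
# Sketch of a cron-safe preamble; SCRIPT_DIR and require_cmd are hypothetical names.

# Resolve the directory containing this script, so that other files can be
# referenced by absolute path regardless of cron's working directory.
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"

# Return non-zero, with a message on stderr, if a command is not on PATH.
require_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "required command not found on PATH: $1" >&2
        return 1
    fi
}

# Possible use further down the script:
# require_cmd xsltproc || exit 1
# require_cmd python3 || exit 1
# python3 "$SCRIPT_DIR/get_abn_lookup_data.py"
```

With a check like this, a missing xsltproc would be reported by name in cron's output rather than surfacing partway through the run.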
@iangow I have left the cronjob as is for now, but with the timing amended to
00 6 * * 5 /home/bdcallen/abn_lookup/./abn_lookup_cronjob.sh
as the bulk extract is updated weekly on the website. So this program will run at 6am every Friday.
@iangow I've changed my crontab to this, and amended the bash script so that the PG variables aren't in it (I've also done the same for asic), and added a line setting ABN_LOOKUP_DIR in the crontab. Note that the bash script for the abn_lookup part uses ABN_LOOKUP_DIR instead of a variable CODE_DIR
# m h dom mon dow command
PATH=/home/bdcallen/anaconda3/bin:/home/bdcallen/perl5/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
PGHOST=*************
PGDATABASE=***************
PGUSER=************
PGPASSWORD=*************
ABN_LOOKUP_DIR=***********
ASIC_DIR=**************
00 6 * * 5 /home/bdcallen/abn_lookup/./abn_lookup_cronjob.sh
00 3 * * 4 $ASIC_DIR/./asic_bulk_extract_cronjob.sh
Given I know this will work, as the asic part of the cronjob used the environment variables correctly and executed successfully, I will close this (as well as the analogous issue for abn_lookup) for now. If I see an issue in the output of the cronjob on Friday (I've been getting its output to a file called dead.letter), perhaps we can reopen then.
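With the connection settings and ABN_LOOKUP_DIR now supplied by the crontab rather than by the script, one option is for the script to fail fast when any of them is missing, so the cause is obvious in cron's output. A minimal sketch, in which the helper name check_env is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: verify that variables expected from the crontab are set.
# check_env is an illustrative helper, not part of the repository.

check_env() {
    missing=0
    for name in "$@"; do
        # Indirectly read the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "environment variable not set: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Possible use at the top of the cron script:
# check_env PGHOST PGDATABASE PGUSER PGPASSWORD ABN_LOOKUP_DIR || exit 1
```

The function reports every missing variable before returning, so a single failed run lists everything that needs to be added to the crontab.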
@iangow Most of what's needed to do this is in place already. We mostly need to add lines in the appropriate files for deleting the old tables and writing new ones to hold the updated data.