Closed anjackson closed 1 year ago
The set of old files on the Nominet server seems a bit random, but the recent files appear to be daily. The downloader logic will need to be changed accordingly.
n.b. seed list creation looks simple enough, if we just try every domain:
$ unzip -p ukdata-20210927.zip db-dump-20210927.csv | wc
10999822 10999822 218277702
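For reference, a minimal sketch of the "just try every domain" approach. It assumes each CSV row carries the domain name in its first column and that a plain `http://` URL is an acceptable seed; neither is confirmed above, and `iter_seeds` is a hypothetical helper, not part of the existing store command.

```python
import csv
import io
import zipfile


def iter_seeds(zip_path, member, domain_column=0):
    """Yield one seed URL per non-empty domain row of a CSV inside a zip.

    Assumption: the domain name is in column `domain_column` (default 0).
    The file is streamed, so an ~11M-row dump is never held in memory.
    """
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row in csv.reader(text):
                if not row:
                    continue  # skip blank lines
                domain = row[domain_column].strip().lower()
                if domain:
                    yield f"http://{domain}/"


def write_seed_list(zip_path, member, out_path):
    """Write all seeds to `out_path`, one per line, and return the count."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for seed in iter_seeds(zip_path, member):
            out.write(seed + "\n")
            count += 1
    return count
```

Something like `write_seed_list("ukdata-20210927.zip", "db-dump-20210927.csv", "seeds.txt")` would then produce the seed file, and the returned count can be compared against the `wc` line count above.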
All good, seed list updated.
Nominet (monthly): implemented as part of the store command. Run as store nominet on a server that can SFTP and see hdfs.api.wa.bl.uk (e.g. a crawler).
Now via webarchivistbl.uk@data.nominet.uk via SFTP, with names like /ukdata/ukdata-YYYYMMDD.zip
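Since the files are now daily and the older ones are irregular, the downloader probably wants to pick the newest dated file from a directory listing rather than guess a name. A rough sketch of that selection step, assuming the listing is already available as a list of base names (the SFTP call itself is left out, and `parse_dump_date` and `latest_dump` are hypothetical names):

```python
import re
from datetime import date

# Matches names of the form ukdata-YYYYMMDD.zip (pattern taken from the path above).
DUMP_NAME = re.compile(r"^ukdata-(\d{4})(\d{2})(\d{2})\.zip$")


def parse_dump_date(name):
    """Return the date embedded in a dump file name, or None if it does not fit."""
    match = DUMP_NAME.match(name)
    if not match:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        return None  # e.g. month 13 or day 32


def latest_dump(names):
    """Return the name with the most recent valid date, or None if there is none."""
    dated = [(parse_dump_date(name), name) for name in names]
    dated = [(d, name) for d, name in dated if d is not None]
    return max(dated)[1] if dated else None
```

Anything that does not match the pattern, or carries an impossible date, is ignored, so stray files on the server should not break the run.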