ukwa / ukwa-services

Deployment configuration for all UKWA services stacks.
Apache License 2.0

Get latest data from Nominet #102

Closed anjackson closed 1 year ago

anjackson commented 1 year ago

Nominet (monthly): implemented as part of the store command. Run it as store nominet on a server that can use SFTP and can reach hdfs.api.wa.bl.uk (e.g. a crawler).

The data is now available from webarchivistbl.uk@data.nominet.uk via SFTP, with file names like /ukdata/ukdata-YYYYMMDD.zip

anjackson commented 1 year ago

The array of old files on the Nominet server seems a bit random, but the recent files appear to be daily. The downloader logic will need to be changed accordingly.
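
As a rough illustration of what "changed accordingly" could mean, the sketch below picks the most recent dated file from a directory listing instead of assuming one file per month. It is not the actual store command code: the function name latest_dump is hypothetical, and the only thing taken from this thread is the ukdata-YYYYMMDD.zip naming pattern. How the listing is obtained over SFTP is left out.

```python
import re
from datetime import date, datetime

# File names like ukdata-YYYYMMDD.zip, as quoted earlier in this issue.
NAME_RE = re.compile(r"^ukdata-(\d{8})\.zip$")


def latest_dump(filenames):
    """Return (filename, date) for the newest dated dump, or None if none match.

    Hypothetical helper: `filenames` is any iterable of names or paths,
    e.g. the result of an SFTP directory listing.
    """
    dated = []
    for name in filenames:
        base = name.rsplit("/", 1)[-1]  # drop any directory part
        match = NAME_RE.match(base)
        if not match:
            continue  # ignore files that do not follow the pattern
        try:
            stamp = datetime.strptime(match.group(1), "%Y%m%d").date()
        except ValueError:
            continue  # eight digits, but not a real calendar date
        dated.append((stamp, name))
    if not dated:
        return None
    stamp, name = max(dated)  # newest date wins
    return name, stamp
```

Because the choice depends only on the date embedded in the name, the irregular older files on the server do not matter as long as the newest file follows the pattern.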

anjackson commented 1 year ago

n.b. seed list creation looks simple enough, if we just try every domain:

$ unzip -p ukdata-20210927.zip db-dump-20210927.csv | wc
10999822 10999822 218277702
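
A minimal sketch of the "try every domain" idea follows. The CSV layout is not shown in this thread, so the code assumes the domain name is in the first column and that there is no header row. The http:// seed form and the name iter_seeds are also assumptions made for illustration, not the actual seed list code.

```python
import csv
import io
import zipfile


def iter_seeds(zip_file, member, domain_column=0):
    """Yield one seed URL per non-empty domain found in a CSV inside a zip.

    Assumptions (not confirmed by the issue): the domain is in column
    `domain_column`, there is no header row, and seeds are plain
    http:// URLs.
    """
    with zipfile.ZipFile(zip_file) as zf:
        with zf.open(member) as raw:
            # Stream the member as text rather than loading ~11M lines at once.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row in csv.reader(text):
                if len(row) <= domain_column:
                    continue  # blank or short line
                domain = row[domain_column].strip().lower()
                if domain:
                    yield f"http://{domain}/"
```

It could be used as, for example, iter_seeds("ukdata-20210927.zip", "db-dump-20210927.csv"), writing each yielded URL to the seed file one line at a time.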

anjackson commented 1 year ago

All good, seed list updated.