nestauk / industrial_taxonomy

Refactor of nestauk/industrial-taxonomy which upon completion will replace it.
MIT License

Fetch and pre-process NSPL #6

Open bishax opened 2 years ago

bishax commented 2 years ago

NSPL stands for National Statistics Postcode Lookup. It is a table mapping postcodes to other official geographies, such as countries and local authorities. It is available from the Office for National Statistics (ONS) Open Geography Portal.

In this project, we will use the NSPL to assign each company in our dataset to a local authority through the postcode in its registered address.

This requires:

  1. Fetching and extracting the NSPL file (3.3GB uncompressed) from the Open Geography Portal
  2. Reading the relevant CSV file from the Data folder. We only need the columns required for this job, i.e. pcds (postcode) and laua (local authority code)
  3. Reading the LA_UA names and codes UK as at 04_20.xlsx file from the Documents folder. This is a lookup between local authority codes and names
  4. Creating and saving a dict with a lookup between pcds, local authority codes and names. This could be in the form {'pcds':{'lad_name':'foo','lad_code':'bar'}} or something else.
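
A rough sketch of how steps 2 and 4 might fit together, using only the standard library. The function names (`read_postcode_to_laua`, `build_lookup`), the sample postcodes, codes and names are all made up for illustration. It assumes the code-to-name table from step 3 has already been loaded into a plain dict (reading the .xlsx itself needs a third-party reader and is not shown), and that the NSPL CSV has `pcds` and `laua` header columns as described above.

```python
import csv
import io
import json


def read_postcode_to_laua(csv_file):
    """Read only the pcds and laua columns from an open NSPL-style CSV file.

    csv.DictReader streams the file row by row, so the other columns
    are parsed but never kept in memory.
    """
    reader = csv.DictReader(csv_file)
    return {row["pcds"]: row["laua"] for row in reader}


def build_lookup(postcode_to_laua, laua_names):
    """Combine postcode -> code and code -> name into one nested dict."""
    lookup = {}
    for pcds, lad_code in postcode_to_laua.items():
        lookup[pcds] = {
            "lad_code": lad_code,
            # Codes missing from the names table map to None
            "lad_name": laua_names.get(lad_code),
        }
    return lookup


# Tiny made-up sample standing in for the real NSPL CSV (hypothetical values)
sample_csv = io.StringIO(
    "pcds,laua,other_column\n"
    "AA1 1AA,E00000001,x\n"
    "ZZ9 9ZZ,X99999999,y\n"
)
# Made-up code -> name table standing in for the .xlsx lookup
sample_names = {"E00000001": "Exampleshire"}

lookup = build_lookup(read_postcode_to_laua(sample_csv), sample_names)

# Saving could be as simple as dumping to JSON; here we just print it
print(json.dumps(lookup, indent=2))
```

For the real file, `sample_csv` would be replaced by an `open(...)` handle on the CSV in the Data folder. Whether unmatched codes should map to `None` or be dropped is an open design choice.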

imy99 commented 2 years ago

@bishax Where can I find the LA_UA names and codes UK as at 04_20.xlsx file? I can't seem to find it on the s3 bucket.

bishax commented 2 years ago

After uncompressing the NSPL file in step 1, there will be a Documents/ folder containing that file.
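
If it helps, here is a small sketch for checking what is inside the archive without extracting all of it. The helper name `find_members` is hypothetical, and the in-memory zip below is only a stand-in for the real download. It assumes the archive stores the folder as a top-level `Documents/` prefix, which may not match the actual layout exactly.

```python
import io
import zipfile


def find_members(archive, folder, keyword):
    """Return member names inside `folder` whose name contains `keyword`."""
    return [
        name
        for name in archive.namelist()
        if name.startswith(folder) and keyword in name
    ]


# Build a tiny in-memory zip standing in for the real NSPL download
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("Data/example.csv", "pcds,laua\n")
    zf.writestr("Documents/LA_UA names and codes UK as at 04_20.xlsx", "")

# Reopen the archive for reading and look for the lookup file
with zipfile.ZipFile(buffer) as zf:
    matches = find_members(zf, "Documents/", "LA_UA")

print(matches)
```

With the real file, `zipfile.ZipFile(path_to_download)` would replace the in-memory buffer, and `zf.extract(matches[0], some_dir)` would pull out just that one file.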