niwa / geoapis

A Python package for simply downloading publicly available web-hosted geo-spatial data. View API docs at: https://niwa.github.io/geoapis/
MIT License

Add support for publicly hosted USGS LiDAR #26

Open rosepearson opened 2 years ago

rosepearson commented 2 years ago

The USGS is moving to publicly host its LiDAR data on a public AWS server. Details can be found at: https://registry.opendata.aws/usgs-lidar/

The key information is:

- location: us-west-2 -> s3.us-west-2.amazonaws.com
- bucket: usgs-lidar-public

The AWS key appears to be in the URL to the dataset on OpenTopography. For instance, the key for https://portal.opentopography.org/usgsDataset?dsid=USGS_LPC_AL_25Co_B3_2017 is USGS_LPC_AL_25Co_B3_2017

The only challenge seems to be tracking down the key of the dataset, as this information doesn't seem to be listed in the metadata. See the link for an example.

rosepearson commented 2 years ago

A quick example of some code for connecting to the relevant bucket, either via boto3.client or boto3.resource, and printing out some of the contained objects.

import urllib.parse
import boto3
import botocore.config

NETLOC_DATA = "s3.us-west-2.amazonaws.com"
SCHEME = "https"
aws_endpoint_url = urllib.parse.urlunparse((SCHEME, NETLOC_DATA, "", "", "", ""))

# Unsigned (anonymous) requests - the bucket is public
client = boto3.client('s3', endpoint_url=aws_endpoint_url,
                      config=botocore.config.Config(signature_version=botocore.UNSIGNED))

s3 = boto3.resource('s3', endpoint_url=aws_endpoint_url,
                    config=botocore.config.Config(signature_version=botocore.UNSIGNED))

my_bucket = s3.Bucket('usgs-lidar-public')

for my_bucket_object in my_bucket.objects.all():
    print(my_bucket_object)
rosepearson commented 2 years ago

So I've had a look into the contents of the AWS bucket for dataset USGS_LPC_AL_25Co_B3_2017.

I created a boto3 client for interrogating the dataset as:

import urllib.parse
import boto3
import botocore.config

NETLOC_DATA = "s3.us-west-2.amazonaws.com"
SCHEME = "https"
aws_endpoint_url = urllib.parse.urlunparse((SCHEME, NETLOC_DATA, "", "", "", ""))
client = boto3.client('s3', endpoint_url=aws_endpoint_url,
                      config=botocore.config.Config(signature_version=botocore.UNSIGNED))

And had a look at the folder structure of the bucket using client.list_objects_v2(Bucket='usgs-lidar-public', Prefix='USGS_LPC_AL_25Co_B3_2017/', Delimiter='/'), which returned the 'common prefixes':

[{'Prefix': 'USGS_LPC_AL_25Co_B3_2017/ept-backup/'},
  {'Prefix': 'USGS_LPC_AL_25Co_B3_2017/ept-data/'},
  {'Prefix': 'USGS_LPC_AL_25Co_B3_2017/ept-hierarchy/'},
  {'Prefix': 'USGS_LPC_AL_25Co_B3_2017/ept-sources/'},
  {'Prefix': 'USGS_LPC_AL_25Co_B3_2017/info/'}]

The .laz files appear to be contained in the ept-data folder. The other folders appear to contain .json files which may or may not have information about the naming convention or spatial distribution of the .laz files.

Looking at the contents of a JSON file

An example of the contents of USGS_LPC_AL_25Co_B3_2017_16S_FB_3393.json from the `info` folder is shown in the screenshot below. The file was downloaded with:

client.download_file(
    'usgs-lidar-public',
    'USGS_LPC_AL_25Co_B3_2017/info/USGS_LPC_AL_25Co_B3_2017_16S_FB_3393.json',
    r"path/to/download/USGS_LPC_AL_25Co_B3_2017/USGS_LPC_AL_25Co_B3_2017_16S_FB_3393.json")
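Once downloaded, the JSON can be inspected with the standard library. No particular schema is assumed here, so this just exposes the top-level field names; `load_metadata` is a hypothetical helper:

```python
import json
import pathlib

def load_metadata(json_path):
    """Load a downloaded USGS metadata JSON file and return it as a dict."""
    with pathlib.Path(json_path).open() as f:
        return json.load(f)

# e.g. metadata = load_metadata("path/to/download/.../USGS_LPC_AL_25Co_B3_2017_16S_FB_3393.json")
#      print(list(metadata))  # top-level field names in the metadata file
```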

Downloading a .laz file

Listing objects only supports filtering by prefix, not file type, so the returned object list still needs to be filtered in Python to select only the .laz files.

prefix = 'USGS_LPC_AL_25Co_B3_2017/ept-data/'
file_list = client.list_objects_v2(Bucket='usgs-lidar-public', Prefix=prefix, Delimiter='/')
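Since the listing can't filter by file type, the .laz objects can be picked out of the returned Contents afterwards; `filter_laz` here is a hypothetical helper:

```python
def filter_laz(contents):
    """Keep only the objects whose key ends in .laz (the point-cloud tiles)."""
    return [obj for obj in contents if obj['Key'].endswith('.laz')]

# e.g. laz_objects = filter_laz(file_list['Contents'])
```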

Download each file in the returned file list:

import pathlib

download_path = pathlib.Path("/path/to/download/location/")
(download_path / prefix).mkdir(parents=True, exist_ok=True)

# download_file expects a string filename, so convert the Path
for s3_object in file_list['Contents']:
    client.download_file('usgs-lidar-public', s3_object['Key'],
                         str(download_path / s3_object['Key']))