Open rosepearson opened 2 years ago
A quick example of some code for connecting to the relevant bucket, either via `boto3.client` or `boto3.resource`, and printing out some of the contained objects.

```python
import urllib.parse
import boto3
import botocore

NETLOC_DATA = "s3.us-west-2.amazonaws.com"
SCHEME = "https"
aws_endpoint_url = urllib.parse.urlunparse((SCHEME, NETLOC_DATA, "", "", "", ""))

# Unsigned requests - the bucket is public, so no credentials are needed
client = boto3.client('s3', endpoint_url=aws_endpoint_url,
                      config=botocore.config.Config(signature_version=botocore.UNSIGNED))
s3 = boto3.resource('s3', endpoint_url=aws_endpoint_url,
                    config=botocore.config.Config(signature_version=botocore.UNSIGNED))

my_bucket = s3.Bucket('usgs-lidar-public')
for my_bucket_object in my_bucket.objects.all():
    print(my_bucket_object)
```
So I've had a look into the contents of the AWS bucket for dataset USGS LPC AL 25Co B3 2017.
I created a boto3 client for interrogating the dataset as:

```python
import urllib.parse
import boto3
import botocore

NETLOC_DATA = "s3.us-west-2.amazonaws.com"
SCHEME = "https"
aws_endpoint_url = urllib.parse.urlunparse((SCHEME, NETLOC_DATA, "", "", "", ""))
client = boto3.client('s3', endpoint_url=aws_endpoint_url,
                      config=botocore.config.Config(signature_version=botocore.UNSIGNED))
```
And had a look at the folder structure of the bucket using:

```python
client.list_objects_v2(Bucket='usgs-lidar-public', Prefix='USGS_LPC_AL_25Co_B3_2017/', Delimiter='/')
```

which returned the 'common prefixes':

```python
[{'Prefix': 'USGS_LPC_AL_25Co_B3_2017/ept-backup/'},
 {'Prefix': 'USGS_LPC_AL_25Co_B3_2017/ept-data/'},
 {'Prefix': 'USGS_LPC_AL_25Co_B3_2017/ept-hierarchy/'},
 {'Prefix': 'USGS_LPC_AL_25Co_B3_2017/ept-sources/'},
 {'Prefix': 'USGS_LPC_AL_25Co_B3_2017/info/'}]
```
The `.laz` files appear to be contained in the `ept-data` folder. The other folders appear to contain `.json` files, which may or may not have information about the naming convention or spatial distribution of the `.laz` files.
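If the bucket follows the standard Entwine Point Tile (EPT) layout (an assumption on my part, not confirmed from the bucket contents), the `.laz` files under `ept-data` are named `<depth>-<x>-<y>-<z>.laz`, which encodes each tile's position in the octree. A minimal sketch of parsing such a name (the file name used is hypothetical):

```python
# Sketch: parse an EPT tile name of the form "<depth>-<x>-<y>-<z>.laz".
# Assumption: the bucket follows the standard Entwine Point Tile layout;
# "2-1-3-0.laz" is an illustrative name, not one confirmed to exist.
name = "2-1-3-0.laz"
depth, x, y, z = (int(part) for part in name.removesuffix(".laz").split("-"))
print(depth, x, y, z)  # 2 1 3 0
```

If that holds, the tile names alone would give the spatial distribution of the `.laz` files without downloading the `.json` metadata.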
An example of the contents of `USGS_LPC_AL_25Co_B3_2017_16S_FB_3393.json` from the `info` folder is shown below.
```python
client.download_file('usgs-lidar-public',
                     'USGS_LPC_AL_25Co_B3_2017/info/USGS_LPC_AL_25Co_B3_2017_16S_FB_3393.json',
                     r"path/to/download/USGS_LPC_AL_25Co_B3_2017/USGS_LPC_AL_25Co_B3_2017_16S_FB_3393.json")
```
S3 only supports filtering by prefix, not by file type, so first we need to filter the returned object list to select only the `.laz` files.

```python
prefix = 'USGS_LPC_AL_25Co_B3_2017/ept-data/'
file_list = client.list_objects_v2(Bucket='usgs-lidar-public', Prefix=prefix, Delimiter='/')
```
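Since the filtering has to happen client-side, it can be done with a simple comprehension over the `Contents` list of the response. A minimal sketch over a hypothetical response (the keys below are made up for illustration):

```python
# Hypothetical list_objects_v2 'Contents' entries; S3 cannot filter by
# file type, so select the .laz keys client-side by extension.
sample_contents = [
    {'Key': 'USGS_LPC_AL_25Co_B3_2017/ept-data/0-0-0-0.laz'},
    {'Key': 'USGS_LPC_AL_25Co_B3_2017/ept-data/1-0-0-0.laz'},
    {'Key': 'USGS_LPC_AL_25Co_B3_2017/ept-data/notes.json'},
]
laz_keys = [obj['Key'] for obj in sample_contents if obj['Key'].endswith('.laz')]
print(laz_keys)
```

Note that `list_objects_v2` returns at most 1000 keys per call, so larger folders would need paginated requests.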
Download the files in the returned file list:

```python
import pathlib

download_path = pathlib.Path("/path/to/download/location/")
(download_path / prefix).mkdir(parents=True, exist_ok=True)
for s3_object in file_list['Contents']:
    client.download_file('usgs-lidar-public', s3_object['Key'],
                         str(download_path / s3_object['Key']))
```
The USGS is moving to publicly host its LiDAR data on a public AWS server. Details can be found at: https://registry.opendata.aws/usgs-lidar/
The key information is:
- location: `us-west-2` -> `s3.us-west-2.amazonaws.com`
- bucket: `usgs-lidar-public`
The AWS key appears to be in the URL to the dataset on OpenTopography. For instance, the key for https://portal.opentopography.org/usgsDataset?dsid=USGS_LPC_AL_25Co_B3_2017 is `USGS_LPC_AL_25Co_B3_2017`.
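The key can also be pulled out of the OpenTopography URL programmatically from its `dsid` query parameter; a minimal sketch using only the standard library:

```python
from urllib.parse import urlparse, parse_qs

# Extract the dataset key (the S3 prefix) from the OpenTopography URL.
url = "https://portal.opentopography.org/usgsDataset?dsid=USGS_LPC_AL_25Co_B3_2017"
dsid = parse_qs(urlparse(url).query)["dsid"][0]
print(dsid)  # USGS_LPC_AL_25Co_B3_2017
```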
The only challenge seems to be tracking down the key of the dataset, as this information doesn't seem to be listed in the metadata (see the link above for an example).