Open rabernat opened 5 years ago
Some background on cloud storage:
Our bucket is https://console.cloud.google.com/storage/browser/pangeo-pgc?project=pangeo-181919&folder&organizationId=819335046878
Project is pangeo-181919
Command to upload (recusive)
gsutil -m cp -r /path/to/data gs://pangeo-pgc/
@jjspergel the code you sent me for producing the json files is running. It works with only one copy of the satstac.py file sat in the '8m' directory if i use the following
forfiles /s /m *_COG_LZW.tif /c "cmd /c python ../../satstac.py @path @file"
One thing, though is the file structure may not be exactly as I put them up in the bucket. Looking at this example: 09_39_8m_dem_COG_LZW.txt It looks like it is https://storage.googleapis.com/pangeo-pgc/09_39_8m_dem_COG_LZW.json when it should be https://storage.googleapis.com/pangeo-pgc/8m/09_39_8m_dem_COG_LZW.json
i will try to edit the satstac.py file and rerun i think.
Sorry @jkingslake , but yes I think that should be an easy fix - just edit the "link_pref" to include the subfolder "8m". I also would recommend renaming the py file, my compiler got confused later because both the module and the code I wrote are called satstac.
Done. Where should the json files need to sit? I suspect the answer is that it doesnt matter, but just to check before I upload them.
I'm working right now on how we want to combine them into a json catalog. I don't think it really matters, just as long as they can be easily accessed by the code that will combine/index them all.
On Wed, Apr 3, 2019 at 3:37 PM jkingslake notifications@github.com wrote:
Done. Where should the json files need to sit? I suspect the answer is that it doesnt matter, but just to check before I upload them.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/rabernat/pangeo-rema/issues/2#issuecomment-479629563, or mute the thread https://github.com/notifications/unsubscribe-auth/Ao-5EZFtUJezbkyzdL8QioFMaRMn9FBDks5vdQL1gaJpZM4b8NLw .
-- Julian J. Spergel Lamont Doherty Earth Observatory University of Chicago '16 Geophysical Sciences
I would hold off on uploading the .json files. I'm having some issues with combining them. They might need a little bit of tweaking to get the correct syntax and format to work with the code. Sorry for the inconvenience.
On Wed, Apr 3, 2019 at 3:41 PM Julian Jacob Spergel jjs2268@columbia.edu wrote:
I'm working right now on how we want to combine them into a json catalog. I don't think it really matters, just as long as they can be easily accessed by the code that will combine/index them all.
On Wed, Apr 3, 2019 at 3:37 PM jkingslake notifications@github.com wrote:
Done. Where should the json files need to sit? I suspect the answer is that it doesnt matter, but just to check before I upload them.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/rabernat/pangeo-rema/issues/2#issuecomment-479629563, or mute the thread https://github.com/notifications/unsubscribe-auth/Ao-5EZFtUJezbkyzdL8QioFMaRMn9FBDks5vdQL1gaJpZM4b8NLw .
-- Julian J. Spergel Lamont Doherty Earth Observatory University of Chicago '16 Geophysical Sciences
-- Julian J. Spergel Lamont Doherty Earth Observatory University of Chicago '16 Geophysical Sciences
Glad to see stuff happening without me!
Just want to remind you that you can use this git repo to share the scripts you are using to generate the data, produce stac, etc (rather than emailing python files around).
It seems like the satstac program can only catalog these json files if there's no "link" section of the .json file already, so we'll need to remove that from the json code and remake all the .json files. I'm still working on getting them to catalog/combine properly.
The issue right now is that the automatically generated links look like: 'https://storage.googleapis.com/pangeo-pgc/8m\\stac-catalog\\34_39_8m_dem_COG_LZW.json'
which I think will create problems when uploaded. I'll keep you updated.
Hi Jonny,
I forget how long it took to make .json files last time, but if it's a long time, would it be possible to run the new code overnight tonight? The only change is the removal of the link-forming section. As before use:
forfiles /s /m *_COG_LZW.tif /c "cmd /c python ../../satstac_jsoncode.py @path https://github.com/path @file https://github.com/file
Let me know if you can run this tonight or not. No real rush, I just want to schedule work this week. Thanks!
On Wed, Apr 3, 2019 at 3:37 PM jkingslake notifications@github.com wrote:
Done. Where should the json files need to sit? I suspect the answer is that it doesnt matter, but just to check before I upload them.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/rabernat/pangeo-rema/issues/2#issuecomment-479629563, or mute the thread https://github.com/notifications/unsubscribe-auth/Ao-5EZFtUJezbkyzdL8QioFMaRMn9FBDks5vdQL1gaJpZM4b8NLw .
-- Julian J. Spergel Lamont Doherty Earth Observatory University of Chicago '16 Geophysical Sciences
def satstac(rasterPath,rasterName):
from satstac import Catalog, Collection, Item
import xarray as xr
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import rasterio
import rasterio.features
import rasterio.warp
from shapely import geometry
import json
import cartopy
class REMAStacItem:
def __init__(self,rasterPath, rasterName):
self.rasterPath = rasterPath
self.id = rasterName
self.stac_item = {"id": rasterName[:-4],
"rownum": str(rasterName)[0:2],
"colnum": str(rasterName)[3:5],
"type": "Feature",
"bbox": self.calcBBox(),
"geometry": self.calcGeometry(),
"units": 'meters',
"resolution": 8
}
self.stac_item['properties']=self.createProperties_meta()
self.stac_item['links'] = self.create_links()
def calcBBox(self):
with rasterio.open(self.rasterPath) as dataset:
bbox = rasterio.warp.transform_bounds(dataset.crs, 'EPSG:4326',dataset.bounds.left,dataset.bounds.bottom,
dataset.bounds.right,dataset.bounds.top)
#self.bbox=json.loads(json.dumps(geometry.mapping(bbox)))
return json.loads(json.dumps(bbox))
def calcGeometry(self):
with rasterio.open(self.rasterPath) as dataset:
profile = dataset.profile
# Read the dataset's valid data mask as a ndarray.
geom = geometry.box(*rasterio.warp.transform_bounds(dataset.crs, 'EPSG:4326',dataset.bounds.left,dataset.bounds.bottom,
dataset.bounds.right,dataset.bounds.top))
self.geometry=json.loads(json.dumps(geometry.mapping(geom)))
return json.loads(json.dumps(geometry.mapping(geom)))
#def createAssetList(self):
# pass
def createProperties_meta(self):
meta_prop_dict = {}
with rasterio.open(self.id) as dataset:
meta_prop_dict = dataset.meta
crs = meta_prop_dict['crs']
crs = 'EPSG:'+str(crs.to_epsg())
meta_prop_dict['crs'] = crs
profile = dataset.profile
return meta_prop_dict
# if imdPath:
# o = urlparse(imdPath)
# if o.scheme == 's3':
# s3 = boto3.resource("s3")
# bucket = o.netloc
# key = o.path.lstrip('/')
# obj = s3.Object(bucket, key)
# body = obj.get()['Body'].read()
# root = ET.fromstring(body)
#else:
# tree = ET.parse(imdPath)
# root = tree.getroot()
#self.metaDataStruct = iterate_OverXML(root, recursionTagList=['IMD', 'IMAGE'])
#elif vrtPath:
# with rasterio.open(vrtPath) as src:
# self.metaDataStruct = process_nitf_tags(src.tags())
# else:
# print("no Data Specified")
#self.eoDict = self.processMetaData_To_Properties(self.metaDataStruct, self.provider, self.license)
#return json.loads(json.dumps(meta_prop_dict))
def write_toJSON(self, filename):
with open(filename, 'w') as fp:
json.dump(self.stac_item, fp, indent=2)
myStacItem = REMAStacItem(rasterPath,rasterName)
filename = myStacItem.stac_item['id']+'.json'
myStacItem.write_toJSON(filename)
Julian, Yes, I can do this. Although you will have to remind me the details. Jonny
On Mon, Jun 10, 2019 at 3:56 PM jjspergel notifications@github.com wrote:
Hi Jonny,
I forget how long it took to make .json files last time, but if it's a long time, would it be possible to run the new code overnight tonight? The only change is the removal of the link-forming section. As before use:
forfiles /s /m *_COG_LZW.tif /c "cmd /c python ../../satstac_jsoncode.py @path https://github.com/path @file https://github.com/file
Let me know if you can run this tonight or not. No real rush, I just want to schedule work this week. Thanks!
On Wed, Apr 3, 2019 at 3:37 PM jkingslake notifications@github.com wrote:
Done. Where should the json files need to sit? I suspect the answer is that it doesnt matter, but just to check before I upload them.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <https://github.com/rabernat/pangeo-rema/issues/2#issuecomment-479629563 , or mute the thread < https://github.com/notifications/unsubscribe-auth/Ao-5EZFtUJezbkyzdL8QioFMaRMn9FBDks5vdQL1gaJpZM4b8NLw
.
-- Julian J. Spergel Lamont Doherty Earth Observatory University of Chicago '16 Geophysical Sciences
def satstac(rasterPath,rasterName):
from satstac import Catalog, Collection, Item import xarray as xr import pandas as pd import numpy as np from matplotlib import pyplot as plt import rasterio import rasterio.features import rasterio.warp from shapely import geometry import json import cartopy
class REMAStacItem:
def init(self,rasterPath, rasterName):
self.rasterPath = rasterPath self.id = rasterName self.stac_item = {"id": rasterName[:-4], "rownum": str(rasterName)[0:2], "colnum": str(rasterName)[3:5], "type": "Feature", "bbox": self.calcBBox(), "geometry": self.calcGeometry(), "units": 'meters', "resolution": 8 }
self.stac_item['properties']=self.createProperties_meta() self.stac_item['links'] = self.create_links()
def calcBBox(self):
with rasterio.open(self.rasterPath) as dataset: bbox = rasterio.warp.transform_bounds(dataset.crs, 'EPSG:4326',dataset.bounds.left,dataset.bounds.bottom, dataset.bounds.right,dataset.bounds.top)
self.bbox=json.loads(json.dumps(geometry.mapping(bbox)))
return json.loads(json.dumps(bbox))
def calcGeometry(self):
with rasterio.open(self.rasterPath) as dataset: profile = dataset.profile
Read the dataset's valid data mask as a ndarray.
geom = geometry.box(*rasterio.warp.transform_bounds(dataset.crs, 'EPSG:4326',dataset.bounds.left,dataset.bounds.bottom, dataset.bounds.right,dataset.bounds.top))
self.geometry=json.loads(json.dumps(geometry.mapping(geom)))
return json.loads(json.dumps(geometry.mapping(geom)))
def createAssetList(self):
pass
def createProperties_meta(self): meta_prop_dict = {} with rasterio.open(self.id) as dataset: meta_prop_dict = dataset.meta crs = meta_prop_dict['crs'] crs = 'EPSG:'+str(crs.to_epsg()) meta_prop_dict['crs'] = crs profile = dataset.profile return meta_prop_dict
if imdPath:
o = urlparse(imdPath)
if o.scheme == 's3':
s3 = boto3.resource("s3")
bucket = o.netloc
key = o.path.lstrip('/')
obj = s3.Object(bucket, key)
body = obj.get()['Body'].read()
root = ET.fromstring(body)
else:
tree = ET.parse(imdPath)
root = tree.getroot()
self.metaDataStruct = iterate_OverXML(root, recursionTagList=['IMD',
'IMAGE'])
elif vrtPath:
with rasterio.open(vrtPath) as src:
self.metaDataStruct = process_nitf_tags(src.tags())
else:
print("no Data Specified")
self.eoDict = self.processMetaData_To_Properties(self.metaDataStruct,
self.provider, self.license)
return json.loads(json.dumps(meta_prop_dict))
def write_toJSON(self, filename): with open(filename, 'w') as fp: json.dump(self.stac_item, fp, indent=2)
myStacItem = REMAStacItem(rasterPath,rasterName) filename = myStacItem.stac_item['id']+'.json' myStacItem.write_toJSON(filename)
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/rabernat/pangeo-rema/issues/2?email_source=notifications&email_token=ALTXJ3L6SAINUDNEUULRPXTPZ2WVLA5CNFSM4G7Q2LYKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODXLBUHA#issuecomment-500570652, or mute the thread https://github.com/notifications/unsubscribe-auth/ALTXJ3IQ7R64NPJNBEGBGDLPZ2WVLANCNFSM4G7Q2LYA .
--
Jonathan Kingslake
Assistant Professor
Dept. Earth and Environmental Sciences,
Lamont-Doherty Earth Observatory of Columbia University
61 Rt. 9W, Palisades, NY 10964-8000
jkingslake.com
@jkingslake
It's been a little while for me too. I believe you just need to move the python code I sent to the folder with the REMA tiles in it, run a terminal with the directory in that same folder with the REMA files, and then run the line of code I sent. If it doesn't work, maybe we can try it tomorrow together.
Julian
On Mon, Jun 10, 2019 at 4:16 PM jkingslake notifications@github.com wrote:
Julian, Yes, I can do this. Although you will have to remind me the details. Jonny
On Mon, Jun 10, 2019 at 3:56 PM jjspergel notifications@github.com wrote:
Hi Jonny,
I forget how long it took to make .json files last time, but if it's a long time, would it be possible to run the new code overnight tonight? The only change is the removal of the link-forming section. As before use:
forfiles /s /m *_COG_LZW.tif /c "cmd /c python ../../satstac_jsoncode.py @path https://github.com/path @file https://github.com/file
Let me know if you can run this tonight or not. No real rush, I just want to schedule work this week. Thanks!
On Wed, Apr 3, 2019 at 3:37 PM jkingslake notifications@github.com wrote:
Done. Where should the json files need to sit? I suspect the answer is that it doesnt matter, but just to check before I upload them.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub < https://github.com/rabernat/pangeo-rema/issues/2#issuecomment-479629563 , or mute the thread <
.
-- Julian J. Spergel Lamont Doherty Earth Observatory University of Chicago '16 Geophysical Sciences
def satstac(rasterPath,rasterName):
from satstac import Catalog, Collection, Item import xarray as xr import pandas as pd import numpy as np from matplotlib import pyplot as plt import rasterio import rasterio.features import rasterio.warp from shapely import geometry import json import cartopy
class REMAStacItem:
def init(self,rasterPath, rasterName):
self.rasterPath = rasterPath self.id = rasterName self.stac_item = {"id": rasterName[:-4], "rownum": str(rasterName)[0:2], "colnum": str(rasterName)[3:5], "type": "Feature", "bbox": self.calcBBox(), "geometry": self.calcGeometry(), "units": 'meters', "resolution": 8 }
self.stac_item['properties']=self.createProperties_meta() self.stac_item['links'] = self.create_links()
def calcBBox(self):
with rasterio.open(self.rasterPath) as dataset: bbox = rasterio.warp.transform_bounds(dataset.crs, 'EPSG:4326',dataset.bounds.left,dataset.bounds.bottom, dataset.bounds.right,dataset.bounds.top)
self.bbox=json.loads(json.dumps(geometry.mapping(bbox)))
return json.loads(json.dumps(bbox))
def calcGeometry(self):
with rasterio.open(self.rasterPath) as dataset: profile = dataset.profile
Read the dataset's valid data mask as a ndarray.
geom = geometry.box(*rasterio.warp.transform_bounds(dataset.crs, 'EPSG:4326',dataset.bounds.left,dataset.bounds.bottom, dataset.bounds.right,dataset.bounds.top))
self.geometry=json.loads(json.dumps(geometry.mapping(geom)))
return json.loads(json.dumps(geometry.mapping(geom)))
def createAssetList(self):
pass
def createProperties_meta(self): meta_prop_dict = {} with rasterio.open(self.id) as dataset: meta_prop_dict = dataset.meta crs = meta_prop_dict['crs'] crs = 'EPSG:'+str(crs.to_epsg()) meta_prop_dict['crs'] = crs profile = dataset.profile return meta_prop_dict
if imdPath:
o = urlparse(imdPath)
if o.scheme == 's3':
s3 = boto3.resource("s3")
bucket = o.netloc
key = o.path.lstrip('/')
obj = s3.Object(bucket, key)
body = obj.get()['Body'].read()
root = ET.fromstring(body)
else:
tree = ET.parse(imdPath)
root = tree.getroot()
self.metaDataStruct = iterate_OverXML(root, recursionTagList=['IMD',
'IMAGE'])
elif vrtPath:
with rasterio.open(vrtPath) as src:
self.metaDataStruct = process_nitf_tags(src.tags())
else:
print("no Data Specified")
self.eoDict = self.processMetaData_To_Properties(self.metaDataStruct,
self.provider, self.license)
return json.loads(json.dumps(meta_prop_dict))
def write_toJSON(self, filename): with open(filename, 'w') as fp: json.dump(self.stac_item, fp, indent=2)
myStacItem = REMAStacItem(rasterPath,rasterName) filename = myStacItem.stac_item['id']+'.json' myStacItem.write_toJSON(filename)
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub < https://github.com/rabernat/pangeo-rema/issues/2?email_source=notifications&email_token=ALTXJ3L6SAINUDNEUULRPXTPZ2WVLA5CNFSM4G7Q2LYKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODXLBUHA#issuecomment-500570652 , or mute the thread < https://github.com/notifications/unsubscribe-auth/ALTXJ3IQ7R64NPJNBEGBGDLPZ2WVLANCNFSM4G7Q2LYA
.
--
Jonathan Kingslake
Assistant Professor
Dept. Earth and Environmental Sciences,
Lamont-Doherty Earth Observatory of Columbia University
61 Rt. 9W, Palisades, NY 10964-8000
jkingslake.com
@jkingslake
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/rabernat/pangeo-rema/issues/2?email_source=notifications&email_token=AKH3SENWAB5ZU7PMGL3E4VTPZ2Y7VA5CNFSM4G7Q2LYKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODXLDJOI#issuecomment-500577465, or mute the thread https://github.com/notifications/unsubscribe-auth/AKH3SENV5BYDTCRAQMJUUK3PZ2Y7VANCNFSM4G7Q2LYA .
-- Julian J. Spergel Lamont Doherty Earth Observatory University of Chicago '16 Geophysical Sciences
The latest code for creating json files is here
I have code to collect the json files that works. Let's meet before implementing it so I can make sure that we have a file structure and metadata that we agree on. then we can upload it all to the Google Bucket! `` satstac_collection.txt
Once the data has been converted to COG (see #1), it needs to be uploaded to the cloud and cataloged. The uploading itself is easy. The cataloging is the hard part/
Probably the best way to do the catalog is with the spatiotemporal asset catalog specification: https://github.com/radiantearth/stac-spec/
This basically means creating a bunch of .json files which describe the data collection. There are various automated tools and scripts to generate these: https://github.com/radiantearth/stac-spec/blob/master/implementations.md#ecosystem