Open mike-gangl opened 3 months ago
catalog.131275543.json
dapa_collection = UnityCollectionStac() \
.with_id(temp_collection_id) \
.with_graule_id_regex("^abcd.1234.efgh.test_file.*$") \
.with_granule_id_extraction_regex("(^abcd.1234.efgh.test_file.*)(\\.data\\.stac\\.json|\\.nc\\.cas|\\.cmr\\.xml)") \
.with_title(f"{self.granule_id}.data.stac.json") \
.with_process('stac') \
.with_provider('unity') \
.add_file_type(f"{self.granule_id}.data.stac.json", "^abcd.1234.efgh.test_file.*\\.data.stac.json$", 'unknown_bucket', 'application/json', 'root') \
.add_file_type(f"{self.granule_id}.nc", "^abcd.1234.efgh.test_file.*\\.nc$", 'protected', 'data', 'item') \
.add_file_type(f"{self.granule_id}.nc.cas", "^abcd.1234.efgh.test_file.*\\.nc.cas$", 'protected', 'metadata', 'item') \
.add_file_type(f"{self.granule_id}.nc.cmr.xml", "^abcd.1234.efgh.test_file.*\\.nc.cmr.xml$", 'protected', 'metadata', 'item') \
.add_file_type(f"{self.granule_id}.nc.stac.json", "^abcd.1234.efgh.test_file.*\\.nc.stac.json$", 'protected', 'metadata', 'item')
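For reference, here is a small sketch of how the extraction regex above behaves on the file names in this collection. It assumes the granule ID is taken from the first capture group, as in the Cumulus granuleIdExtraction convention; the helper name `extract_granule_id` and the sample file names are made up for illustration.

```python
import re

# Pattern copied from the collection definition above, written as a raw string.
GRANULE_ID_EXTRACTION_REGEX = (
    r"(^abcd.1234.efgh.test_file.*)(\.data\.stac\.json|\.nc\.cas|\.cmr\.xml)"
)


def extract_granule_id(filename):
    """Return capture group 1 of the extraction regex, or None if no match.

    Assumption: group 1 is what gets used as the granule ID.
    """
    match = re.match(GRANULE_ID_EXTRACTION_REGEX, filename)
    return match.group(1) if match else None


if __name__ == "__main__":
    for name in (
        "abcd.1234.efgh.test_file01.data.stac.json",
        "abcd.1234.efgh.test_file01.nc.cas",
        "abcd.1234.efgh.test_file01.nc.cmr.xml",
        "abcd.1234.efgh.test_file01.nc",
    ):
        print(name, "->", extract_granule_id(name))
```

As written, the pattern only matches names ending in one of the three listed suffixes, so a plain `.nc` file yields no match, and a `.nc.cmr.xml` file yields a group 1 that still ends in `.nc`.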
Use Case: App Pack Gen no longer has to call catalog after every stage out call
Is there logging at each step to message errors back up to the mission Operator?
@rtapella that's a concern for sure. What will alert users to an error in the cataloging if it happens? We have some options here, but I wonder what it should look like.
I think the "archive" service is very similar, to be honest. Cumulus has a dashboard with this information, but I don't think it's what we'd want to expose to the other users.
I think it should get pushed up to the Airflow logs as part of processing.
"successful_features.json is uploaded to S3 Bucket"... where does failed_features.json go @wphyojpl ?
@rtapella
I think the existing logic still remains: it will be stored locally. It is up to the user to fix the problem and upload them again.
"Locally" meaning what? In the Unity DS, wherever stage-out writes to?
Yeah, it is stored on the server where the stage-out script is run.
Is there some sort of error message associated with each failed item?
Yes, the exception messages are added so that the user can fix them.
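To make the discussion above concrete, here is a minimal sketch of splitting stage-out results into successful and failed features, with the exception message kept next to each failure. This is not the actual stage-out code: the function name `stage_out_features`, the `upload_one` callback, and the shape of the failed records are all assumptions for illustration.

```python
def stage_out_features(features, upload_one):
    """Upload each feature and partition the results.

    `upload_one` is a hypothetical callable that raises on failure.
    Returns (successful, failed); each failed entry keeps the original
    feature together with the exception message so it can be fixed and retried.
    """
    successful, failed = [], []
    for feature in features:
        try:
            upload_one(feature)
            successful.append(feature)
        except Exception as exc:  # keep going so one bad item does not stop the rest
            failed.append({"feature": feature, "error": str(exc)})
    return successful, failed


def _demo_upload(feature):
    # Stand-in uploader for the example: rejects features without assets.
    if not feature.get("assets"):
        raise ValueError(f"no assets for {feature['id']}")


demo_ok, demo_failed = stage_out_features(
    [{"id": "a", "assets": {"data": {}}}, {"id": "b", "assets": {}}],
    _demo_upload,
)
```

In this sketch the two lists would then be written out as successful_features.json and failed_features.json respectively.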
Automated Data Cataloging
To make the system more user-friendly, we should allow automatic data cataloging when the results file of a stage-out operation (successful_features.json) is stored in S3. This is an optional parameter (defaults to enabled) created during the S3 bucket deployment from the marketplace (e.g. "disable auto-cataloging of files").
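A rough sketch of the trigger logic, assuming the cataloging is driven by a standard S3 event notification. The function name `keys_to_catalog` and the `auto_catalog_enabled` flag are hypothetical stand-ins for the deployment parameter described above; only the `Records[].s3.bucket.name` and `Records[].s3.object.key` fields come from the S3 event format.

```python
from urllib.parse import unquote_plus

TRIGGER_FILENAME = "successful_features.json"


def keys_to_catalog(s3_event, auto_catalog_enabled=True):
    """Return (bucket, key) pairs from an S3 event that should trigger cataloging.

    `auto_catalog_enabled` is a hypothetical flag mirroring the optional
    deployment parameter; it defaults to enabled.
    """
    if not auto_catalog_enabled:
        return []
    targets = []
    for record in s3_event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.rsplit("/", 1)[-1] == TRIGGER_FILENAME:
            targets.append((bucket, key))
    return targets


demo_event = {
    "Records": [
        {"s3": {"bucket": {"name": "demo-bucket"},
                "object": {"key": "out/run+1/successful_features.json"}}},
        {"s3": {"bucket": {"name": "demo-bucket"},
                "object": {"key": "out/run+1/granule.nc"}}},
    ]
}
```

Only the results file triggers cataloging here; other objects written by stage-out are ignored, and disabling the flag turns the whole thing off.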
This removes the need for a user to add a 'catalog' task to a CWL workflow in order to persist files in the Unity catalog. Doing this will do 3 things:
Acceptance Criteria
Acceptance criteria required to implement the epic
Why: Separate data catalog logic from SPS and workflows. These are specific to Unity and wouldn't be run outside of the Unity context.
Work Tickets
Link to work tickets required to implement the epic
Dependencies
Other epics or outside tickets required for this to work
Associated Risks
Links to risk issues associated with this epic