nsidc / granule-metgen

Metadata generator for direct-to-Cumulus era

Forge-py research #5

Closed: juliacollins closed this issue 1 month ago

juliacollins commented 2 months ago

Add links to forge-py evaluation, placeholder area in software architecture diagram list, any other pre-project activities.

The relevant repository is https://github.com/nsidc/forge-py; the branch is granule-metgen-investigation.

juliacollins commented 2 months ago

forge-py was forked from PO.DAAC to https://github.com/nsidc/forge-py. Branch granule-metgen-investigation includes code changes and extra notes in the README. Follow-on issue https://github.com/nsidc/granule-metgen/issues/14 deals with fine-tuning of spatial information.

juliacollins commented 2 months ago

A placeholder now exists in the NSIDC Software Architecture Diagrams collection. See https://nsidc.atlassian.net/wiki/x/VYAWFg. Still need to add diagrams!

juliacollins commented 1 month ago

The software architecture page is now populated with the diagrams presented at the stakeholder meeting. We can (and should) update these as the project goes on.

lisakaser commented 1 month ago

@juliacollins the goal of the forge-py research phase, as recorded in the research project stakeholder meeting notes, was: "Confirm usability of forge-py as a replacement for MetGen spatial handling. Ensure spatial handling can be executed independent of the rest of the ingest process." This is what I would have expected to see addressed in this issue. Can you summarize your findings here, or point me to another issue that addresses them?

juliacollins commented 1 month ago

forge-py as-is can't be used as a drop-in tool, since it assumes all spatial information is read from a netCDF data file, and it generates a convex hull rather than a concave one. However, it does point to other packages we can use for spatial refinements, and I was able to hack some of the configuration values to generate spatial output that roughly mimicked our current approach. Based on my experiments, I recommend we develop our own command-line tool that reads spatial information from a separate input file, similar to the spatial input MetGen currently requires. This development should be small (e.g. a sprint or less for one developer), since we're not starting from scratch. The code should be able to either run on local infrastructure or be wrapped in a Lambda function for use in a Cumulus pipeline.
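To make the convex-vs-concave point and the "CLI or Lambda" idea concrete, here is a minimal sketch of how such a standalone tool could be shaped. It is not forge-py's code or API. Assumptions, all hypothetical: the spatial input file holds one `lon lat` pair per line, the output is a GeoJSON-style polygon, and the Lambda event carries the input lines under a `spatial_lines` key. It deliberately uses a plain convex hull (Andrew's monotone chain) to show how a convex footprint overstates coverage for a non-convex granule; a concave hull (for example Shapely 2.x's `concave_hull`) would be swapped in for the real tool.

```python
"""Hypothetical standalone spatial tool sketch (not forge-py's or MetGen's API)."""
import argparse
import json
import sys


def read_points(lines):
    """Parse 'lon lat' pairs, one per line (an assumed format); skip blanks and '#' comments."""
    points = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        lon, lat = (float(v) for v in line.split()[:2])
        points.append((lon, lat))
    return points


def _cross(o, a, b):
    # z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop each chain's last point; it repeats the other chain's first point.
    return lower[:-1] + upper[:-1]


def shoelace_area(polygon):
    """Planar area in squared degrees (ignores Earth curvature; illustration only)."""
    n = len(polygon)
    s = sum(polygon[i][0] * polygon[(i + 1) % n][1]
            - polygon[(i + 1) % n][0] * polygon[i][1] for i in range(n))
    return abs(s) / 2.0


def build_footprint(lines):
    """Core logic shared by both entry points: points in, closed polygon ring out."""
    hull = convex_hull(read_points(lines))
    return {"type": "Polygon", "coordinates": [[list(p) for p in hull + hull[:1]]]}


def main(argv=None):
    """Local command-line entry point."""
    parser = argparse.ArgumentParser(description="Emit a footprint polygon as GeoJSON.")
    parser.add_argument("spatial_file", help="text file of 'lon lat' pairs")
    args = parser.parse_args(argv)
    with open(args.spatial_file) as f:
        json.dump(build_footprint(f), sys.stdout)


def lambda_handler(event, context):
    """AWS Lambda entry point; the 'spatial_lines' event key is a hypothetical contract."""
    return build_footprint(event["spatial_lines"])


if __name__ == "__main__":
    main()
```

For an L-shaped footprint with corners `(0,0) (2,0) (2,1) (1,1) (1,2) (0,2)`, the true area is 3 square units, but the convex hull drops the inner corner `(1,1)` and covers 3.5. That extra area is exactly the over-coverage a concave hull is meant to avoid. Keeping the geometry in `build_footprint` and making both entry points thin wrappers is one way to satisfy the "run locally or wrap in a Lambda" requirement without duplicating logic.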

Before we embark on any custom code, I do recommend reaching out to PO.DAAC to confirm whether or not they're interested in some of the changes we need. If so, then we could consider modifying their code base rather than creating one of our own. My experiments were quick-and-dirty and not elegant with respect to the existing code, and the logic would need some cleanup and better error handling.

Experiment notes are in the README in the branch granule-metgen-investigation for forge-py (https://github.com/nsidc/forge-py). This repository/branch can serve as the starting point for our "spatial handling" work.

lisakaser commented 1 month ago

@juliacollins thanks for the summary. I created a follow-up issue (#22) to contact PO.DAAC.