nsidc / granule-metgen

Metadata generator for direct-to-Cumulus era

Align configuration and spatial handling workflow with data ingest team workflow #3

Closed juliacollins closed 1 month ago

juliacollins commented 2 months ago

Questions posted in Slack, June 27, 2024:

Julia Collins 14:45 Another question that someone in this crowd may be able to answer: There is an NSIDC fork of cumulus-metadata-aggregator, which apparently is in use in this in-between time. Are there plans to use cumulus-metadata-aggregator in the “direct to Cumulus” era? I ask because the metadata aggregator handles footprint (spatial metadata) updates. As I understand it, those updates happen after UMM-G file generation/CNM posting. That’s a different workflow than we’ve used in the past (MetGen handles spatial refinements), and the workflow I proposed for the post-MetGen era assumed all spatial refinement will happen as part of the UMM-G file generation, not as part of the data ingest step. Please straighten me out if I’m adding work to our UMM-G pipeline that doesn’t need to be there. It would be helpful for me to have access to a schematic that shows both the workflow to create UMM-G (Figure 2 at https://nsidc.atlassian.net/wiki/spaces/DAACSW/pages/320405515/Cumulus+Direct+Ingest+M[…]and+Transfer+Files+Definition+Project+-+Stakeholder+meeting) and what is planned to happen as part of the data ingest process. (edited)

Julia Collins 14:52 A specific example: a fix was applied to NSIDC’s fork to address the issue Shift polygon away from antimeridian for orientation detection (https://github.com/nsidc/cumulus-metadata-aggregator/commit/fb0635efa51bbb5d0a332acb6d64b0a60d9a9fd9 in GitHub if you have access). This is the sort of spatial work I would expect to be happening in the UMM-G-generation workflow. But if the UMM-G pipeline can shove spatial information into the metadata file without worrying about cleaning it up, that simplifies metadata generation! If we’re going to do that, then maybe we should move the footprint generation out of the UMM-G pipeline, too. I believe cumulus-metadata-aggregator was designed to take forge output (the footprint) and do something with it. However, I’ve been looking at building off of the forge-py code base to handle footprint generation (i.e. replacing the fancy stuff currently done by MetGen) as part of the UMM-G pipeline. I don’t want to duplicate effort. (edited)
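For context, a minimal sketch of the kind of antimeridian adjustment the referenced fix addresses; the function and variable names below are illustrative and are not taken from the NSIDC fork or forge-py. The idea is to make longitudes continuous before applying a shoelace winding check, since a ring that straddles ±180° otherwise appears to reverse orientation.

```python
# A minimal sketch (assumed names, not the cumulus-metadata-aggregator code) of
# shifting a polygon away from the antimeridian before orientation detection.

def shift_away_from_antimeridian(lons, threshold=180.0):
    """Map longitudes into a continuous 0..360 range when the ring crosses +/-180,
    so consecutive vertices don't jump by ~360 degrees."""
    crosses = any(abs(lons[i] - lons[i - 1]) > threshold for i in range(1, len(lons)))
    if crosses:
        return [lon + 360.0 if lon < 0 else lon for lon in lons]
    return list(lons)


def is_counterclockwise(lons, lats):
    """Shoelace-based winding check; assumes an implicitly closed ring."""
    shifted = shift_away_from_antimeridian(lons)
    area = 0.0
    n = len(shifted)
    for i in range(n):
        j = (i + 1) % n
        area += shifted[i] * lats[j] - shifted[j] * lats[i]
    return area > 0


# Example: a ring crossing the antimeridian near the equator.
lons = [179.0, -179.0, -179.0, 179.0]
lats = [0.0, 0.0, 1.0, 1.0]
print(is_counterclockwise(lons, lats))  # True once longitudes are made continuous
```

Without the shift, the same ring comes out with a negative shoelace area and would be misclassified as clockwise, which is the sort of cleanup the comment above argues should live in whichever pipeline owns spatial refinement.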

Julia Collins 14:58 All of which is to say, if we plan to stick with PO.DAAC’s approach to metadata enhancement, that changes the requirements for what we need to do to come up with a “valid” UMM-G in the first place. Might as well do that “sock” generation in the same pipeline that executes cumulus-metadata-aggregator. There’s no need to impose our current (ECS, MetGen-centric) workflow onto the Cumulus environment. (edited)

lisakaser commented 2 months ago

Is there a visualization of the data ingest workflow and the collection-level Cumulus setup? Is there any overlap with the file-level metadata generation tool that could be leveraged?

lisakaser commented 1 month ago

After discussion with Troy and the DUCk team, this has a very low priority for Ops, and given the other scope increases this project already has to deal with, we decided to keep it out of scope. At most, DUCk will communicate what we have collected that could be useful for collection-level setups, but the work itself remains out of scope for this project.