zarr-developers / geozarr-spec

This document aims to provide a geospatial extension to the Zarr specification. Zarr specifies a protocol and format used for storing Zarr arrays, while the present extension defines conventions and recommendations for storing multidimensional georeferenced grids of geospatial observations (including rasters).

Request for Information: CEOS WGISS #54 GeoZarr Presentation #11

Closed christophenoel closed 1 year ago

christophenoel commented 1 year ago

Hi @briannapagan & team,

I have been invited by ESA to deliver a presentation on our GeoZarr working group's activities during the upcoming CEOS WGISS #54 meeting on April 19th, 2023. CEOS is responsible for coordinating international civil space-based Earth observation programs (learn more at https://ceos.org/about-ceos/overview/).

To create an informative and engaging presentation, I need your input on several topics (which I have been asked to cover), including:

While some of these topics may necessitate further discussion, your initial feedback is invaluable. I appreciate your prompt response and support.

Best regards,

christophenoel commented 1 year ago

My summary of the major use cases reported during our discussions:

  1. Easy compatibility with popular mapping and data analysis tools like GDAL, Xarray, ArcGIS, and QGIS.
  2. Combining different types of geospatial data, like satellite images, elevation maps, and weather models.
  3. Creating and displaying geospatial data in web browsers without complex workarounds.
  4. Helping users discover, access, and retrieve the data they need, including subsets or different arrangements of the data.
  5. Supporting advanced geospatial features for more accurate data representation and analysis.
  6. Allowing scientists and researchers to work with diverse data types and projections in their preferred software and programming languages.

christophenoel commented 1 year ago

I took the liberty of elaborating further on HackMD about yesterday's discussion (see https://hackmd.io/t2DWpX1iQEWMKx1Fi4Px7A?view ).

I didn't fully grasp the conclusions regarding the SWG process. Regardless of our goals (proposing CF improvements, enhancing guidance for encoding geospatial data in Zarr, leveraging Zarr V3), what is our intention concerning the GeoZarr SWG?

I am convinced that we should immediately initiate the process (Scott has offered to start drafting the charter, and I would like to dedicate my efforts to this task).

The SWG will provide us with the necessary framework: resources (email, document sharing, meetings), the experience of numerous experts accustomed to facilitating projects similar to ours and to brokering contacts and agreement with external actors, technical experts who can devote time to the project, and the essential openness for acceptance by actors in the geospatial domain. Developing recommendations on our own is the opposite approach to a standardization process. Furthermore, an SWG can fully support collaboration on conventions or improvements to external standards (such as CF), and the scope is entirely defined by the charter.

Could you please clarify your intentions following yesterday's discussion? Thank you.

rabernat commented 1 year ago

I am in favor of moving forward with the SWG.

The initial thought within our group, advocated by people like Chris Holmes, Howard Butler, Sean Gillies, and Even Rouault, was that the OGC process was basically too slow and bureaucratic. In Sean's words:

If you want GeoZarr to take years and be over-designed, the OGC is the place to be!

The hope was that, by operating independently, we could move more quickly and agilely. The intention was that, once we had a mostly working spec and implementation, we could then bring this to the OGC for ratification. This is basically how COG and STAC were both developed.

However, things haven't turned out that way. Instead, we have gone back and forth discussing many different and interesting points, but have failed to create a clear roadmap for resolving the issues that have arisen in our discussion. My impression is that this has happened because most of the people involved in this effort are not really putting in significant time and work outside of our bi-weekly meetings. There has been plenty of critique of @christophenoel's original draft, but no one has put forward any specific alternatives and, crucially, none of the implementers (e.g. GDAL, QGIS, etc.) seem to be at the table.

Given the situation, I am now convinced that working within the OGC framework is probably better than our current trajectory. So I am 👍 on moving forward with the SWG charter.

But I'm very curious to hear opinions from @matthewhanson and @dblodgett-usgs, who have lots of experience with OGC.

rabernat commented 1 year ago

Here is my input

  • Official Team Name: I currently assume it is the GeoZarr Standards Working Group. Please confirm or correct this ~(Steering is taken from ODC usual terminology)~.

I am 👍 on GeoZarr SWG

  • ZEP Process: Is our objective to adhere to the ZEP process confirmed? I'm uncertain about this aspect.

We will publish a Zarr Convention based on the outcome of the SWG, rather than a ZEP. This is more lightweight and doesn't have to be approved by anyone else.

  • OGC Specification: Can we confirm our intention to become an OGC specification and establish a new OGC SWG? My understanding is affirmative. What are we waiting for to start working on the charter?

👍, see response above.

  • Major Use Cases: If possible, provide brief one-sentence descriptions of our key use cases, specifying the related domain and general intent.

I think your use cases summarized above (https://github.com/zarr-developers/geozarr-spec/issues/11#issuecomment-1488400405) are a great start. I would also add

  • Current Roadmap: Could we agree on the next steps, such as initiating work on the OGC charter? What outputs should we anticipate and by when?

We need to do more work to define milestones. This will give our work more concreteness.

  • Accomplishments to Date: Please outline our achievements so far, such as the small store example, defined use cases, etc.

Mostly we have enumerated problems with the overall geospatial raster data ecosystem, focusing on differences between GeoTIFF and NetCDF. These are not particularly Zarr specific. I'm going to take some time to write down my synthesis of these issues.

  • GitHub Issues: Are there any noteworthy issues raised on GitHub that we should highlight in the presentation?

Over the next week, I will add a few issues that summarize what I think are the key points that need to be resolved to move forward.


Thanks @christophenoel for working to push this forward. I appreciate your patience. I understand you may be frustrated by how things have evolved. Let's continue to try to work together to advance this standard.

christophenoel commented 1 year ago

@rabernat: Thank you, it's a very helpful summary.

I recommend clearly articulating the "agile" mindset in the charter if there are concerns about it. From my perspective, with sufficient dedication, documents, specifications, and other resources can evolve rapidly, provided the chair is determined to maintain a steady release cadence. For example, significant progress has been made on projects such as OGC API Features, OGC API Routes, and OGC API Records, among others.

@rabernat No worries. I'm not frustrated, just trying to push in the original direction expected by ESA / OGC, who initiated the team. I really appreciate our team spirit.

dblodgett-usgs commented 1 year ago

Thanks for articulating all this @christophenoel and @rabernat --

I am a bit confused on one point -- the thread above seems to indicate that we will be following the Zarr Enhancement Process (ZEP) and using nomenclature from the Open Data Charter (ODC) lexicon (Steering Working Group - SWG) but also initiating an Open Geospatial Consortium Standards Working Group to draft and process comments on an OGC Implementation Standard?

I'm not aware of a precedent for processing a standard via two governance structures simultaneously and the premise of a third (CF) makes this even more daunting.

@rabernat:

However, things haven't turned out that way. Instead, we have gone back and forth discussing many different and interesting points, but have failed to create a clear roadmap for resolving the issues that have arisen in our discussion.

I have the same impression of the issue but a slightly different take on the solution. We're faced with a daunting mix of communities, processes, and pathways for development of a normative artifact (a standard/convention/blah). Faced with this ambiguity, it's been nearly impossible to discuss goals within a frame of reference that allows us to agree on anything. Defining that frame of reference precisely should be our priority here and is NOT trivial -- so the fact that it's taken some time is really not surprising.

I think my preferred path (which I am still thinking through) is to initiate a ZARR-centric process (as we have done here) with a stated objective of publishing the zarr convention we come up with as an OGC Community Standard (that is, NOT as an OGC Implementation Standard). On this path, we would build out a zarr convention based on @christophenoel's start. It would include the ability to encode geotiff metadata in a NetCDF style, have explicit (friendly to data consumers) support for multiband data, and support tiled views of data per COG precedent.
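To make the three ingredients above a little more concrete, here is a minimal sketch of what such metadata could look like. It is only an illustration under stated assumptions, not a proposal from this thread: the array names `reflectance` and `crs`, the helper names, and the tile size are all hypothetical. The `.zarray` keys follow the Zarr v2 layout, `_ARRAY_DIMENSIONS` is the key xarray uses to name dimensions in Zarr v2 stores, and the grid-mapping attributes are taken from CF.

```python
import json
import math


def sketch_store(n_bands, height, width, tile=512):
    """Return hypothetical Zarr v2 metadata documents, keyed by store path."""
    return {
        # Array metadata: one chunk per band and per tile, loosely mirroring COG tiling.
        "reflectance/.zarray": {
            "zarr_format": 2,
            "shape": [n_bands, height, width],
            "chunks": [1, tile, tile],
            "dtype": "<u2",
            "compressor": None,
            "fill_value": 0,
            "filters": None,
            "order": "C",
        },
        # An explicit "band" dimension, plus a CF-style pointer to the CRS variable.
        "reflectance/.zattrs": {
            "_ARRAY_DIMENSIONS": ["band", "lat", "lon"],
            "grid_mapping": "crs",
        },
        # CF grid-mapping variable describing WGS 84 geographic coordinates.
        "crs/.zattrs": {
            "_ARRAY_DIMENSIONS": [],
            "grid_mapping_name": "latitude_longitude",
            "semi_major_axis": 6378137.0,
            "inverse_flattening": 298.257223563,
            "longitude_of_prime_meridian": 0.0,
        },
    }


def tiles_per_band(zarray):
    """Count the chunks (tiles) needed to cover one band of the array."""
    _, height, width = zarray["shape"]
    _, tile_h, tile_w = zarray["chunks"]
    return math.ceil(height / tile_h) * math.ceil(width / tile_w)


store = sketch_store(n_bands=4, height=2048, width=3000)
print(json.dumps(store["crs/.zattrs"], indent=2))
print(tiles_per_band(store["reflectance/.zarray"]))
```

The CRS is kept in its own attribute document, as CF does with grid-mapping variables, so the data array only carries a reference to it. Whether a GeoZarr convention would adopt this exact layout is an open question.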

As a design paradigm, the convention should be compatible with CF such that contributing it to CF is not impossible (UGRID as the analogy). The convention should also build from existing OGC baseline (NetCDF and GeoTIFF) as much as possible, avoiding any major conflict with conformance classes of relevant / related standards.

Once a V0. version of the convention is serially complete, it could be proposed to CF and either adopted there as part of that convention or left as an extension of CF -- decision pending the actual nature of the convention. Once a V1. is vetted and implemented by a few software libraries, it could be proposed as an OGC Community Standard.

My logic here is that the basis on NetCDF-CF is sufficiently strong that the CF specification process needs to be considered prior to processing comments under the OGC process. Developing an OGC Implementation Standard leads to a set of decisions that would ostensibly fork CF and iterating on versions of standards between the CF community process and the OGC community process sounds... messy.

OK, enough for now --- long story short, I think we are on the right track and a precisely written charter is the next step. Big question is, are we developing an OGC Implementation Standard as an OGC SWG or are we developing a ZARR Convention as a ZARR SWG?

christophenoel commented 1 year ago

I am a bit confused on one point -- the thread above seems to indicate that we will be following the Zarr Enhancement Process (ZEP) and using nomenclature from the Open Data Charter (ODC) lexicon (Steering Working Group - SWG) but also initiating an Open Geospatial Consortium Standards Working Group to draft and process comments on an OGC Implementation Standard?

Those were only questions. I think we all agree this won't be a ZEP in any case. The mention of ODC was only to pass on Brianna's justification: "steering" (which sounds strange to me) is just a term she used because she's involved in the ODC sphere.

rabernat commented 1 year ago

the thread above seems to indicate that we will be following the Zarr Enhancement Process (ZEP)

Just to clarify this one point--we will not do a ZEP, because our standard will be a Zarr convention, not an extension. So there will only be one governance process / vote / approval required: OGC.

christophenoel commented 1 year ago

OK, enough for now --- long story short, I think we are on the right track and a precisely written charter is the next step. Big question is, are we developing an OGC Implementation Standard as an OGC SWG or are we developing a ZARR Convention as a ZARR SWG?

I don't know what you mean by a Zarr SWG (and I thought we all agreed with Ryan yesterday that we don't need the ZEP framework for what we plan). We have also currently discarded the plan for an OGC Implementation Standard (erratum: we discarded the plan for a specification).

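I think the plan was stated as:

  • Update existing zarr community guidance about how to store geospatial data in zarr
  • In parallel, improve CF conventions
  • Leverage zarr v3 for our work

The question is: do we agree to put the scope of those activities in the frame of an SWG?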

briannapagan commented 1 year ago

I think the plan was stated as:

  • Update existing zarr community guidance about how to store geospatial data in zarr
  • In parallel, improve CF conventions
  • Leverage zarr v3 for our work

The question is: do we agree to put the scope of those activities in the frame of an SWG?

I believe I took the name SWG from the OGC process.

Regardless, I think the first and third points above should be scoped in the frame of an SWG. The second point, improving CF conventions, is I think outside the immediate SWG scope, but we are all very invested in pursuing it. Ultimately, if CF conventions are updated, so would the GeoZarr convention. We focus now on the official OGC stamp on this GeoZarr convention.

If that is clear, I will reach out to Scott tomorrow to inquire about starting the process unless I hear otherwise.

christophenoel commented 1 year ago

I fully agree with that plan, very good.

briannapagan commented 1 year ago

For the item "In parallel, improve CF conventions", I would like to start organizing folks on the NASA side to build support for getting GeoTIFF metadata into CF. We have people heavily involved in CF, so I just want to find the correct people to talk to.

I asked yesterday but just checking again - is there any open issue I can bring that discussion to? @dblodgett-usgs

dblodgett-usgs commented 1 year ago

@briannapagan -- I am doing background on the issue and will work on it as quickly as is possible given email round trips.

I had the sense that we were talking about a "Steering Working Group" in the context of developing a Zarr Convention and a "Standards Working Group" in the context of developing an OGC standard. If SWG is only referring to OGC above, then read on.

One concern here --- I don't think an OGC Standards Working Group is needed or really appropriate if we are shooting for an OGC Community Standard.

https://www.ogc.org/standards/community/ (includes ZARR v2)

https://www.ogc.org/standards/ (implementation standards, including things like NetCDF)

A charter for an OGC process would normally lead to an Implementation Standard and requires that we go through an OGC process to develop the specification. If we intend to first develop a Zarr convention -- an OGC charter for a Standards Working Group doesn't seem necessary until we've established the convention that will be put forward as a community standard.

christophenoel commented 1 year ago

To address the concerns raised: it is possible to write a ZEP-4 recommendation document in the context of an OGC SWG, similarly to how the OGC GeoDataCube SWG operates (not intending to write a standard).

However, the ZEP-4 recommendation itself could potentially be a standard specification document, rather than being limited to a community standard (I don't see why).

I think the best way to go forward is to share the objective (spec of a ZEP-4 recommendation) with Scott and see if this is welcome in a SWG charter.

dblodgett-usgs commented 1 year ago

According to the GeoDataCube SWG charter, the SWG appears to be working on an implementation standard.

"GeoDataCube API standard (OGC API-Geodatacubes) and GDC metadata model." does not indicate that it is a community standard.

It would be wise to see what Scott thinks about having a SWG develop a community standard -- I do not understand that to be the intention of the Standards Working Group activity type in OGC.

dblodgett-usgs commented 1 year ago

In addition to what I put together in #13, the table here may be of interest.

https://docs.opengeospatial.org/pol/05-020r27/05-020r27.html#the-two-track-standards-process-characteristics

briannapagan commented 1 year ago

So just a reminder: we are not aiming for a ZEP - we are aiming for a zarr convention, as @rabernat explained above. I am sure Scott will point us in the right direction for the correct OGC path and correct terminology.

christophenoel commented 1 year ago

So just a reminder: we are not aiming for a ZEP - we are aiming for a zarr convention, as @rabernat explained above. I am sure Scott will point us in the right direction for the correct OGC path and correct terminology.

To be accurate, what @rabernat recommends is not to create a ZEP but a convention that follows the ZEP-4 recommendation, which is meant to provide metadata conventions.

christophenoel commented 1 year ago

However, the ZEP-4 recommendation itself could potentially be a standard specification document, rather than being limited to a community standard (I don't see why).

The conventions including the ZEP-4 Metadata recommendations would typically be an Implementation Standard. I don't see the purpose of going for a Community Standard in this way.

These "standard" conventions (in line with a ZEP-4 recommendation) would be something equivalent to:

dblodgett-usgs commented 1 year ago

We seem to be going in circles. Above, @christophenoel states:

We have also currently discarded the plan for an OGC Implementation Standard.

Then:

similarly to how the OGC GeoDataCube SWG operates (not intending to write a standard)

However, the ZEP-4 recommendation itself could potentially be a standard specification document, rather than being limited to a community standard (I don't see why).

But now we seem to be saying that we will structure this activity as a full OGC standard (Implementation Standard - not a community standard) such that the ZEP0004 convention is actually published as an OGC implementation standard?

My interpretation of https://github.com/zarr-developers/zeps/pull/28/files#diff-53e442aa938ca18ba1a94f845f264c5df0e4650f8f6abda856c3cd819f70abcaR115 was that a published OGC standard isn't quite what was intended in ZEP0004 -- @rabernat can you square that circle? Is the intention that a ZARR convention could actually be an OGC standard that's gone through the full OGC process to be a full standard? If that's the intent, I do think it could work, but many of the statements above are very confusing about the intent here.

The table here: https://docs.opengeospatial.org/pol/05-020r27/05-020r27.html#the-two-track-standards-process-characteristics is very useful to understand what a full OGC standard entails.

Also -- @rabernat can you update your comment here: https://github.com/zarr-developers/geozarr-spec/issues/11#issuecomment-1490228054 to use "Standards Working Group" and "OGC" if that's what you meant? That point threw me for a huge loop as I thought you were talking about https://opendatacharter.net/who-we-are/ or maybe https://www.opendatacube.org/odcc21 ??

christophenoel commented 1 year ago

Indeed, you have spotted my mistake @dblodgett-usgs. The correct sentence would have been "we discarded writing a specification and are going for conventions".

My point is that those conventions (expressed in the scope of ZEP0004) suit the Implementation Standard process carried out by an SWG, which was the original intent of this group.