iodepo / odis-arch

Development of the Ocean Data and Information System (ODIS) architecture
https://book.oceaninfohub.org/

Pacific portals #3

Open pieterprovoost opened 4 years ago

pieterprovoost commented 4 years ago

The goal of this issue is to review the technology, standards, and readiness level of the Pacific data portals. The following SPREP and SPC portals have been identified as potential data sources for the OceanInfoHub in the PSIDS region:

skybristol commented 4 years ago

What I'm most interested in, when thinking about the various hubs, is a slightly deeper look at the specific content each potential contributor exposes. Your list gets at that a little with the "content types" piece, but I would like a bit more detail on what types of data, publications, etc. the hubs we're evaluating provide.

Even something like the topical indexing approach that the SPREP-PROE example uses is very useful, and it would be worth evaluating hubs on whether or not they provide this type of value-added organization of their content.

skybristol commented 4 years ago

I would also break up our thinking about repository content into two parts - primary and secondary. Primary contributions are the things a hub's mission makes it most concerned with and responsible for. What is the core purpose and mission of the hub, and what does that determine in terms of what they are managing and serving up? Primary information may well be unique to the particular hub, which may be the only or best place to get that information in the whole network. Secondary information is the stuff the hub has information about but that may be more appropriately linked to and sourced from either some other hub or "the commons." This categorization gives us an opportunity to look at a couple of dynamics we should be thinking about with regard to hub evaluation.

The 4th example, PCRAFI, appears to be more of a primary repository for certain types of data important to the mission of the project. They've adopted some reasonable technologies that make the system fairly open and accessible to both humans and algorithms, but there are some significant content problems that put its overall utility outside the project's own context in some jeopardy. The tech chosen, and the syntactic standards that come along with it, mean that there is great potential for this system to provide rich metadata and important points of interoperability, such as exposure of underlying data models and even data summarization as a service. Unfortunately, it appears that most data served by this system have received minimal data stewardship treatment in terms of the details that would exploit those strengths of the platform.

Aggregators, like the first three in the list, are an interesting case in themselves. Aggregators are in a position to add interesting value to the network, but they may or may not be operating in such a way that the value is actually realized. These values also need to be a part of our thinking as we look to describe the ODIS-Architecture.

For any of these things, it's important for the value-added services to "declare themselves" and provide transparency into what they are doing, decisions they've made along the way, uncertainties they may have introduced, and other dynamics so that downstream users can be aware. There are useful standards, such as W3C-PROV, to help encode and share this type of information in more usable and robust ways.
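As a rough illustration of what "declaring itself" could look like, here is a minimal sketch of an aggregator emitting a provenance statement for one harvested record, using W3C PROV-O terms serialized as JSON-LD. The function name `describe_harvest`, the example.org identifiers, and the choice of `rdfs:comment` to carry free-text notes are assumptions made for the example, not anything the portals or ODIS currently specify.

```python
import json
from datetime import datetime, timezone

# Namespace prefixes used in the record; the PROV namespace is the W3C one.
CONTEXT = {
    "prov": "http://www.w3.org/ns/prov#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def describe_harvest(aggregator_id, source_id, derived_id, notes, started, ended):
    """Build a PROV-O (JSON-LD) description of one harvest step.

    Hypothetical helper: all identifiers are supplied by the caller.
    """
    activity_id = derived_id + "#harvest"

    def timestamp(moment):
        return {"@value": moment.isoformat(), "@type": "xsd:dateTime"}

    return {
        "@context": CONTEXT,
        "@graph": [
            # The aggregator software that did the work.
            {"@id": aggregator_id, "@type": "prov:SoftwareAgent"},
            # The record as published by the originating hub.
            {"@id": source_id, "@type": "prov:Entity"},
            # The harvest itself, with free-text notes on decisions made.
            {
                "@id": activity_id,
                "@type": "prov:Activity",
                "prov:used": {"@id": source_id},
                "prov:wasAssociatedWith": {"@id": aggregator_id},
                "prov:startedAtTime": timestamp(started),
                "prov:endedAtTime": timestamp(ended),
                "rdfs:comment": notes,
            },
            # The record as re-published by the aggregator.
            {
                "@id": derived_id,
                "@type": "prov:Entity",
                "prov:wasDerivedFrom": {"@id": source_id},
                "prov:wasGeneratedBy": {"@id": activity_id},
            },
        ],
    }


if __name__ == "__main__":
    record = describe_harvest(
        aggregator_id="https://aggregator.example.org/software/harvester",
        source_id="https://hub.example.org/dataset/123",
        derived_id="https://aggregator.example.org/record/123",
        notes="Keywords mapped to a local topic index; unmapped keywords dropped.",
        started=datetime(2021, 1, 1, 0, 0, tzinfo=timezone.utc),
        ended=datetime(2021, 1, 1, 0, 5, tzinfo=timezone.utc),
    )
    print(json.dumps(record, indent=2))
```

The notes field is where an aggregator would state the decisions and uncertainties it introduced; a real deployment would probably want something more structured than free text.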

From the ODIS-Arch perspective and implementation of OIH, we may want to work in a concept of "important, but low maturity level hubs." I know that could come across as somewhat arrogant, so we'll have to work on the semantics of our messaging. We may decide that the information content a given hub serves is conceptually important enough that the hub should be considered an active part of the network. We would register it and promote it in visualizations and advertisements of the network, and we would exercise "Global Hub" software on it to test its functionality and the reach of its content. We might slurp up its metadata into a Global Hub index and make it available in search results. However, the low maturity level means that it would likely have limited value to any other hub with more focused needs. It is also a safe bet that records from low maturity hubs in our Global Hub index would not see much use and might even receive frowny faces, because users still need to travel elsewhere and learn/understand a new context to make use of things they find.
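To make the "important, but low maturity level" idea a little more concrete, here is a minimal sketch of how a registry might record it. Everything in it is hypothetical: the three-level `Maturity` scale, the `Hub` fields, the membership rule, and the wording of the search-result note are assumptions for discussion, not part of any existing ODIS or OIH software.

```python
from dataclasses import dataclass
from enum import IntEnum


class Maturity(IntEnum):
    """Assumed three-level scale; a real one would need agreed criteria."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class Hub:
    name: str
    maturity: Maturity
    important: bool  # judged conceptually important to the network


def is_network_member(hub: Hub) -> bool:
    """Register a hub if it is mature enough OR judged important anyway."""
    return hub.important or hub.maturity >= Maturity.MEDIUM


def search_result(title: str, hub: Hub) -> dict:
    """Attach a plain-language note so users know what to expect."""
    result = {"title": title, "hub": hub.name, "maturity": hub.maturity.name}
    if hub.maturity == Maturity.LOW:
        result["note"] = (
            "Limited metadata; you may need to visit the source hub "
            "to evaluate and use this record."
        )
    return result


if __name__ == "__main__":
    hub = Hub(name="Example regional portal", maturity=Maturity.LOW, important=True)
    print(is_network_member(hub))
    print(search_result("Coastal hazard exposure dataset", hub))
```

Surfacing the maturity level alongside each result is one possible way to set expectations up front rather than leaving users to discover the gap after clicking through.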

For either of these types of cases, an examination of the recently published TRUST principles that came out of RDA work would be instructive. Building on the idea of FAIR, the TRUST principles are aimed more at the practice of data stewardship and the operation of data repositories. Making good on those principles, in whatever way works for the cultural and organizational context of our OIH hubs, is going to result in a higher overall maturity level of the system and the information it is putting on the network. We might also look to the work of Ge Peng and others on data stewardship maturity, and to the processes for measuring and improving it that are in use at NOAA and elsewhere, for principles and methods we want to bring into the architecture.