w3c / sdw

Repository for the Spatial Data on the Web Working Group
https://www.w3.org/2020/sdw/

[SDW Best Practices Update]: Add FAIR principles to the document #1290

Closed situx closed 2 years ago

situx commented 2 years ago

Add FAIR principles to the document

PeterParslow commented 2 years ago

Gathering some thoughts:

1. FAIR gets a few mentions in the UNGGIM Standards Guide. Here's the trial online version (likely to move to a different location at some point): http://standards.unggim.ogc.org/unggim_guide.html. For example:

"The Guide also underscores the importance of standards in facilitating the application of the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles - promoting improved policymaking, decision making and government effectiveness in addressing key social, economic, and environmental topics, including attainment of Sustainable Development Goals."

"Developed in 2016, the 'FAIR Guiding Principles for data management and stewardship' can be used to help with development of these capabilities. These guidelines intend to improve the Findability, Accessibility, Interoperability, and Reuse of digital assets, and emphasize machine-actionability (the capacity of computational systems to find and interrogate data with none or minimal human intervention) to support humans in dealing with increased volume, complexity, and creation speed of data. The FAIR Principles provide a very comprehensive framework for applying standards and dealing with all aspects of the data lifecycle, including the ability to collect, organize, describe, and manage geospatial information."

"There are a wide range of FAIR data training resources and courses offered on the internet and by various organizations worldwide. One such example is provided by the Australian Research Data Commons"

2. Natural Resources Canada advocate extending that concentration on "machine-actionability" to remember humans: for web publication, being accessible to humans is important. Human accessibility (readability) helps "findability" because search engine crawlers are designed to prioritise human-readable pages. But also, "web accessibility" emphasises users with a range of abilities being able to access the data (information) using a variety of assistive technologies.

I'm sure I heard about that at OGC December 2021, but I can't find anything just now!

3. UK Geospatial Commission adds Q: it's not that useful to find & access data unless the quality is appropriate. (This is somewhat reminiscent of a criticism of "five star data": that it could be "five star" but of very low quality.) https://geospatialcommission.blog.gov.uk/2021/06/25/byte-ing-back-better-introducing-a-q-fair-approach-to-geospatial-data-improvement/

4. Earth Sciences Information Partnership "FAIR Data Quality Information" https://osf.io/xsu4p/ also looks to add information about data quality to that "required by" the FAIR principles, whilst arguing that (therefore!) the 'dataset quality information' about a dataset (& its supporting material) also has to be FAIR. There is no point in having a quality report & then not making it accessible to (potential) users.

"When data can be discovered based on information about certain quality attributes, the findability of the data is improved for users who need data that contain such attributes. Accessibility and usability of data is improved by describing issues and conditions that could affect the use of the data. Describing quality information in standardized formats, schemas, and terminology with controlled and even harmonized vocabularies, improves the interoperability of the data. The reusability of data is facilitated by describing limitations on use as well as appropriate and inappropriate uses and usage of the data" (Ivana Ivanova et al)

PeterParslow commented 2 years ago

SDWBP FAIR thoughts.docx

Here are some ideas for changes in a few sections. I have an AsciiDoc version, thinking that would fit GitHub better, but I can't upload it to a comment. And there's no way it's mature enough to generate a stack of pull requests yet! Just thoughts.

prushforth commented 2 years ago

> 2. Natural Resources Canada advocate extending that concentration on "machine-actionability" to remember humans: for web publication, being accessible to humans is important. Human accessibility (readability) helps "findability" because search engine crawlers are designed to prioritise human-readable pages. But also, "web accessibility" emphasises users with a range of abilities being able to access the data (information) using a variety of assistive technologies.
>
> I'm sure I heard about that at OGC December 2021, but I can't find anything just now!

@PeterParslow the NRCan FAIR presentation is in the closing plenary meeting folder, here.

lvdbrink commented 2 years ago

Very useful writeup, @PeterParslow!

I think you would find some matches for the R - Reusable part in DWBP.

Also, data quality probably deserves some more attention than it's getting in SDWBP at the moment. It seems to be gaining importance....

PeterParslow commented 2 years ago

@BillSwirrl I realise that actually you & I were to work on this together; I hope you don't mind that by putting it here I've opened my thoughts to wider group discussion before you had your chance to input.

situx commented 2 years ago

> Also, data quality probably deserves some more attention than it's getting in SDWBP at the moment. It seems to be gaining importance....

This is a very good point, and it can be expanded even further if we consider data quality aspects in linked data environments. But I am not sure whether this is out of scope.

PeterParslow commented 2 years ago

After a bit of thought on my part, the AsciiDoc version of the file I attached (as Word above) is now here: https://github.com/w3c/sdw/commit/1cf1846332a06c876033caf42808c55ce82b14c2 in the "proposals" folder.

PeterParslow commented 2 years ago

I have made changes https://github.com/w3c/sdw/commit/7b12b7a5dd3d604288f9a90c01eb971d2596b048 to address Linda's comment https://github.com/w3c/sdw/issues/1290#issuecomment-996569364

So please use that & ignore the Word file from now on.

ogcscotts commented 2 years ago

I would argue that "Q" (Quality) as an extension may not be needed. If we equate the quality as a measure of the fitness-for-purpose of data, then that fitness should be described to make data ("F") findable for a specific purpose and ("R") reusable or suitable for other purposes.

PeterParslow commented 2 years ago

The UK's Geospatial Commission's case is that data needs to be "Q" (fit for some purpose?) before there's much point in making it findable or reusable.

https://geospatialcommission.blog.gov.uk/2021/06/25/byte-ing-back-better-introducing-a-q-fair-approach-to-geospatial-data-improvement/

I am aware of counter arguments - one sector of UK's open data community pushed for people to put whatever data they had 'out there' with the hope/expectation that users would provide quality improvements. However, I do remember a similar challenge being made to TBL's "five stars" of (linked) open data - that they said nothing about data quality.

wilkesg commented 2 years ago

Hi All. Natural Resources Canada within the GeoConnections Program has been advocating to further develop the F (Findability) and A (Accessibility) aspects of FAIR with the use of the +, i.e., FAIR+. For findability, we are advancing work towards automating the discovery of spatial web services via crawlers. We crawl the .ca domain and list the results of available spatial services here: https://www.nrcan.gc.ca/science-and-data/science-and-research/geomatics/canadas-spatial-data-infrastructure/geospatial-web-services/19359. The same is being done for the Arctic Spatial Data Infrastructure in their metadata catalog, available here: https://arctic-sdi.org/arctic-sdi-metadata-catalogue/. We are also working to improve accessibility through web mapping and Maps for HTML (https://www.w3.org/community/maps4html/). That work is focused on improving accessibility for those with disabilities via screen map readers and map layer readers. Interoperability is a pillar of our work as well, and is well established through our use, promotion, and support for the development of standards.

jvanulde commented 2 years ago

> The UK's Geospatial Commission's case is that data needs to be "Q" (fit for some purpose?) before there's much point in making it findable or reusable.
>
> https://geospatialcommission.blog.gov.uk/2021/06/25/byte-ing-back-better-introducing-a-q-fair-approach-to-geospatial-data-improvement/
>
> I am aware of counter arguments - one sector of UK's open data community pushed for people to put whatever data they had 'out there' with the hope/expectation that users would provide quality improvements. However, I do remember a similar challenge being made to TBL's "five stars" of (linked) open data - that they said nothing about data quality.

Interesting. Thanks for sharing @PeterParslow. Data quality is a concern but could/should it be handled separately from FAIR? I see FAIR being overloaded a lot these days, and I suspect it's because there isn't a nicely packaged set of principles that cover all concerns, at least not yet. Quality and fitness for purpose can be separate concerns, can they not? As well, fitness for purpose pre-supposes that one knows the universe of applicability for a specific set of data - and I don't believe this is always the case. I think it's often enough to describe the limitations of the data, and let the consumer decide how fit it is for their purpose. Maybe I'm getting tripped up on semantics.

KoalaGeo commented 2 years ago

> Hi All. Natural Resources Canada within the GeoConnections Program has been advocating to further develop the F (Findability) and A (Accessibility) aspects of FAIR with the use of the +, i.e., FAIR+. For findability, we are advancing work towards automating the discovery of spatial web services via crawlers. We crawl the .ca domain and list the results of available spatial services here: https://www.nrcan.gc.ca/science-and-data/science-and-research/geomatics/canadas-spatial-data-infrastructure/geospatial-web-services/19359. The same is being done for the Arctic Spatial Data Infrastructure in their metadata catalog, available here: https://arctic-sdi.org/arctic-sdi-metadata-catalogue/. We are also working to improve accessibility through web mapping and Maps for HTML (https://www.w3.org/community/maps4html/). That work is focused on improving accessibility for those with disabilities via screen map readers and map layer readers. Interoperability is a pillar of our work as well, and is well established through our use, promotion, and support for the development of standards.

@wilkesg that's really interesting. Out of interest, is your crawler open source? The OGCAPI standards were meant to address some of the findability issues: if you search "Vineyards Germany" in Google Dataset Search, it picks up the OGCAPI-Features service with relevant data as the top result - https://datasetsearch.research.google.com/search?query=vineyards%20germany
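For anyone wondering how a service page becomes visible to a dataset search engine: Google Dataset Search reads schema.org `Dataset` markup embedded in the human-readable page. Below is a minimal sketch of generating such markup, assuming a hypothetical collection URL and made-up title and keywords. The `dataset_jsonld` helper is illustrative only and is not taken from any particular OGC API implementation.

```python
import json


def dataset_jsonld(name, description, collection_url, keywords):
    """Build a minimal schema.org Dataset description as a JSON-LD string.

    collection_url is assumed to point at an OGC API - Features collection,
    whose features are served under the standard ".../items" path.
    """
    doc = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": collection_url,
        "keywords": list(keywords),
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": "application/geo+json",
                "contentUrl": collection_url + "/items",
            }
        ],
    }
    return json.dumps(doc, indent=2)


# Hypothetical example values, for illustration only.
markup = dataset_jsonld(
    name="Vineyards (example collection)",
    description="Illustrative description of a vineyard feature collection.",
    collection_url="https://example.org/collections/vineyards",
    keywords=["vineyards", "agriculture"],
)
print(markup)
```

The resulting string would typically be placed in a `<script type="application/ld+json">` element on the collection's HTML page, so that the same page serves both human readers and crawlers.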

PeterParslow commented 2 years ago

"fitness for purpose pre-supposes that one knows the universe of applicability for a specific set of data" - very true: I always think "fitness for what purpose"?

That's actually why some work we did a year ago (for Geospatial Commission) has resulted in a request to add two elements to the UK discovery metadata profile: one for "original purpose" and the other for "other purposes that people have used this data for". The second one may be harder for publishers to populate, and our (UK) portals don't provide a lot of support for gathering that info - even where people find the data through the portal.

Examples are probably well known, e.g. bathymetry for 'big ship navigation' is not the best for 'submarine navigation' or sea bed assessments for mining potential.
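As a rough illustration of the two proposed elements, here is a minimal sketch of a record that keeps the original purpose separate from purposes reported later by users. The class and field names (`DiscoveryRecord`, `original_purpose`, `other_reported_purposes`, `known_limitations`) are hypothetical and are not the element names of the UK discovery metadata profile.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DiscoveryRecord:
    """Hypothetical discovery-metadata record; names are illustrative only."""

    title: str
    # Why the publisher captured the data in the first place.
    original_purpose: str
    # Reuse reported by consumers; may be hard for publishers to populate.
    other_reported_purposes: List[str] = field(default_factory=list)
    # Stated limitations, so consumers can judge fitness for their own purpose.
    known_limitations: List[str] = field(default_factory=list)

    def purposes(self) -> List[str]:
        """Return the original purpose followed by any reported reuse."""
        return [self.original_purpose, *self.other_reported_purposes]


record = DiscoveryRecord(
    title="Example bathymetry survey",
    original_purpose="big ship navigation",
    known_limitations=[
        "not the best for submarine navigation",
        "not the best for sea bed assessments for mining potential",
    ],
)
# A placeholder entry standing in for feedback gathered from a data user.
record.other_reported_purposes.append("hypothetical reuse reported by a user")
print(record.purposes())
```

Keeping limitations as their own field lets the consumer decide how fit the data is for a purpose the publisher never anticipated.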

So far, we (UK) are still trying to encourage publishers to at least be open about the quantitative quality of their datasets.

For SDWBP, it may make most sense to still discuss quality independently from FAIR.

wilkesg commented 2 years ago

> > Hi All. Natural Resources Canada within the GeoConnections Program has been advocating to further develop the F (Findability) and A (Accessibility) aspects of FAIR with the use of the +, i.e., FAIR+. For findability, we are advancing work towards automating the discovery of spatial web services via crawlers. We crawl the .ca domain and list the results of available spatial services here: https://www.nrcan.gc.ca/science-and-data/science-and-research/geomatics/canadas-spatial-data-infrastructure/geospatial-web-services/19359. The same is being done for the Arctic Spatial Data Infrastructure in their metadata catalog, available here: https://arctic-sdi.org/arctic-sdi-metadata-catalogue/. We are also working to improve accessibility through web mapping and Maps for HTML (https://www.w3.org/community/maps4html/). That work is focused on improving accessibility for those with disabilities via screen map readers and map layer readers. Interoperability is a pillar of our work as well, and is well established through our use, promotion, and support for the development of standards.
>
> @wilkesg that's really interesting. Out of interest, is your crawler open source? The OGCAPI standards were meant to address some of the findability issues: if you search "Vineyards Germany" in Google Dataset Search, it picks up the OGCAPI-Features service with relevant data as the top result - https://datasetsearch.research.google.com/search?query=vineyards%20germany

@KoalaGeo Hi, we use Spatineo's services, which I believe are based on open source code. We can reach out to them together for more info if you are interested in getting into the coding details.

PeterParslow commented 2 years ago

@prushforth @wilkesg Are there any publicly accessible statements about FAIR+ that we could reference?

prushforth commented 2 years ago

I believe we only published our article to the NRCan intranet. That said, I'll verify if we can contribute something here.

PeterParslow commented 2 years ago

@ogcscotts Scott, could you look at the text I've suggested around line 4800 of https://github.com/w3c/sdw/pull/1333/files, and see if it addresses your comment? If so, we (SDW WG) could close this issue & handle any further discussion against that #1333

KoalaGeo commented 2 years ago

@PeterParslow would something like this be appropriate - https://ec.europa.eu/info/sites/default/files/turning_fair_into_reality_0.pdf

KoalaGeo commented 2 years ago

There might be something else in all the INSPIRE guidance.

rob-metalinkage commented 2 years ago

It's hard to imagine Reusability without semantically detailed description that includes quality. This includes description of structure, semantics of objects and attributes (declared and observed) and the underlying semantics inherent in the relationship of observation or fiat processes to the capture of data - by which any quality measures derive meaning and fitness for use can be determined.

Putting accountability for fitness-for-purpose assessment (without knowing the end use) on the data publisher, rather than on providing adequate description, can only result in less data, less well described and less value realised from the data.

A parallel in disability access to facilities is that disability advocates are increasingly demanding information on actual dimensions rather than a judgment applied without regard to the varying needs and capabilities of a wide spectrum of users. FAIR should not assume knowledge of end use.

PeterParslow commented 2 years ago

@KoalaGeo : that looks like a useful general document to cite; could you propose it (as a pull request) at https://github.com/w3c/sdw/pull/1333?

@rob-metalinkage : I'm not sure I see the connection between giving a good description of structure, semantics, etc, and "quality" (except I guess in a very general sense - the data isn't much use without the semantics being explained).

"quality" here is surely a mix of quantitative (stuff like 'how much of the data do we believe to be correct / accurate / ....' & why do we believe that) and sometimes just qualitative. It's the latter where 'purpose' becomes relevant: 'good enough for what'. If the publisher doesn't know what their purpose is in having/publishing the data, then it would be pretty hard for them to say anything about it.

rob-metalinkage commented 2 years ago

@PeterParslow my own experience of data quality issues around reuse has always been that the statistical significance and methodology of the sampling (initial or in post-processing) tends to overshadow all other concerns in terms of how the data can be used safely (or used to lie or obscure deliberately). For example, siting air pollution monitors on buildings away from where people actually have to breathe. I class these as semantic aspects of data, but not related to structure at all.

KoalaGeo commented 2 years ago

@PeterParslow PR to your patch https://github.com/w3c/sdw/pull/1336

ogcscotts commented 2 years ago

@PeterParslow The revised text in PR #1333 looks good and bridges the FAIR concept with the quality/suitability extensions.

PeterParslow commented 1 year ago

"I’m not so sure of the match(es) here"

This phrase has made it through to the Editor's Draft. I put it there but forgot to specifically ask for a review of the last few sentences of how I matched "R" to Data on the Web Best Practices.

@lvdbrink @situx