rd-alliance / FAIR-data-maturity-model-WG

https://www.rd-alliance.org/group/fair-data-maturity-model-wg/case-statement/fair-data-maturity-model-wg-case-statement

Indicators for FAIRness | Scoring #34

Open bahimc opened 5 years ago

bahimc commented 5 years ago

As presented to you during our last workshop, the editorial team has explored the concept of assessing the implementation level of the FAIR data principles.

This concept relies on the core criteria - indicators and their maturity levels - that we have been developing since June. Whether or not (meta)data satisfies the core criteria makes it possible to evaluate a digital object in order to answer the question "How can the FAIRness of this data be improved?"

[Image: proposed scoring approach]

As mentioned many times, the FAIR principles are aspirational; FAIR is a journey. It is difficult to measure, particularly over time, how FAIR a digital resource is. Rather, the result of such an evaluation should be areas for improvement. It is important to stress that, as put forth by the charter, it needs to be possible to compare the results of different evaluation approaches (questionnaires or automated tools).

The aim of this evaluation is NOT to pass judgement but to objectively score a resource in order to identify improvements. We will nonetheless avoid discussions about the scoring visualisation, as this is the responsibility of the owners of the methodologies.

With the aim of developing the best evaluation/scoring mechanism, we encourage you to share any feedback below.

rwwh commented 5 years ago

This needs to be "future proof". It is important to consider how any score should be interpreted over time. The currently identified indicators are surely not a definitive list: new ones may appear in the future. Priorities may change as FAIR-supporting technology progresses. And even if the indicators were fixed, a new community standard developed in the future could change the score for an existing data resource.

I think there are two options, both with their disadvantages:

rwwh commented 5 years ago

The disadvantage of any "step/star/level" system like this is that once a level has just been obtained, there is very little incentive to make any further effort unless there is a realistic prospect of reaching the next level.

Another potential weakness is that compliance with "recommended" indicators does not count as long as there is even one mandatory indicator missing. This conflicts a bit with my usual story that people have to evaluate, for each FAIR principle, whether the effort is worth the benefits. I would find it a real pity if a relatively simple FAIRification effort is not undertaken because "it won't give me a better score anyway".

A weird alternative proposal: give the score as a 3-number tuple, indicating the percentage of mandatory, recommended, and optional indicators met, maximized to 99.99.99.
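
For illustration, a minimal sketch of how such a tuple might be computed; the indicator results, the boolean representation and the rounding below are hypothetical, not part of any agreed method:

```python
# Hypothetical indicator results grouped by priority (True = indicator met).
results = {
    "mandatory":   [True, True, False, True],
    "recommended": [True, False, False],
    "optional":    [False, True],
}

def percentage_met(outcomes):
    """Percentage of indicators met, capped at 99 so no category ever reads as 'finished'."""
    if not outcomes:
        return 0
    return min(99, round(100 * sum(outcomes) / len(outcomes)))

score = tuple(percentage_met(results[p]) for p in ("mandatory", "recommended", "optional"))
print(".".join(str(s) for s in score))  # e.g. "75.33.50"
```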

rwwh commented 5 years ago

The proposed system does not separately identify F, A, I, and R compliance.

makxdekkers commented 5 years ago

@rwwh

It is important to consider how any score should be interpreted over time. The currently identified indicators are surely not a definitive list: new ones may appear in the future. Priorities may change as FAIR-supporting technology progresses. And even if the indicators were fixed, a new community standard developed in the future could change the score for an existing data resource.

I think it is indeed important to consider that indicators may change over time. A 'score' needs to be related to the set of indicators at the time of scoring, also, as you note, because there could be changes in the environment (standards, technologies) that need to be taken into account.

makxdekkers commented 5 years ago

@rwwh

The disadvantage of any "step/star/level" system like this is that once a level has just been obtained, there is very little incentive to make any further effort unless there is a realistic prospect of reaching the next level.

It seems to me that this is not a characteristic of the scoring, but more a policy issue. If people do not want to take extra steps, they might not be interested in FAIR in any case. I would think that the scoring helps people to understand where they can improve -- whether or not they want to improve is another matter. It would be important for people to understand that FAIRness is not a goal, so it's not about getting a higher score, but a means to an end, namely enabling reuse.

makxdekkers commented 5 years ago

@rwwh

A weird alternative proposal: give the score as a 3-number tuple, indicating the percentage of mandatory, recommended, and optional indicators met, maximized to 99.99.99.

Yes, that could be a good alternative.

makxdekkers commented 5 years ago

The proposed system does not separately identify F, A, I, and R compliance.

Are you suggesting that we create scores like

F: 50.75.20 A: 100.33.75 I: 75.20.33 R: 80.100.25

Does that take away some of your concerns about the proposed scoring above being too crude?

rwwh commented 5 years ago

It crossed my mind, but 12 scores may be a bit much. It is also no longer a visual summary, but more like a table that takes deliberate attention to read.

makxdekkers commented 5 years ago

@rwwh The 12 scores were based on your earlier proposal for three values and then separately for the four areas F, A, I and R. How would you suggest doing it differently?

rwwh commented 5 years ago

It is a hard problem.... One way to do it is to "fold" in the two directions: give a single triple, and try to give an F,A,I,R profile separately. That still makes 7 levels.

makxdekkers commented 5 years ago

@rwwh Are you suggesting to have

  1. an overall FAIRness score in three numbers, e.g. 100, 40, 75 (mandatory, recommended, optional) plus
  2. an average score for the areas, e.g. if for F you have 100, 60, 40, the score for F would be either (a) 67, the average of the scores, or (b) level 3 as in the table above?

It has the advantage that there are fewer numbers, but it also makes it less clear where improvements could be made; a rough sketch of the combination is given below.
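
For illustration, a minimal sketch of this combined approach (an overall triple plus a per-area summary), assuming each indicator is tagged with its FAIR area and a priority; the indicator outcomes and the averaging used for option 2(a) are purely illustrative, and option 2(b) would additionally need the level table from the proposal:

```python
PRIORITIES = ("mandatory", "recommended", "optional")

# Hypothetical indicator outcomes: (FAIR area, priority, met).
indicators = [
    ("F", "mandatory", True), ("F", "recommended", True), ("F", "optional", False),
    ("A", "mandatory", True), ("A", "recommended", False),
    ("I", "mandatory", False), ("I", "optional", True),
    ("R", "mandatory", True), ("R", "recommended", True),
]

def triple(subset):
    """Percentage of indicators met per priority (0 where a priority has no indicators)."""
    scores = []
    for prio in PRIORITIES:
        hits = [met for _, p, met in subset if p == prio]
        scores.append(round(100 * sum(hits) / len(hits)) if hits else 0)
    return tuple(scores)

overall = triple(indicators)                                    # approach 1: one overall triple
per_area = {a: triple([i for i in indicators if i[0] == a]) for a in "FAIR"}

print("overall:", overall)
for area, t in per_area.items():
    print(area, t, "average:", round(sum(t) / len(t)))          # option 2(a): per-area average
```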

keithjeffery commented 5 years ago

@makxdekkers @rwwh Apologies for not commenting earlier. I took time to think about the priorities because it is easy for such indicator values to have unintended consequences. Generally I agree with Rob and like the 3-percentages formula. While separate scores for F, A, I, R are an advantage, the complexity is increased. I think we agreed earlier that there is a sort of progression: R is not possible without I, I not possible without A, etc. In this case, could we find a composite final indicator where the 'contribution' of F, A, I to R is somehow factored in? For example, mandatory R would involve mandatory F, A, I. Recommended R would involve mandatory F, mandatory A, recommended I, and so on. This would (a) simplify things in presentation (but perhaps not in evaluation/scoring); (b) provide encouragement to progress. Just my 2 cents' worth. Keith

makxdekkers commented 5 years ago

@keithjeffery Can you explain what you mean by "Mandatory R"? We're trying to put mandatory, recommended and optional on the indicators, not on the FAIR areas.

rwwh commented 5 years ago

@keithjeffery Although I certainly agree that there is a progression, the principles and thereby the indicators are quite orthogonal, and even if F indicators have not been met, an additional R indicator can be important. @makxdekkers yes, 2(b) would have my preference over (a).

keithjeffery commented 5 years ago

@makxdekkers apologies for my lazy 'shorthand'. I meant to indicate that sufficient indicators ('sufficient' to be defined) of the kind mandatory or recommended were considered achieved within each of the FAIR groups of principles. @rwwh Can you give an example of an acceptable level of rich metadata for an R indicator that is less (i.e. fewer attributes, less formal syntax, less defined semantics) than that required for an F indicator? From my (admittedly insufficient) experience, metadata acceptable for F is a subset of that for R (or A, I).

rwwh commented 5 years ago

@keithjeffery I consider the metadata for Findability to be (mostly?) disjoint from the metadata for Reusability. Findability requires a good classification of what exactly IS in the data. This is no longer necessary once someone has decided to reuse the data; at that point they only need the R-metadata (i.e. how it was obtained, what the license conditions are, ...). See also: http://www.hooft.net/en/people/rob/events/193-tell-me-what-it-is-not-how-you-use-it

keithjeffery commented 5 years ago

@rwwh I see your point, but my experience in environmental science is different. For example, common metadata attributes used for F are spatial and temporal coordinates (e.g. restriction of area and date range for earthquakes or volcanic events). These are required again in R (for map overlays, for example) together with rights/permissions and much more (which presumably are also needed first in A). As users become more sophisticated in their use of Discovery (F), they use quite detailed metadata to 'cut down' to the real digital assets they need (contextualisation, A), anticipating what they will be doing (e.g. producing maps, executing simulations with complex parameters) under I (to massage the digital assets into a form where they can be used together) and R (where they are used together). It would be good to consider this in several different domains to see if there is a common pattern (progression) of required 'richness' of metadata through F, A, I, R. @makxdekkers apologies that this is slightly off topic, but I believe it is interesting!

makxdekkers commented 5 years ago

@keithjeffery Indeed, interesting and not altogether off-topic. Please note that in the indicators for F2 and R1, we do make it explicit that the metadata for F2 is about discovery and the metadata for R1 is about reuse. Now, what that means in practice is open for discussion, and I hope that the joint metadata meeting in Helsinki can shed some light on this.

sjskhalsa commented 5 years ago

@keithjeffery - from the perspective of a researcher there is a large overlap of metadata supporting F and R. I may start a search based on topic/measurable, space and time, but would then refine based on metadata informing fitness for use (e.g. percent cloud cover if I was looking for imagery) and, of course, accessibility (what is the cost?). I suspect many use cases would need to be analyzed to determine whether a common pattern of progression through levels of metadata richness could be discerned.

rwwh commented 5 years ago

@keithjeffery Is this difference between geo and life sciences caused by the fact that volcanology data can only ever be used for volcanology? As an "outsider" I would say seismology data would be useful in more than one subfield of geosciences, and then I could imagine that something like the frequency filter characteristics could be a piece of "findability" metadata.

With another risk of going off-topic: reusability keeps surprising me. I have been convinced that a researcher is the worst person in the world to judge the reusability of their own data, because they are biased towards their own view. To try and convey some of my surprise: I heard once about researchers re-using interviews that were recorded for the study of dying languages for the study of room acoustics .... this would be helped by carefully crafted Findability metadata absolutely irrelevant for the reuse in the original science field ..... and is certainly not helped by a careful transcription of the conversation.

I am going to sleep another night on this. Maybe you are right that Findability metadata is always a subset of Reusability metadata. I would just place that subset exclusively under F, thereby creating a clear orthogonality between the letters.

keithjeffery commented 5 years ago

@sjskhalsa I think we are agreeing! However, I suspect we also agree that the analysis of multiple use cases would be costly.

@rwwh Actually volcanology data is highly reusable: inorganic chemists, gas chemists and physicists, atmospheric physicists, meteorologists, through to geothermal energy industry, civilian authorities engaged with anthropogenic hazard and, of course, air traffic control (remember Iceland a few years back and, in the last few days, Etna).

The key thing about your re-use example is that the richer the metadata in F, A, I, the easier it is for the re-using researcher to assess whether re-use is possible. I have a similar example from my PhD days (sixties); I developed software fitting a sinusoidal wave to geoscience data, but the software was used much more by some guys working in optics. The metadata associated with my software was apparently sufficient for them to conclude it could be re-used for their purpose.

I continue to believe that for activity under R it is necessary to have available (or have used in a previous step in the workflow) rich metadata covering not only F but also A (e.g. rights, licences) and I (convertors available, appropriate formats).

makxdekkers commented 5 years ago

@rwwh

You proposed:

One way to do it is to "fold" in the two directions: give a single triple, and try to give an F,A,I,R profile separately.

Based on that I suggested:

  1. an overall FAIRness score in three numbers, e.g. 100, 40, 75 (mandatory, recommended, optional)

I think you and @keithjeffery agreed with that approach.

Wouldn't this create the risk that someone looks at the overall score and decides, if the score for mandatory is not 100, "oh, this resource is not FAIR"? In earlier discussions, people thought it would not be a good idea to have an overall score because it is too crude in comparison to scores per principle or per area.

rwwh commented 5 years ago

@makxdekkers I meant to suggest a combination of your approach (1) and your approach 2(b) for a total of 7 "numbers". We will not be able to completely eradicate people saying "this resource is FAIR" or "that resource is not FAIR", but I've said often enough that I don't want anyone to use a binary assignment. ;-)

makxdekkers commented 5 years ago

@rwwh Yes, I understood that you proposed the combination of 1 and 2b. I was just noting that the triple under 1 could be misinterpreted as "the" FAIR evaluation result if people were searching for an easy conclusion, and then would not look at the per-area assessments. Maybe that can be avoided if evaluators never quote the triple without the per-area results.

markwilkinson commented 4 years ago

In my opinion, an overall FAIRness score is not a useful measure, for many of the reasons mentioned above, but in addition...

In the paper on the FAIR Evaluator, we suggested approaching FAIRness testing from the perspective of contracts between a data provider and a data consumer - promises that both human and machine users can rely on as they attempt increasingly complex data transactions with a provider. From this perspective, a "FAIR score" is a totally meaningless artefact, since it tells me nothing about the behaviors of that provider that I can code my agent to expect.

makxdekkers commented 4 years ago

@markwilkinson Yes, in the perspective you depict, I agree that a score doesn't tell you much. However, there are potential users of the indicators, for example funding agencies, who may want to verify that the data produced by projects they fund complies with a certain level of FAIRness. Such a "FAIRness score" should be accompanied by the more detailed observations on the indicators, so an evaluator can see which indicators have been satisfied and which have not.

rduerr commented 4 years ago

@makxdekkers Uff...! The example you just gave leads me to think about the current practices of publication metrics which are indeed used in just such ways, much to science's detriment in my opinion. Do you really want to create such an environment?

markwilkinson commented 4 years ago

Indeed.... in fact, I think we should be DISCOURAGING the idea of "FAIR Scores"! (as we do in the Evaluator manuscript)

rwwh commented 4 years ago

The best way to use maturity indicators for me is as a checklist in order to become more mature.... to perform the cost/benefit analysis I mentioned in the call yesterday: do what makes sense, stop before diminishing returns hit. The result is strongly dependent on the kind of data and the environment.

rduerr commented 4 years ago

@rwwh I tend to agree with that; but suspect that the inventors of the publication metrics felt exactly the same way. You would have to come up with mechanisms to prevent similar occurrences (and I am not convinced there are any).

makxdekkers commented 4 years ago

@markwilkinson @rwwh @rduerr We'll abandon the idea of an overall score. There is consensus in the working group that it is more important to show how well (meta)data meets the requirements in the principles in order to help people improve FAIRness.

ghost commented 4 years ago

I hope you will forgive me for reviving this thread. It just occurred to me to ask whether it would be helpful to visualize "improvement opportunities".

I.e. "a lot of indicator metrics not fulfilled yet" -> "big improvement opportunity" -> "big arrow" (just for example an "arrow", could be anything)?

What do you think? Best, Robert

bahimc commented 4 years ago

@robertgiessmann thanks for your comment. Regarding the improvement opportunities, radar charts have been proposed to visualise the efforts needed to increase FAIRness. The evaluation method prototype can be accessed here.
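
For readers who want to experiment, a minimal sketch of such a radar chart using matplotlib; the per-area percentages below are made up for illustration and are not taken from the prototype:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical percentage of indicators met per FAIR area (illustrative only).
scores = {"F": 80, "A": 55, "I": 40, "R": 65}

labels = list(scores)
values = list(scores.values())
angles = np.linspace(0, 2 * np.pi, len(labels), endpoint=False).tolist()
values += values[:1]   # repeat the first point to close the polygon
angles += angles[:1]

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
ax.plot(angles, values, linewidth=2)
ax.fill(angles, values, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(labels)
ax.set_ylim(0, 100)
ax.set_title("Where FAIRness could be improved (illustrative)")
plt.show()
```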

ghost commented 4 years ago

Hello there! (Sorry, @RDA-FAIR, I am not sure whom I'm speaking to behind this account.) I think that phrasing like "efforts needed to increase FAIRness" might instil that "guilt feeling" in people; I hoped that "improvement opportunity" (similar to "low-hanging fruit" or "quick wins") might sound more attractive. FAIRification should be considered something "nice", something "you gain something from", no? Of course, all these words are just Newspeak, but we might give it a try? Best, Robert

cbahim commented 4 years ago

@robertgiessmann, this is Christophe Bahim, a member of the editorial team supporting the FAIR data maturity model. Indeed, as spelled out in the proposed recommendation, the evaluation method was designed not as a value judgement but rather as guidance, with all communities remaining involved. The wording used reflects that. If you can think of concrete improvements, please feel free to share them. Otherwise, improving the evaluation method is on the agenda for the FAIR data maturity model maintenance group. Don't hesitate to bring this up during one of our next webinars.