Knowledge representation

makxdekkers commented 5 years ago

What should be expected from knowledge representation systems in terms of syntax and semantics? How can knowledge representation systems (code lists, controlled vocabularies, ontologies) help or hinder FAIRness?

keithjeffery commented 5 years ago

Formal declared semantics are a great assistance to FAIRness since their use improves relevance and recall (to use old-fashioned information retrieval concepts). Essential for each of F, A, I, R. Moreover, while simple vocabularies can be adequate for some purposes, formal ontological structures (not necessarily in an ontology of the W3C/RDF kind) can greatly improve F (use of related terms, including multilinguality), I (with crosswalks between terminology structures) and R.

SusannaSansone commented 5 years ago

This too links nicely with the content of the RDA FAIRsharing WG registry, which is now one of the formally approved RDA outputs.

As detailed at #29, domain/discipline-specific community standards already define their own terminologies (from CVs to ontologies that provide definitions and unambiguous identification for concepts and objects; see here), especially to formalize knowledge in datasets.

makxdekkers commented 5 years ago

@SusannaSansone As far as I can see, there is no explicit mention in any of the FAIR principles of testing for the use of terminologies (CVs/ontologies) that are commonly used in a community. This seems to be implicit in both I1 ("... shared, and broadly applicable language ...") and R1.3 ("... domain-relevant community standards ..."). Should we consider adding an indicator for this requirement to use terminologies that are common for a community?

SusannaSansone commented 5 years ago

@makxdekkers as I detail at #29, many communities consider common terminologies part of the community standards.

makxdekkers commented 5 years ago

Thank you @SusannaSansone. It seems to me, then, there is no need for a separate indicator for this.

SusannaSansone commented 5 years ago

@makxdekkers indeed, but it needs clarification that by community standards we mean terminologies, models, formats etc. Again, we may need a glossary because (as commented in other parts) we use different labels and definitions.

makxdekkers commented 5 years ago

@SusannaSansone Good point. Would you be able to propose a list of terms for which we need to agree definitions?

rwwh commented 5 years ago

Lacking any formal training in computer science, I have always tried to explain formal language for knowledge representation at a slightly less formal level as:

any format used for representing data that does not leave any ambiguity as to the meaning of the data.

This could e.g. be full-fledged RDF, but it may also be a standardized domain-specific data format that has all (meta)data fields very well defined.

This again may depend on the context: when health data and climate data are combined in an interdisciplinary study, the field "temperature", which may be unambiguous within either discipline, may suddenly need more explanation (body temperature vs ambient temperature).
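
A minimal sketch of that situation, assuming each source declares what its local field names mean. The `example.org` identifiers, the `FIELD_DEFINITIONS` table and the `annotate` helper are hypothetical and only illustrate the idea; a real project would point at terms from a community ontology.

```python
# Hypothetical concept identifiers; real projects would use IRIs from a
# community ontology instead of these example.org placeholders.
BODY_TEMP = "https://example.org/vocab/bodyTemperature"
AMBIENT_TEMP = "https://example.org/vocab/ambientTemperature"

# Each source declares what its local field name means.
FIELD_DEFINITIONS = {
    "health": {"temperature": BODY_TEMP},
    "climate": {"temperature": AMBIENT_TEMP},
}


def annotate(source, record):
    """Replace local field names with the concept identifiers declared for the source."""
    mapping = FIELD_DEFINITIONS[source]
    return {mapping[field]: value for field, value in record.items()}


health_row = annotate("health", {"temperature": 38.2})
climate_row = annotate("climate", {"temperature": 21.5})

# After annotation the two values can sit in one record without colliding.
combined = {**health_row, **climate_row}
```

Without the annotation step, merging the two rows would silently overwrite one "temperature" with the other.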

SusannaSansone commented 5 years ago

> @SusannaSansone Good point. Would you be able to propose a list of terms for which we need to agree definitions?

@makxdekkers unfortunately there is no widely agreed glossary. I can only report on the one used by FAIRsharing, which classifies community standards as:

minimal reporting requirements (checklists or templates that outline the necessary and sufficient information vital for contextualizing and understanding a digital object); examples here.
terminologies (from CVs to ontologies that provide definitions and unambiguous identification for concepts and objects); examples here.
models/formats (define the structure and relationship of information for a conceptual model or schema, and include transmission formats to facilitate the exchange of data between different systems); examples here

Minimal reporting requirements are usually textual documents or lists. Terminologies and models/formats are machine readable and expressed in one or more metaformats (XML, RDF, TAB etc.).

makxdekkers commented 4 years ago

@rwwh @SusannaSansone

I note that both of you are co-authors of the recent article Annika Jacobsen et al., FAIR Principles: Interpretations and Implementation Considerations.

In the Guidelines document, I added this comment.

_I'd like to note that in the latest article https://doi.org/10.1162/dint_r_00024 a clarification is given that basically makes 'knowledge representation' just about the language that is used, and it gives RDF as example. It says nothing about the 'payload' of RDF, i.e. the classes and properties that are used within RDF. Also, the idea of 'reporting guidelines' seems to be more related to 'minimal information models' to which the article refers under principles F2 and R1.3. My worry is that if we define knowledge representation in the indicators differently than the FAIR authors, we're redefining the principles, which is not in our charter._

As you are members of the group of FAIR authors, I would very much appreciate your views.

rwwh commented 4 years ago

In the call yesterday @markwilkinson identified @micheldumontier as the best person to answer this.

My take on "formal language for knowledge representation" has been to tell people that this is meant to avoid all possible ambiguity. So, as is said of patents, it is good if a format leaves no room for misinterpretation by "someone skilled in the art". It should be noted that "skilled in the art" becomes harder to define as interoperability becomes more interdisciplinary.

Mark referred to their discussions about requiring the knowledge representation to have at least a https://en.wikipedia.org/wiki/Backus–Naur_form , but that not being sufficient. I can't comment on that since I don't have formal education in computer science.

markwilkinson commented 4 years ago

Right, so BNF ensures that a machine can unambiguously parse a message - it's a mechanism for precisely defining a syntax. It does not, however, speak to meaning. For that, we have ontologies.

So... IMO, the "formal language for knowledge representation" must be a formal syntax, combined with a shared semantic. RDF+Ontologies is one widely-used option, but there are others.
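
To make the syntax/semantics split concrete, here is a small sketch under stated assumptions: the three-term statement grammar, the `ex:` terms and the `VOCABULARY` table are all invented for illustration and are not RDF or any existing standard. The point is only that a message can pass the grammar and still carry no shared meaning.

```python
import re

# Grammar (informal BNF), an assumption made for this sketch:
#   statement ::= term " " term " " term " ."
#   term      ::= letter { letter | digit | "_" | ":" }
STATEMENT = re.compile(r"^([A-Za-z][\w:]*) ([A-Za-z][\w:]*) ([A-Za-z][\w:]*) \.$")

# Hypothetical shared vocabulary: the only terms whose meaning is agreed.
VOCABULARY = {
    "ex:patient1": "a study participant",
    "ex:hasBodyTemperature": "relates a person to a body temperature reading",
    "ex:reading42": "a temperature measurement",
}


def parse(statement):
    """Syntax check: return the three terms, or raise ValueError."""
    match = STATEMENT.match(statement)
    if match is None:
        raise ValueError(f"not a well-formed statement: {statement!r}")
    return match.groups()


def undefined_terms(triple):
    """Semantics check: list the terms that the shared vocabulary does not define."""
    return [term for term in triple if term not in VOCABULARY]


ok = parse("ex:patient1 ex:hasBodyTemperature ex:reading42 .")
opaque = parse("foo bar baz .")  # parses fine, yet means nothing to a consumer
```

Here `undefined_terms(ok)` is empty, while `undefined_terms(opaque)` lists all three terms: both statements satisfy the syntax, but only one is backed by shared definitions.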

micheldumontier commented 4 years ago

agree with mark: a formal knowledge representation language articulates a machine-readable syntax and mathematically based semantics. therefore, the information contained within can be automatically parsed by a machine, and the content itself is amenable to automated reasoning in which new implications can be derived. BNF is just one way to express the syntax of the language, but there are others.
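
As a rough illustration of "new implications can be derived", the sketch below applies two RDFS-style rules (subclass transitivity and type inheritance) to a handful of triples until nothing new appears. The facts and the `infer` function are hypothetical, and this is a toy forward-chaining loop rather than a real reasoner.

```python
# Hypothetical facts as (subject, predicate, object) triples.
FACTS = {
    ("BodyTemperature", "subClassOf", "Temperature"),
    ("Temperature", "subClassOf", "PhysicalQuantity"),
    ("reading42", "type", "BodyTemperature"),
}


def infer(facts):
    """Apply two RDFS-style rules until no new triples appear (a fixed point)."""
    closure = set(facts)
    while True:
        new = set()
        for (a, p1, b) in closure:
            for (c, p2, d) in closure:
                # Both rules need a second triple of the form (b, subClassOf, d).
                if b != c or p2 != "subClassOf":
                    continue
                if p1 == "subClassOf":
                    # a subClassOf b, b subClassOf d  =>  a subClassOf d
                    new.add((a, "subClassOf", d))
                elif p1 == "type":
                    # a type b, b subClassOf d  =>  a type d
                    new.add((a, "type", d))
        if new <= closure:
            return closure
        closure |= new


derived = infer(FACTS) - FACTS
```

None of the triples in `derived` were stated explicitly; they follow from the declared semantics of `subClassOf` and `type`, which is what a purely syntactic format cannot offer.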

keithjeffery commented 4 years ago

All – I have been observing with interest. Many of you will have heard me say many times at RDA “formal syntax and declared semantics”. I am happy with BNF; for me the key thing is that the syntax should be in a notation suitable for logic processing (so one can reason about the semantics carried over the syntax). Best wishes, Keith

rwwh commented 4 years ago

For me as CS Noob: how about a properly structured CSV? HDF? or specifics like a TIFF file or even BAM? Do those files satisfy this rule?

keithjeffery commented 4 years ago

@rwwh: unfortunately CSV (or any other 'file' format) does not usually conform to BNF (of course you could put a BNF statement in a cell of a spreadsheet). The key point is that the syntax should be parsable by a computer. BNF is 'behind' all modern programming languages and relates directly to boolean logic (hence the ability to induce and deduce (probably do not need to abduce)). In the FAIR context the important thing is that the knowledge representation has formal syntax upon which semantics can be 'loaded'. Thus