ontodev / robot

ROBOT is an OBO Tool
http://robot.obolibrary.org
BSD 3-Clause "New" or "Revised" License

Help with ROBOT Template format #1034

Open JosephAWhite opened 2 years ago

JosephAWhite commented 2 years ago

Hello,

Could somebody please explain the following to me? The format for these template files is clear as mud.

I merely want to create an owl file from the following template:

CURIE        Label                  object type  Annotation
ID           LABEL                  SC %         A rdfs:label
obo:0001462  testclass
obo:0001463  ATRIAL FIBRILLATION    testclass    A FIB
obo:0001464  ATRIAL FLUTTER         testclass    A FLUTTER
obo:0001465  ALTERED MENTAL STATUS  testclass    A M S
obo:0001466  ARTERIOVENOUS          testclass    A V

There are 4 columns: ID, LABEL, SC %, and A rdfs:label. The intent is to create an owl file with a class 'testclass' and hundreds of subclasses.

This fails:

java -jar C:\Users\N114120\Documents\owl-parsers\robot\robot.jar template --template import_file.tsv --output out.owl

MANCHESTER PARSE ERROR the expression (testclass) at row 4, column 3 in table "import_file.tsv" cannot be parsed: encountered 'testclass', but expected one of: Class name Object property name Data property name inverse not ( ... )

If I remove the rdfs:label column, the conversion works, but then I don't have the abbreviations I want.

java -jar C:\Users\N114120\Documents\owl-parsers\robot\robot.jar template --template import_file.tsv --output out.owl

Why? I need the rdfs:label column; it provides an alternate name for the class being created.
This behavior makes absolutely no sense to me. The ROBOT Template documentation does not make it clear, and does not cover all instances of tasks that may need to be done.

Can anyone help with this, please? Joe White

jamesaoverton commented 2 years ago

I believe this is the same label problem as #1033. The easiest solution should be to replace A rdfs:label with your preferred predicate for annotating abbreviations, such as A rdfs:comment or A oboInOwl:hasExactSynonym or A IAO:0000118 ('alternative term').

jamesaoverton commented 2 years ago

If you are unhappy with ROBOT templates, you may wish to consider other software such as DOS-DP or tabOTTR.

In order to interpret labels in Manchester expressions, ROBOT needs a mapping from label to ID. We follow the OBO convention, which is to use the rdfs:label predicate for the primary label, prefer exactly one primary label per term, and use other predicates for synonyms, abbreviations, etc. In my experience it is a bad idea to collapse the distinction between primary labels and synonyms, as you are doing, since I often find that (broad) synonyms overlap with some other primary label.

In ROBOT templates you can specify this primary label using LABEL or A rdfs:label -- they are equivalent, but the code expects just one of these, and it ends up using the last one it sees (implementation). Arguably this is a poor design for label column selection, but I doubt we can change it without breaking backwards compatibility. The best option I see is to warn/error when multiple primary label columns are defined.

The problem in most of the examples that you provided is that you are specifying both a LABEL column (which is usually filled) and an A rdfs:label column (which is often empty). You are expecting ROBOT to use the former column (which is a reasonable expectation), but ROBOT is using the latter. This leads to #1033, where ROBOT does not recognize the label 'Abbreviations': it thinks there is no label for obo:0001462 because the "Annotation" column for that row is blank.
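
The column-selection behavior described above can be illustrated with a small Python sketch. This is not ROBOT's actual code: the names `pick_label_column` and `label_to_id` are hypothetical, and the logic only mimics the "last label column wins" rule as described in this comment, applied to a few rows of the template from the original post.

```python
# Illustrative only: mimics the "last label column wins" rule described above.
# It is NOT ROBOT's implementation; all names here are hypothetical.

LABEL_TEMPLATES = {"LABEL", "A rdfs:label"}


def pick_label_column(template_row):
    """Return the index of the last column that defines a primary label."""
    chosen = None
    for index, template in enumerate(template_row):
        if template.strip() in LABEL_TEMPLATES:
            chosen = index  # a later match overwrites an earlier one
    return chosen


def label_to_id(template_row, data_rows):
    """Build the label -> ID map used to resolve names in Manchester expressions."""
    id_col = template_row.index("ID")
    label_col = pick_label_column(template_row)
    mapping = {}
    for row in data_rows:
        label = row[label_col].strip()
        if label:  # blank cells contribute no label
            mapping[label] = row[id_col]
    return mapping


template_row = ["ID", "LABEL", "SC %", "A rdfs:label"]
data_rows = [
    ["obo:0001462", "testclass", "", ""],
    ["obo:0001463", "ATRIAL FIBRILLATION", "testclass", "A FIB"],
    ["obo:0001464", "ATRIAL FLUTTER", "testclass", "A FLUTTER"],
]

# Two label columns: the fourth column wins, so "testclass" is never registered
# and the parent expression "testclass" cannot be resolved.
broken = label_to_id(template_row, data_rows)
print("testclass" in broken)

# Switching the fourth column to another predicate leaves LABEL as the only label column.
fixed_row = ["ID", "LABEL", "SC %", "A rdfs:comment"]
fixed = label_to_id(fixed_row, data_rows)
print(fixed.get("testclass"))
```

In this sketch the blank "Annotation" cell for obo:0001462 is what makes the lookup fail, which matches the parse error reported for row 4, column 3.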

So I believe the solution is as simple as choosing another predicate for your "Annotation" columns, and I believe that rdfs:label is not a good predicate to use for those annotations (i.e. abbreviations) in any case.

If this approach isn't suitable, you can just use CURIEs in your Manchester expressions, and not worry about the primary labels.
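
As a rough sketch of the CURIE option, the Python snippet below rewrites the parent column of the data rows so that it holds the parent's CURIE (obo:0001462) instead of its label. The helper name `use_curie_parents` is hypothetical, and the column positions are assumed from the template in the original post; the two header rows are left out because they do not change.

```python
import csv
import io


def use_curie_parents(data_rows):
    """Replace a parent label (3rd column) with the CURIE of the row carrying that LABEL (2nd column)."""
    ids_by_label = {row[1]: row[0] for row in data_rows}
    return [
        # Unknown or blank parents are passed through unchanged.
        [row[0], row[1], ids_by_label.get(row[2], row[2]), row[3]]
        for row in data_rows
    ]


data_rows = [
    ["obo:0001462", "testclass", "", ""],
    ["obo:0001463", "ATRIAL FIBRILLATION", "testclass", "A FIB"],
    ["obo:0001464", "ATRIAL FLUTTER", "testclass", "A FLUTTER"],
]

converted = use_curie_parents(data_rows)

# Emit the rows as tab-separated text, ready to paste under the template's header rows.
buffer = io.StringIO()
csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(converted)
print(buffer.getvalue())
```

With CURIEs in the SC % column, the expression no longer depends on which column ROBOT treats as the primary label.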

JosephAWhite commented 2 years ago

Hi James,

Thank you for helping with this. The ROBOT behavior is making sense now. I will change the templates accordingly and let you know the results.

Best, Joe White

JosephAWhite commented 2 years ago

Hi James,

You were right: changing rdfs:label to rdfs:comment allowed for creation of the classes, comments, and LABEL for all cases. Early on, I tried to make the abbreviations an equivalent class, but never got that to work. Maybe now it will.

Anyway, thanks for your help. Cheers, Joe White

jamesaoverton commented 2 years ago

Ok, I'm glad it worked.

What I usually see is a term such as your obo:0001463 with a label "ATRIAL FIBRILLATION" and synonym "A FIB". I think it would be a mistake to define two classes, one for "ATRIAL FIBRILLATION" and one for "A FIB" (which would require two distinct IDs), and then call them equivalent.

In any case, in my work I would use the Disease Ontology term DOID:0060224 atrial fibrillation and not define a new term.

JosephAWhite commented 2 years ago

Hi James,

How do you create a synonym using the ROBOT templates? Do I change the template string for the 4th column to 'EP %'?

E.g.

CURIE        Label                  Parent         Synonym
ID           LABEL                  SC %           EP %
obo:0001462  Abbreviations
obo:0001463  ATRIAL FIBRILLATION    Abbreviations  A FIB
obo:0001464  ATRIAL FLUTTER         Abbreviations  A FLUTTER
obo:0001465  ALTERED MENTAL STATUS  Abbreviations  A M S
obo:0001466  ARTERIOVENOUS          Abbreviations  A V

Joe

jamesaoverton commented 2 years ago

No, you do not want to use OWL Equivalent Classes or Equivalent Properties for synonyms.

OWL distinguishes between Object Properties, which define logical relations between terms and are used by automated reasoners, and Annotation Properties, which are non-logical labels and annotations of all sorts. For annotating a term with its synonyms you should use an Annotation Property.

In ROBOT templates, Annotation Properties all start with A followed by the CURIE or the label of the property to use. rdfs:label and rdfs:comment are Annotation Properties. Above I mentioned a few other Annotation Properties that are widely used for synonyms in the OBO community: oboInOwl:hasExactSynonym and IAO:0000118 'alternative term'. The OBO Metadata Ontology has more, and other vocabularies/ontologies have their own. Unfortunately, Annotation Properties for synonyms are not very well standardized across the semantic web.
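
For the template in the question, that would mean putting an annotation template string in the fourth column instead of EP %. The Python sketch below generates such a file as tab-separated text, using oboInOwl:hasExactSynonym as one of the predicates mentioned above. The function name `build_template` is hypothetical, and whether "exact" is the right synonym type for these abbreviations is an assumption you would need to check for your data.

```python
import csv
import io

HEADERS = ["CURIE", "Label", "Parent", "Synonym"]
# The fourth column is an annotation (A ...), not a logical axiom such as EP %.
TEMPLATES = ["ID", "LABEL", "SC %", "A oboInOwl:hasExactSynonym"]

TERMS = [
    ("obo:0001462", "Abbreviations", "", ""),
    ("obo:0001463", "ATRIAL FIBRILLATION", "Abbreviations", "A FIB"),
    ("obo:0001464", "ATRIAL FLUTTER", "Abbreviations", "A FLUTTER"),
    ("obo:0001465", "ALTERED MENTAL STATUS", "Abbreviations", "A M S"),
    ("obo:0001466", "ARTERIOVENOUS", "Abbreviations", "A V"),
]


def build_template(headers, templates, terms):
    """Return the template as tab-separated text: header row, template row, then data rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(headers)
    writer.writerow(templates)
    writer.writerows(terms)
    return buffer.getvalue()


template_text = build_template(HEADERS, TEMPLATES, TERMS)
print(template_text)
```

Saving the printed text as import_file.tsv and rerunning the `robot template --template import_file.tsv --output out.owl` command from the original post should then give each class one primary label plus a synonym annotation.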

You may find this documentation site useful: https://oboacademy.github.io/obook/. There is a focus on OBO, but most of the information is applicable more generally.