information-artifact-ontology / ontology-metadata

OBO Metadata Ontology
Creative Commons Zero v1.0 Universal

Unify dc:creator oio:created_by and dc:contributor, IAO:term editor #60

Open matentzn opened 3 years ago

matentzn commented 3 years ago

Is there any way we can, OBO wide, agree to

and

@cthoyt

cthoyt commented 3 years ago

I'm not familiar with the difference in semantics of dc:creator and dc:created_by. Does one refer to a resource and the other a literal? Because it would be great to refer to ORCID identifiers as resources.

Either way, 100% support using structured information as attribution. It's very disconcerting reading through such high quality resources and finding somebody's initials that take 2 hours to look up by reading old papers. This has happened to me in GO, MONDO, and others

matentzn commented 3 years ago

That's because it was a typo. Sorry about that. Fixed now: oio:created_by!

cthoyt commented 3 years ago

Okay, then rephrased: I'm not familiar with oio - but since DC is so ubiquitous, I'd vote for using that (unless the semantics of oio:created_by are more suggestive for relations between resources instead of just text)

matentzn commented 3 years ago

oio stands for oboInOwl and is basically the OBO format internal vocabulary namespace. You have been using oio:hasDbXref a lot!

cmungall commented 3 years ago

agree that dc:contributor should always refer to a valid orcid?

I think SHOULD, not MUST, is OK here, but be prepared that there will be many violations. We have many ontologies that are decades old with contributions that predate ORCID. In some cases we have retrospectively tracked down historic contributors and rewired their contributor dbxref to an orcid, but this is not always possible. Many historic contributors still lack ORCIDs. I worry that by saying SHOULD we either generate a lot of busy work for resource-poor ontologies that would be better spent elsewhere, or we weaken the meaning of SHOULD to the point where it's meaningless.
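To make "valid orcid" concrete, here is a minimal sketch of a syntactic check. It assumes contributor values are full `https://orcid.org/...` IRIs and uses the ISO 7064 MOD 11-2 check digit that ORCID iDs carry. The function names are hypothetical, and the check says nothing about whether the record actually exists or belongs to the right person.

```python
import re

# Shape of an ORCID IRI: four groups of four characters; the last may be 'X'.
ORCID_IRI = re.compile(r"https://orcid\.org/(\d{4}-\d{4}-\d{4}-\d{3}[\dX])")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid_iri(value: str) -> bool:
    """Return True if value looks like an ORCID IRI with a correct check digit."""
    match = ORCID_IRI.fullmatch(value)
    if match is None:
        return False
    digits = match.group(1).replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


if __name__ == "__main__":
    for candidate in ("https://orcid.org/0000-0001-9625-1899", "cjm"):
        print(candidate, is_valid_orcid_iri(candidate))
```

A check like this could run as a warning rather than an error, which matches the SHOULD (not MUST) level discussed here: legacy initials would be flagged without blocking a release.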

matentzn commented 3 years ago

I would say SHOULD is good, and we just agree on using orcids moving forward. I don't think it's busy work. If we could use this consolidated way of attributing to generate a dashboard that makes individuals' contributions to ontologies other than their own more visible, this would be a great incentive!

sbello commented 3 years ago

When adding terms in Protege, if you use the new entities metadata settings to automatically add creator and date information to new terms, the default setting is for creator (see image: creator_metadata). If we are not going to settle on 'creator', it would be good to ask Protege to change the default setting to whatever we settle on.

matentzn commented 3 years ago

@sbello thanks! Yes! And what would be even better is if the Protege config was a separate config file that could be reused across OBO. We are contemplating something like that at the moment!

matentzn commented 3 years ago

In reference to #76 maybe we should first gather the use cases for attributing terms.

I want to emphasise one more time how strongly I feel about OBO being a driving force in world-wide ontology standardisation efforts beyond the biomedical domain, and to do that, we need to cut back on some of our silo annotation properties in the OIO and IAO vocabularies in favour of more widely used ones, like dublin core, skos, void, and friends. Please open a new issue: "We should not re-use external vocabularies if it means even the slightest compromise" and provide your arguments to convince me otherwise. So yes, standardisation means that we may lose some subtle distinctions.

Here is how I would suggest we use the creation vocabulary. Please tell me what you think.

I am not saying to change all legacy annotations now to this: I am saying, let's find a standard we can use moving forward, or agree that standardising this is not worth the cost.

StroemPhi commented 3 years ago

Wouldn't the semantics behind IAO:0000117 be sufficiently provided if each term has its own issue (using IAO_0000233 - term tracker item) that is properly assigned to be handled by the "term editor(s)"?

matentzn commented 3 years ago

I totally agree. I would love to make this standard habit, tagging all new terms with their respective GitHub issues. It would create a layer of indirection for obtaining the "responsible editor", but I think this is much better than using non-standard properties for something like that.

sbello commented 3 years ago

@matentzn can these annotations be added when using ROBOT templates? I like the creator/contributor/source trio, ideally in combination with an ORCID, but it would be helpful if I could include this information in ROBOT templates for bulk addition. Would it be as simple as adding columns for these attributes?

matentzn commented 3 years ago

Absolutely no problem! :)

zhengj2007 commented 3 years ago

@matentzn I'd like to correct my comment. I never used 'dc:creator' when I added a new term. So, what I mean is "the person who adds the term in the OWL file may not be the IAO: 'term editor' of the term".

matentzn commented 3 years ago

I get it now @zhengj2007 thanks! But perhaps that is secondary. In this case of ambiguity, you could simply use dc:contributor which is certainly true, right?

bpeters42 commented 3 years ago

I like Nico's breakdown, and would add to it that essentially dc:creator is_a dc:contributor. And the way we have been using 'term editor' is essentially what dc:contributor is. Furthermore, it can be hard / unfair to try to distinguish who is the creator, in so far as sometimes a term gets added to an ontology with placeholder (or empty) definitions etc. by person A, and person B puts in a lot more effort providing those. So I would favor just sticking to dc:contributor by default.


bpeters42 commented 3 years ago

Emails crossed. I was essentially trying to say the same things as Nico.


matentzn commented 3 years ago

Yes, I agree with that as well.. dc:contributor should be the default, and, realistically given that ontologies are always a massively collaborative effort, I would even agree to a motion that gets rid of dc:creator altogether. Thank you @bpeters42 for your input :)

sbello commented 3 years ago

It looks like I can change the user metadata in protege to use whatever relation we decide in the creator property field. So, if the group wants to go with contributor instead of creator I'm fine with that.

wdduncan commented 3 years ago

FWIW, I've been setting the "New entities metadata" to "Use user name" (screenshot not preserved).

But, in my "User details" setting, I include my name and ORCID (screenshot not preserved).

I like having both a name and ORCID, since I don't have people's ORCIDs memorized.

cmungall commented 3 years ago

I agree with Nico's recommendations.

What this doesn't address is how this interacts with definition level axiom annotations (done using owl reification). It's very common on many ontologies to provide as provenance for a definition some mix of primary, secondary, tertiary sources, individuals, and groups of people.

How should this interact with term-level source and contributor annotations?

  1. Favor term-level annotations over axiom-level
  2. Favor axiom-level and only include term-level if non-redundant
  3. Have redundancy in the release version, non-redundancy in edit version, and a standard sparql update to propagate selectively from axiom-level to term level as part of the release process
  4. No recommendation. Every ontology does this as it pleases

I favor 3, and disfavor 1, it is important for many ontologies to have the provenance at the axiom level.
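Option 3 is described as a SPARQL update run at release time. As a rough illustration of the same logic only, here is a sketch in plain Python over in-memory tuples rather than SPARQL. The data layout, the `PROPAGATED` set, and the function name are all assumptions made for the example, not an existing OBO or ROBOT feature.

```python
# Hypothetical selection of axiom-level properties worth copying to the term.
PROPAGATED = {"dc:contributor", "dc:source"}


def propagate_to_term_level(term_annotations, axiom_annotations):
    """Build release-version term annotations from an edit version.

    term_annotations: set of (term, property, value) triples (edit version).
    axiom_annotations: list of dicts describing reified axioms, each with a
        "source" term and a list of (property, value) "annotations".
    Returns a new set; the edit version is left non-redundant and unchanged.
    """
    released = set(term_annotations)
    for axiom in axiom_annotations:
        for prop, value in axiom["annotations"]:
            if prop in PROPAGATED:
                # Using a set means an annotation already present at term
                # level is not duplicated.
                released.add((axiom["source"], prop, value))
    return released


if __name__ == "__main__":
    me = "https://orcid.org/0000-0001-9625-1899"
    edit = {("ex:foo", "dc:contributor", me)}
    axioms = [{
        "source": "ex:foo",
        "property": "IAO:0000115",  # the annotated definition axiom
        "target": "A placeholder definition.",
        "annotations": [("dc:source", "ex:paper1"), ("dc:contributor", me)],
    }]
    for triple in sorted(propagate_to_term_level(edit, axioms)):
        print(triple)
```

The point of the sketch is the direction of flow: provenance is curated once at the axiom level, and the redundant term-level copy exists only in the release artifact.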

graybeal commented 3 years ago

I think I agree, it isn't clear what IAO:0000117 (term editor) adds to the others, nor which of the others it truly represents (but I infer 'creator' from the description), and therefore it is less helpful to the average non-OBO user. (if that's a user you're trying to reach, that's a good thing I think.)

Some nuances in case they are useful.

Is making at least one dc:contributor required, but making dc:creator optional consistent with both your idea of compromise and the previous comments?

Note there is no reason people and institutions can't both be contributors/creators/etc on one term. Right?

Presumably dc:source can also be a place (location on the web), not just a person or group.

I think you've dropped a few person identification systems that have some scientific following and are LOD-friendly (FOAF, VIVO). Whereas I'm not sure why you'd include 4 through 7, given this is a future-looking recommendation.

matentzn commented 3 years ago

I agree wholeheartedly with your assessment @graybeal. The reason I added these is purely that I want to avoid pushback from GO, which has used 4-7 for 30 years and will now be resistant to retro-curating all the various cjms and others to orcids. Maybe I will volunteer to do it for them one weekend, if we can agree that orcid is the preferable identification. If someone has no orcid, I would follow the radical @cthoyt method of simply creating an entity on Wikidata and using that, and I prefer that to using FOAF or VIVO, because we know how to edit it easily. But, yes, FOAF and VIVO would still be better than 4-7.

matentzn commented 2 years ago

Related to https://github.com/information-artifact-ontology/ontology-metadata/issues/2

wdduncan commented 2 years ago

@cthoyt Does it help your script if I reverse how I do my dc:creator annotations so that the orcid comes first? E.g.:

https://orcid.org/0000-0001-9625-1899 (Bill Duncan)

matentzn commented 2 years ago

It is not only @cthoyt's script that is the problem: we want to simply aggregate contributions across all ontologies using SPARQL. The labelling approach you chose is not well defined; everyone will do it differently. If we want human-readable editor names as well, we should provide a map in the ontology header.

wdduncan commented 2 years ago

What is not well specified about the example? Are you wanting something like a regex? How about this:

{orcidid} *({first-name last-name},+)

I.e.: an orcid followed by an optional set of one or more comma-delimited names contained within parentheses.

I don't like the idea of putting a map in the header. It makes people go looking for the name associated with the orcid.
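For illustration, the value format proposed above could be parsed roughly as follows. This is a sketch under assumptions: the orcid is written as a full `https://orcid.org/` IRI (as in the earlier example), the names sit in a single pair of parentheses, and `parse_creator` is a hypothetical helper, not part of any existing tool.

```python
import re

# An ORCID IRI, optionally followed by "(name, name, ...)".
CREATOR_VALUE = re.compile(
    r"(?P<orcid>https://orcid\.org/\d{4}-\d{4}-\d{4}-\d{3}[\dX])"
    r"(?:\s*\((?P<names>[^()]*)\))?"
)


def parse_creator(value: str):
    """Split a creator literal into (orcid_iri, [names]); None if it does not fit."""
    match = CREATOR_VALUE.fullmatch(value.strip())
    if match is None:
        return None
    raw_names = match.group("names") or ""
    names = [name.strip() for name in raw_names.split(",") if name.strip()]
    return match.group("orcid"), names


if __name__ == "__main__":
    print(parse_creator("https://orcid.org/0000-0001-9625-1899 (Bill Duncan)"))
    print(parse_creator("Bill Duncan"))
```

A consumer that only needs the identifier can take the first element and ignore the names. Values that do not match the pattern return `None`, which is where hand-written variants would have to be handled.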

cthoyt commented 2 years ago

I agree with Nico; there's no useful, machine-readable attribution via dc:creator that isn't structured by directly and only using the IRI for the ORCID record.

I can't see how adding a mini-language within the OWL spec would be helpful. I'd strongly disagree with anything that isn't just using the ORCID IRI for attribution purposes.

With regard to ease of access to human-readable names for contributors, I think that's a different conversation that has to happen somewhere else at a later time, after first getting a consensus that people would generally actually use this human-readable metadata

matentzn commented 2 years ago

First name and last name is super error-prone no matter what (what about middle initials, special chars for Spanish, etc.). I am thinking more of how this would feed into knowledge graphs like Wikidata, where your name would be a label on your ID, and how all the information OBO generates gets connected to the wider world: the orcid here really is not a literal, but a node in a graph. We can always generate human-readable labels from the ORCIDs automatically if people ask for it, and connect that using a different property.

wdduncan commented 2 years ago

What is the benefit vs cost of using only orcids? In what I proposed you (or a script) can ignore whatever comes after the orcid. People (or at least me) read names, not orcids.

I'm trying to find a compromise. But, you don't seem open to such a compromise.

matentzn commented 2 years ago

Just think of it this way: an ORCID is the ID of a person. Would you recommend the ID of Limb in Uberon to be "http://purl.obolibrary.org/obo/UBERON_123 (limb)"? Again, the compromise is to have a well-defined second property that generates human-readable contributor statements.

wdduncan commented 2 years ago

Where does it say that the dc:creator can only have orcids as values?

There is also the term editor annotation. Do you also want to restrict it in the same way? Again

matentzn commented 2 years ago

You are right, there is no rule restricting the range of dc:creator. I just have a use case I want to implement, which is to accurately aggregate contributions across OBO ontologies. So I want to be able to write a SPARQL query that counts all the terms you have contributed to. For that, anything beyond the orcid will lead to inconsistencies. I would be OK with repurposing the term editor relation to do something like what you are proposing though, basically saying that term editor is the human-readable variant of dc:contributor.
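To show why anything beyond the bare identifier fragments the counts, here is a small sketch that mimics the aggregation in plain Python instead of SPARQL. The triples and the `count_contributions` helper are made up for the example; only the ORCID IRI comes from this thread.

```python
from collections import Counter


def count_contributions(triples, properties=("dc:contributor",)):
    """Count distinct terms per contributor value for the given properties."""
    counts = Counter()
    seen = set()
    for term, prop, value in triples:
        if prop in properties and (term, value) not in seen:
            seen.add((term, value))
            counts[value] += 1
    return counts


if __name__ == "__main__":
    orcid = "https://orcid.org/0000-0001-9625-1899"

    # Every ontology uses the bare IRI: one key, one total.
    uniform = [(f"ex:term{i}", "dc:contributor", orcid) for i in range(3)]
    print(count_contributions(uniform))

    # Each ontology formats the value its own way: the same person is
    # split across several keys and no single total exists.
    mixed = [
        ("ex:term0", "dc:contributor", orcid),
        ("ex:term1", "dc:contributor", orcid + " (Bill Duncan)"),
        ("ex:term2", "dc:contributor", "Bill Duncan"),
    ]
    print(count_contributions(mixed))
```

Grouping by value is all the aggregation needs when the value is an identifier. With free-form literals, the grouping key has to be reconstructed by parsing, and every deviation from the expected format becomes a separate contributor.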

wdduncan commented 2 years ago

Or you could propose a new annotation that is defined to only take orcids as values. That way there would be no special re-defining of annotations that are already used.

alanruttenberg commented 2 years ago

You can put a label on an ORCID as an annotation. I'm presuming the IRIs are being used. Yes, first and last name isn't universal. So don't mandate it. Allow the label to be something useful.


matentzn commented 2 years ago

The reason I want to use dc:contributor is that it is an international standard, and I want OBO terms to be queryable in Wikidata and similar using this property, with the terms we added connected to our Wikidata records through our orcid. We could label the orcids as @alanruttenberg suggests. In any case, this is not for me, or you, to decide: there is no point in me mandating it and no one caring and implementing it. So don't worry. Maybe the proposal does not fly, and that's that. But to achieve what I want to achieve, which is machine-unambiguous attribution analysis, there is just no alternative to using a standard property (that Wikidata understands) and an identifier as range.

alanruttenberg commented 2 years ago

I'm all for using the properties when they make sense. dc:contributor makes sense and we've used it. However, term editor is more specific than contributor and shouldn't, IMO, be lumped into the bucket. Making term editor a sub-annotation property of dc:contributor would make sense and ought to yield a similar result.
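A sketch of how a sub-property declaration could "yield a similar result" for aggregation: a query for dc:contributor also picks up values asserted with any of its sub-properties. The `SUB_PROPERTY_OF` map below encodes the suggestion made here as an assumption; it is not a statement about what IAO currently declares, and the helper names are hypothetical.

```python
# Assumed hierarchy: 'term editor' (IAO:0000117) under dc:contributor.
SUB_PROPERTY_OF = {"IAO:0000117": "dc:contributor"}


def super_properties(prop, sub_property_of):
    """Return prop followed by its ancestors, stopping if a cycle is found."""
    chain = [prop]
    while chain[-1] in sub_property_of:
        parent = sub_property_of[chain[-1]]
        if parent in chain:
            break
        chain.append(parent)
    return chain


def contributors_of(term, triples, sub_property_of, target="dc:contributor"):
    """Collect values of target or any of its sub-properties for one term."""
    return {
        value
        for subject, prop, value in triples
        if subject == term and target in super_properties(prop, sub_property_of)
    }


if __name__ == "__main__":
    triples = [
        ("ex:foo", "IAO:0000117", "https://orcid.org/0000-0001-9625-1899"),
        ("ex:foo", "rdfs:label", "foo"),
    ]
    print(contributors_of("ex:foo", triples, SUB_PROPERTY_OF))
```

With this arrangement the more specific 'term editor' assertion is kept as is, and the broader contributor view is derived rather than asserted twice.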


wdduncan commented 2 years ago

You can also have the ORCID as an annotation on dc:creator. E.g.:

ex:foo a owl:Class ;
    dc:creator "someone name" .

[ a owl:Axiom ;
  ex:orcid "0000-0000-0000-0000" ;
  owl:annotatedProperty dc:creator ;
  owl:annotatedSource ex:foo ;
  owl:annotatedTarget "someone name"
] .

Not sure what is wrong with the proposal of having the dc:creator use the format: <orcid> <name>. In SPARQL, you can split the orcid and name parts, and the name part can be ignored. The name part is just for us humans to read.

cmungall commented 2 years ago

Let's avoid string parsing in sparql, and let's use IDs/nodes universally rather than string literals

cmungall commented 1 year ago

There are a lot of orthogonal issues being discussed here.

I have tried to separate these out into actionable proposals that can be voted on:

If someone wants to make issues for other aspects covered here (e.g whether to put labels on axiom annotations) then go ahead!