jam-schema / jams

Journal Article Metadata Schema
Creative Commons Zero v1.0 Universal

authors/institution PID (orcid and co) discussion #20

Closed jcolomb closed 3 years ago

jcolomb commented 3 years ago

from @nathanlesage

Alternative suggestion for the IDs: Instead of defining an arbitrary top-level-property, and nesting the properties type and id below that, we could capture the JATS XML contrib-id with corresponding contrib-type with top-level properties instead. That is (e.g.):

    author:
      surname: Mustermann
      given-names: Max
      orcid: 0000-0002-3127-5520
      academia-id: https://academia.edu/link/to/profile # Just an arbitrary example
      scopus-id: 7007156898 # Taken from https://jats.nlm.nih.gov/archiving/tag-library/1.1/element/contrib-id.html

This might work better, as it …

… reduces the mental load for authors writing the frontmatter
… contains the same amount of information
… is simpler and easier for the parsers

And again: As already stated in my review of your PR, I think a good and resource-friendly approach would be to simply define all the possible types of IDs as properties, so that we don't need a top-level property PID and can instead outsource some of the work to the parsers:

    authors:
      orcid: 0000-0000-0000-0000
      id: a-generic-id

This would also mirror the JATS standard, but instead of defining a top-level property to group the attributes and values, it captures the attributes within the property name (this would also be cleaner from a technical perspective, as the attribute of an XML tag is, strictly speaking, part of the tag).

Furthermore, I think anchors are independent of our endeavours, so I think "id or anchor" might lead to confusion …?
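
To make the "outsource some of the work to the parsers" idea concrete, here is a minimal Python sketch of how a parser might turn the flat properties into JATS contrib-id elements. The function name, the KNOWN_ID_PROPERTIES table, and the contrib-id-type values in it are assumptions made for illustration; they are not part of JAMS or of any existing parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from flat property names to contrib-id-type values.
# With flat properties, the parser has to carry an enumeration like this.
KNOWN_ID_PROPERTIES = {
    "orcid": "orcid",
    "scopus-id": "scopus",
    "academia-id": "academia",
}


def author_to_contrib(author: dict) -> ET.Element:
    """Build a JATS-like <contrib> element from a flat author mapping."""
    contrib = ET.Element("contrib", {"contrib-type": "author"})
    for prop, id_type in KNOWN_ID_PROPERTIES.items():
        if prop in author:
            # The part of the property name that identifies the ID system
            # becomes the XML attribute; the value becomes the element text.
            el = ET.SubElement(contrib, "contrib-id", {"contrib-id-type": id_type})
            el.text = str(author[prop])
    name = ET.SubElement(contrib, "name")
    ET.SubElement(name, "surname").text = author["surname"]
    ET.SubElement(name, "given-names").text = author["given-names"]
    return contrib


author = {
    "surname": "Mustermann",
    "given-names": "Max",
    "orcid": "0000-0002-3127-5520",
    "scopus-id": "7007156898",
}
xml_text = ET.tostring(author_to_contrib(author), encoding="unicode")
print(xml_text)
```

Any property that is missing from the table is silently ignored in this sketch, which is one way to see the cost of the flat approach: every new ID system needs a parser update.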

jcolomb commented 3 years ago

For me the critical part here is

define all different possible types of IDs

That is not possible, as some IDs may not exist yet.

JATS went with type+id (one in the tag, one as data), which is probably safer in the long term.

In both cases, though, each type should be defined externally anyway, right?

jcolomb commented 3 years ago

Going one way or the other may be linked to #21. I would really like to discuss this in detail during the next call.

nathanlesage commented 3 years ago

For me the critical part here is

define all different possible types of IDs

That is not possible, as some IDs may not exist yet.

JATS went with type+id (one in the tag, one as data), which is probably safer in the long term.

In both cases, though, each type should be defined externally anyway, right?

I had the same fear that this might get out of hand, but in the case of IDs, I think enumerating every possibility in the standard itself has two benefits: first, we can limit the growth of ID systems and help enforce a central group of IDs (DOIs wouldn't be worth anything if hundreds of arbitrary systems were allowed as well), and second, it makes the supported ID systems visible.

The main problem we face when attempting to incorporate JATS into YAML is that XML has attributes and YAML doesn't. There are two ways to mitigate this: either we create arbitrary parent properties and give the attributes the same level of importance as values (which might make it hard to distinguish attributes from values), or we incorporate attributes into the property name itself (hence effectively enumerating every possibility). This is not something I would recommend for every attribute, but in this specific instance it makes sense in my view, based on the arguments above!
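
The two mitigations can be compared in a small Python sketch that converts between them. The parent property name "pid", the helper names, and the ENUMERATED_ID_TYPES tuple are assumptions chosen for illustration only.

```python
# Hypothetical enumeration of ID types; needed only for the flat form.
ENUMERATED_ID_TYPES = ("orcid", "scopus-id")


def nested_to_flat(author: dict) -> dict:
    """type+id entries -> attribute folded into the property name."""
    flat = {k: v for k, v in author.items() if k != "pid"}
    for entry in author.get("pid", []):
        flat[entry["type"]] = entry["id"]
    return flat


def flat_to_nested(author: dict, known_types=ENUMERATED_ID_TYPES) -> dict:
    """Flat ID properties -> list of type+id entries under a parent property."""
    nested = {k: v for k, v in author.items() if k not in known_types}
    pids = [{"type": t, "id": author[t]} for t in known_types if t in author]
    if pids:
        nested["pid"] = pids
    return nested


nested = {
    "surname": "Mustermann",
    "given-names": "Max",
    "pid": [
        {"type": "orcid", "id": "0000-0002-3127-5520"},
        {"type": "scopus-id", "id": "7007156898"},
    ],
}
flat = nested_to_flat(nested)
print(flat)
print(flat_to_nested(flat) == nested)
```

Going from type+id to flat needs no prior knowledge of the ID systems, whereas the reverse direction only works for the types that have been enumerated. That asymmetry is the trade-off under discussion.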

jcolomb commented 3 years ago

After a live discussion, we are going to follow the JATS4R way and keep type + id.

In brief, the idea is that we can still restrict the types of IDs that pandoc will support if the group ever thinks that is appropriate. But this may be more in the hands of publishers than in ours.
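
As a rough illustration of how the type + id form still allows restricting the supported ID types later, a consumer could filter entries against an allowlist. This Python sketch is hypothetical: SUPPORTED_ID_TYPES and split_ids are invented names, and nothing here reflects how pandoc or any publisher actually handles IDs.

```python
# Hypothetical allowlist; a group or publisher could widen or narrow it
# without changing the schema, because the type is data rather than a key.
SUPPORTED_ID_TYPES = {"orcid"}


def split_ids(pids, supported=SUPPORTED_ID_TYPES):
    """Separate type+id entries into supported and unsupported ones."""
    accepted = [p for p in pids if p["type"] in supported]
    rejected = [p for p in pids if p["type"] not in supported]
    return accepted, rejected


pids = [
    {"type": "orcid", "id": "0000-0002-3127-5520"},
    {"type": "a-future-id", "id": "12345"},  # an ID system that may not exist yet
]
accepted, rejected = split_ids(pids)
print(accepted)
print(rejected)
```

An unknown type remains valid input under this approach; whether to accept it becomes a policy decision for whoever consumes the metadata.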