icatproject / icat.server

The ICAT server offering both SOAP and "RESTlike" interfaces to a metadata catalog.

Alternative option for Affiliation #260

Closed RKrahl closed 2 years ago

RKrahl commented 2 years ago

This PR provides a slightly modified alternative to the implementation of the Affiliation table proposed in #256.

The background is that in an earlier implementation, the name attribute was a STRING[1023]. This caused problems: name is part of the uniqueness constraint and therefore needs to be included in a database index, but a STRING[1023] is too large to be indexed. #256 solves this by shortening name to STRING[511].
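A back-of-the-envelope sketch of why the column length matters for the index. The PR does not name a database, so everything here is an assumption: MySQL/MariaDB with InnoDB and the DYNAMIC row format (3072-byte key limit), a utf8mb4 column (up to 4 bytes per character), and the user reference stored as an 8-byte BIGINT. The function name is hypothetical.

```python
# Hypothetical worst-case size check for the (user, name) uniqueness index.
# Assumptions (not from the PR): InnoDB with DYNAMIC row format, utf8mb4
# encoding, and an 8-byte BIGINT foreign key to DataPublicationUser.
MAX_KEY_BYTES = 3072   # InnoDB key length limit under these assumptions
BYTES_PER_CHAR = 4     # worst case for utf8mb4
USER_ID_BYTES = 8      # assumed BIGINT user reference


def unique_key_bytes(name_length: int) -> int:
    """Worst-case size in bytes of one (user, name) index entry."""
    return USER_ID_BYTES + name_length * BYTES_PER_CHAR


for length in (1023, 511, 255):
    size = unique_key_bytes(length)
    verdict = "fits" if size <= MAX_KEY_BYTES else "too large"
    print(f"STRING[{length}]: {size} bytes, {verdict}")
```

Under these assumptions, 1023 characters need 4100 bytes and exceed the limit, while 511 (2052 bytes) and 255 (1028 bytes) fit. With the 767-byte prefix limit of the older COMPACT row format even 511 would not fit, so the exact threshold depends on the database configuration.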

The present PR suggests a different solution: splitting name into two attributes. In the version provided by #256, name in fact serves a double purpose. It disambiguates the affiliation entries when a user has more than one affiliation in a publication, and it sets the text that is displayed on the landing page and included in the publication metadata. This PR proposes two attributes instead: name is purely internal and serves only for disambiguation, while fullReference sets the text that appears in the visible metadata.

As a result, Affiliation will look like:


Affiliation

The home institute or other affiliation of a user in the context of a data publication

Uniqueness constraint: user, name

Relationships:

| Card | Class | Field |
|------|-------|-------|
| 1,1 | DataPublicationUser | user |

Other fields:

| Field | Type | Description |
|-------|------|-------------|
| name | String [255] NOT NULL | An internal name for that affiliation entry, possibly the organization name |
| pid | String [255] | Identifier such as ROR or ISNI |
| fullReference | String [1023] | The full reference of the affiliation, optionally including street address and department, as it should appear in the publication |
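To make the proposed schema concrete, here is a minimal in-memory sketch of the entity and its uniqueness constraint. It is not ICAT server code: the class and the check_unique helper are hypothetical, and user is a plain string standing in for the DataPublicationUser relation. Attribute names and length limits follow the table above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Affiliation:
    """Hypothetical model of the proposed Affiliation entity (not ICAT code)."""
    user: str                            # placeholder for the DataPublicationUser relation
    name: str                            # internal disambiguator, NOT NULL, String[255]
    pid: Optional[str] = None            # e.g. a ROR or ISNI identifier, String[255]
    fullReference: Optional[str] = None  # text shown in the publication, String[1023]

    def __post_init__(self) -> None:
        if self.name is None:
            raise ValueError("name must not be null")
        limits = (("name", self.name, 255),
                  ("pid", self.pid, 255),
                  ("fullReference", self.fullReference, 1023))
        for attr, value, limit in limits:
            if value is not None and len(value) > limit:
                raise ValueError(f"{attr} exceeds {limit} characters")


def check_unique(affiliations: Iterable[Affiliation]) -> None:
    """Raise ValueError if two entries share the same (user, name) pair."""
    seen = set()
    for entry in affiliations:
        key = (entry.user, entry.name)
        if key in seen:
            raise ValueError(f"duplicate affiliation for {key}")
        seen.add(key)
```

Two entries for the same user pass check_unique as long as their name values differ. The long display text lives in fullReference and never takes part in the uniqueness check, which is what keeps the indexed columns short.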

This has the following advantages:

The only disadvantage I can see is that it adds another attribute.

We briefly discussed this at the November collaboration meeting but decided that we needed more time to discuss it. I am submitting this as a separate PR to provide a space for that discussion.

RKrahl commented 2 years ago

To illustrate this with a real-world example: with the implementation proposed here, the affiliations for the first author could be set as:

```python
[{'fullReference': 'Optics for Solar Energy, Helmholtz-Zentrum Berlin für Materialien und Energie, Albert-Einstein-Straße 16, 12489 Berlin',
  'name': '01: HZB',
  'pid': 'ROR:02aj13c28'},
 {'fullReference': 'Computational Nano Optics, Zuse Institute Berlin, Takustraße 7, 14195 Berlin',
  'name': '02: ZIB',
  'pid': 'ROR:02eva5865'}]
```

The prefixes 01: and 02: in the respective name attributes would guarantee the proper ordering of the affiliations on the landing page.
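A minimal sketch of how a landing page might use the name prefixes for ordering while displaying only fullReference. The display_order helper is hypothetical; how a facility actually renders its landing page is up to that facility. The two entries are the ones from the example above, deliberately supplied in reverse order.

```python
# The two example entries from above, supplied in reverse order on purpose.
affiliations = [
    {"fullReference": "Computational Nano Optics, Zuse Institute Berlin, "
                      "Takustraße 7, 14195 Berlin",
     "name": "02: ZIB",
     "pid": "ROR:02eva5865"},
    {"fullReference": "Optics for Solar Energy, Helmholtz-Zentrum Berlin für "
                      "Materialien und Energie, Albert-Einstein-Straße 16, 12489 Berlin",
     "name": "01: HZB",
     "pid": "ROR:02aj13c28"},
]


def display_order(entries):
    """Hypothetical helper: fullReference texts sorted by the internal name."""
    return [e["fullReference"] for e in sorted(entries, key=lambda e: e["name"])]


for number, text in enumerate(display_order(affiliations), start=1):
    print(f"{number}. {text}")
```

Since the sort is a plain string comparison, the prefixes should be zero-padded: "10: X" sorts before "2: Y", but after "02: Y".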

kevinphippsstfc commented 2 years ago

I'm not entirely comfortable with the dual use of the name field regarding it also being used for ordering. Would it be better to put this in a separate "orderKey" field like has been done in DataPublicationUser?

RKrahl commented 2 years ago

> I'm not entirely comfortable with the dual use of the name field regarding it also being used for ordering. Would it be better to put this in a separate "orderKey" field like has been done in DataPublicationUser?

How the new entity classes are used in practice, and whether the name attribute in Affiliation is used to establish a well-defined order, is up to the facilities to decide. My example above only illustrated that the name attribute could be used in this way, a feature that the schema as implemented in #256 does not offer.

As a result of adding fullReference, the only remaining purpose of name is to disambiguate multiple affiliation entries of a user in a given data publication. Its actual value essentially doesn't matter, as it is probably never exposed, either on the publication landing page or in the DataCite metadata.[^1] It seems a little excessive to me to add two attributes, name and orderKey, whose values matter only in that they differ and may define a particular order.
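A minimal sketch of how an export into publication metadata could draw only on fullReference and pid, leaving name internal. The function name is hypothetical, and the output keys are only loosely modeled on DataCite's affiliation attributes (affiliationIdentifier, affiliationIdentifierScheme); the exported "name" key receives fullReference, not Affiliation.name. A real export might also need to expand the identifier, for example into a full ROR URL.

```python
def to_publication_affiliation(entry: dict) -> dict:
    """Hypothetical export of one Affiliation; the internal name is ignored.

    Output keys are loosely modeled on DataCite affiliation attributes;
    the mapping is an assumption for illustration only.
    """
    result = {}
    if entry.get("fullReference"):
        result["name"] = entry["fullReference"]  # displayed text
    pid = entry.get("pid")
    if pid and ":" in pid:
        scheme, identifier = pid.split(":", 1)  # e.g. "ROR:02aj13c28"
        result["affiliationIdentifierScheme"] = scheme
        result["affiliationIdentifier"] = identifier
    return result


hzb = {"fullReference": "Optics for Solar Energy, Helmholtz-Zentrum Berlin für "
                        "Materialien und Energie, Albert-Einstein-Straße 16, 12489 Berlin",
       "name": "01: HZB",
       "pid": "ROR:02aj13c28"}
print(to_publication_affiliation(hzb))
```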

We could also rename name to orderKey if that makes you feel more comfortable.

[^1]: Again, it is up to the facilities to decide how they design their landing pages and whether they expose Affiliation.name there. However, the schema in the present PR is designed such that there is no need or compelling reason to expose it.

kevinphippsstfc commented 2 years ago

OK, I'm happy to go with your implementation as is (no need to rename name to orderKey). You have clearly done a lot more thinking about the specifics of how this is going to be implemented than I have, so I trust your judgement.