skg-if / interoperability-framework

Discuss "raw string" expressing an affiliation to an organisation #5

Open andremann opened 3 weeks ago

essepuntato commented 2 weeks ago

We need to understand carefully which "affiliation" we are referring to. Indeed, an affiliation can be declared in different positions: via "contributions" and "relevant_organisations" in the research product, and via "affiliations" in the agent. In principle, we could easily add description properties to "contributions" (research product) and "affiliations" (agent). However, it is unclear why we should not use the structure we already have, e.g., by creating a "fake" ID and assigning the name.

Example: If we have to represent an organisation (e.g. "Killing Farmaceutical, USA") for which we do not have any ID specified, we could add it anyway by using the existing specification (by creating a fake "local_identifier" for it), e.g.:

{
    ... 
    "relevant_organisations": [
         {
             "local_identifier": "fk21",
             "name": "Killing Farmaceutical, USA"
         }
    ]
}

Note that the "local_identifier" is necessary, in particular, if we need to connect this organisation via different properties (e.g. contributions/declared_affiliation and relevant_organisations, when for the same research product, we need to specify the same affiliation in both).
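A minimal sketch of this approach in Python, assuming the placeholder identifier is derived from a hash of the normalised name. The helper names (`make_local_identifier`, `organisation_stub`), the `org-` prefix and the exact layout of "contributions" are illustrative assumptions, not part of the SKG-IF specification:

```python
import hashlib
import json


def make_local_identifier(name: str, prefix: str = "org-") -> str:
    """Derive a stable placeholder identifier from an organisation name."""
    # Collapse whitespace and ignore case so the same name always maps to the same ID.
    normalised = " ".join(name.split()).casefold()
    digest = hashlib.sha1(normalised.encode("utf-8")).hexdigest()[:8]
    return prefix + digest


def organisation_stub(name: str) -> dict:
    """Build an organisation entry that carries only a local ID and a name."""
    return {"local_identifier": make_local_identifier(name), "name": name}


org = organisation_stub("Killing Farmaceutical, USA")

product = {
    "relevant_organisations": [org],
    # Hypothetical layout: the thread names these properties but not their exact shape.
    "contributions": [
        {"declared_affiliations": [org["local_identifier"]]}
    ],
}

print(json.dumps(product, indent=4))
```

Because the identifier depends only on the name, the same organisation can be referenced from both properties without keeping a separate lookup table.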

If, instead, the problem is to preserve the exact way an affiliation is written in an article, then a new attribute should be added. The question here is: what is the use case? Is it strictly necessary for SKG-IF to include it? Would it be better to address it via an extension (that could also be applied to other entities, such as an agent being the author of a research product)?

rduyme commented 2 weeks ago

@essepuntato @andremann

It took OpenAlex a few model/API iterations to arrive at a convenient affiliation structure for publications, e.g. https://api.openalex.org/works/doi:10.1088/1367-2630/10/5/055012

We are mixing two things: the publication affiliation, which always starts as a raw string (that may or may not be matched to an organisation), and the person's employer affiliation.

Note: I think OpenAlex derives the person's employer affiliation from the organisations matched in publications, e.g. https://api.openalex.org/people/A5073458787 (there is no raw_affiliation on the "person" entity; raw_affiliation exists only on the "work" entity). Important: you can have multiple organisations associated with a single raw_affiliation.

It is the same for raw_author_name.

The raw_author_name and raw_affiliation_string fields were not present in the first OpenAlex iterations. Now that they have been added to the work entity, the data is much easier to understand. raw_author_name and raw_affiliation_string hold the data coming directly from, for example, Crossref.

If OpenAIRE or OpenAlex decide to crawl our data repository, we need to give them the same level of accuracy in our fields.

Most of the time, the real data looks like this: "Sleep and Human Health Institute, Harvard University, 221 Longwood Avenue, Boston, MA, USA" (https://ror.org/04r5ess67 + https://ror.org/03vek6s52).
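One way to picture the "one raw string, several organisations" case is the following Python sketch. The `RawAffiliation` class and its `organisation_ids` field are assumptions made for illustration; they are not taken from the OpenAlex or SKG-IF schemas:

```python
from dataclasses import dataclass, field


@dataclass
class RawAffiliation:
    """A raw affiliation string plus the organisations matched to it (possibly none)."""
    raw_affiliation_string: str
    organisation_ids: list[str] = field(default_factory=list)

    @property
    def is_matched(self) -> bool:
        # True when at least one organisation has been matched to the raw string.
        return bool(self.organisation_ids)


harvard = RawAffiliation(
    raw_affiliation_string=(
        "Sleep and Human Health Institute, Harvard University, "
        "221 Longwood Avenue, Boston, MA, USA"
    ),
    organisation_ids=["https://ror.org/04r5ess67", "https://ror.org/03vek6s52"],
)

# A raw string with no matched organisation is still a valid record.
unmatched = RawAffiliation("Killing Farmaceutical, USA")
```

Keeping the raw string as the primary value and the matched identifiers as a list means that an unmatched or multiply matched affiliation needs no special case.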

Regarding "relevant_organisations": this notion seems to have been created so that OpenAIRE can add entities related to the publisher (guessed, I suppose, from the owner of the OAI-PMH endpoint, e.g. a university), e.g. https://explore.openaire.eu/search/publication?pid=10.1063%2F5.0182504. But these related organisations are actually inferred by OpenAIRE from the raw name of the OAI-PMH owner (I suppose, I may be wrong ;-) ), just as OpenAlex infers Organisation entities from the publication's raw_affiliation_string values. It is not clear how SKG-IF deals with inferred entities (Organisation, Topic).

rduyme commented 2 weeks ago

In Crossref, the "name" field seems to be usable for either the full address or the simple name: https://www.crossref.org/documentation/schema-library/markup-guide-metadata-segments/affiliations/

ex: http://api.crossref.org/works/10.1021/bi901864j (full address)

"affiliation": [
          {
            "name": "Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB2 1QA, U.K."
          }
        ]

http://api.crossref.org/works/10.31234/osf.io/75ebc

"affiliation": [
          {
            "id": [
              {
                "id": "https://ror.org/05f950310",
                "id-type": "ROR",
                "asserted-by": "publisher"
              }
            ],
            "name": "KU Leuven"
          }
        ]

http://api.crossref.org/works/10.14232/phd.10914

 "affiliation": [
          {
            "id": [
              {
                "id": "https://ror.org/01pnej532",
                "id-type": "ROR",
                "asserted-by": "publisher"
              }
            ],
            "name": "University of Szeged",
            "department": [
              "Doctoral School of Clinical Medicine"
            ]
          }
        ]

In this last example, the Crossref department data "Doctoral School of Clinical Medicine" is lost by OpenAlex: https://api.openalex.org/works/doi:10.14232/phd.10914

 "affiliations": [
        {
          "raw_affiliation_string": "University of Szeged",
          "institution_ids": [
            "https://openalex.org/I227486990"
          ]
        }
      ]

The question is simple: where will Crossref's "department" go in our model?
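To make the question concrete, here is a small Python sketch that reads a Crossref-style affiliation object (as in the examples above) and keeps the department next to the name and the ROR identifiers. The function name and the output keys `departments` and `ror_ids` are hypothetical; they only illustrate that the information can be carried along, not where SKG-IF should store it:

```python
def flatten_crossref_affiliation(affiliation: dict) -> dict:
    """Extract name, departments and ROR IDs from a Crossref-style affiliation object."""
    ror_ids = [
        entry["id"]
        for entry in affiliation.get("id", [])
        if entry.get("id-type") == "ROR"
    ]
    return {
        "name": affiliation.get("name"),
        # Kept as a list because Crossref exposes "department" as an array.
        "departments": list(affiliation.get("department", [])),
        "ror_ids": ror_ids,
    }


szeged = {
    "id": [
        {
            "id": "https://ror.org/01pnej532",
            "id-type": "ROR",
            "asserted-by": "publisher",
        }
    ],
    "name": "University of Szeged",
    "department": ["Doctoral School of Clinical Medicine"],
}

record = flatten_crossref_affiliation(szeged)
```

With this shape, an affiliation that carries only a "name" (such as the full-address Cambridge example) simply yields empty `departments` and `ror_ids` lists.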

rduyme commented 1 week ago

Another contribution on this topic... We are starting to destroy our SKG... :-)

essepuntato commented 5 days ago

Hi @rduyme, thanks for your comments. While we could, in principle, expand the model largely, I think the examples you presented can be addressed well in the current SKG-IF data model. Organisations (behind affiliations) can be attached in two different locations in the current Interoperability Framework, in particular:

  1. if you want to associate the affiliation listed in a paper without keeping track of the authors having them, then relevant_organisations would be the way to go;
  2. if you want, instead, to associate the affiliation to an author of a specific paper, you can use the term declared_affiliations that enables such an interlink between the paper, the person being an author and its related affiliation(s).

The raw affiliation string, if needed, can be provided via the attribute name, as in Crossref. That was our rationale.

My point is instead the question behind the need for such a raw string—that, please note, applies to any agent. To me, but maybe I am biased here, the raw string is essential when such an agent is not identified by an external identifier (such as an ORCID, a ROR, or whatever). In that case, the raw string becomes very important since it is the only way to recognise (to some extent) which agent we are talking about. The goal of SKG-IF was to offer a standard format to exchange information across systems, focusing on this first stage of SKGs providing bibliographic metadata and citation data of research products. That was a pragmatic decision since, as the primary use case, we initially started from there (and by involving such sources as OpenCitations, OpenAIRE, OpenAlex, Crossref, DataCite, etc.).
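The role of the raw string as a fallback can be sketched as follows in Python. The `identifiers` field (here a plain list of strings) and the `agent_key` helper are assumptions for illustration only and do not reflect the actual SKG-IF agent structure:

```python
def agent_key(agent: dict) -> tuple[str, str]:
    """Return a key for recognising an agent: an external ID if present, else the name."""
    identifiers = agent.get("identifiers") or []
    if identifiers:
        # An external identifier (ORCID, ROR, ...) is the reliable way to recognise the agent.
        return ("id", identifiers[0])
    # Without an identifier, the (normalised) raw string is all we have.
    return ("name", " ".join(agent.get("name", "").split()).casefold())


with_id = agent_key({"identifiers": ["https://ror.org/05f950310"], "name": "KU Leuven"})
without_id = agent_key({"name": " Killing  Farmaceutical, USA"})
```

The second case shows why the string matters most when no identifier exists: any recognition of the agent then depends entirely on how faithfully that string was exchanged.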

So the question is: How is the use of name to exchange raw strings affecting (negatively) the exchange of these data?

rduyme commented 4 days ago

Hi Silvio,

"So the question is: How is the use of name to exchange raw strings affecting (negatively) the exchange of these data?"

I opened a ticket at Crossref, please read it :-). "name" is semantically really confusing for people implementing the Crossref API: https://community.crossref.org/t/ror-affiliation-raw-affiliation-in-american-physical-society-aps-publisher-publications/12450 (the same ticket is open at OpenAlex, for information).

The main problem is that we (the IT guys) probably disagree on the following points :

I am really concerned that this "raw affiliation" open data will no longer be properly open if things are not done right. To be honest, I hope to have a response from the Crossref and OpenAlex tech guys before any decision is taken on SKG-IF.

PS: Back to my APS screenshot example: the "KU Leuven" university is not in the list of institutions affiliated with this publication, 10.1103/PhysRevC.110.034315, in OpenAlex but also in WoS... => If we share affiliations, what is it for, if not to properly acknowledge universities?