pkp / pkp-lib

The library used by PKP's applications OJS, OMP and OPS, open source software for scholarly publishing.
https://pkp.sfu.ca
GNU General Public License v3.0

Need to support multiple author affiliations #7135

Open kmccurley opened 3 years ago

kmccurley commented 3 years ago

Apologies if this is already present in another issue or an internal development plan. I searched in issues but could not find anything related to multiple affiliations for an author.

A recent study of 22 million articles published in 2019 showed that "almost one in three publications was (co-)authored by authors with multiple affiliations..." and "the share of authors with multiple affiliations increased from around 10% to 16% since 1996." The fact that OJS does not support multiple affiliations for authors means it is increasingly out of step with the realities of academic publishing, and my organization is reluctant to continue using OJS for this reason (among others).

I believe that an author should be able to specify multiple affiliations for a submission. This meshes nicely with the need to uniquely identify affiliations through the use of ROR identifiers. The use of a free text field alone for affiliations makes it difficult for machines to determine that "UC Berkeley" and "University of California - Berkeley" and "University of California, Berkeley" are in fact the same institution.

Having just installed the latest version of OJS, I noticed that affiliation information is stored in the underlying database as a row in author_settings using setting_name of affiliation, but the underlying table has a unique key of (author_id,locale,setting_name), which makes it impossible to store multiple affiliations unless the information is encoded in some way within the setting_value field. Our authors are currently listing multiple affiliations with ; to separate them, but this is a bad practice for the future (much like journals entering bogus email addresses when the field was required).
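To illustrate the constraint described above, here is a minimal sketch using an in-memory SQLite table as a simplified, hypothetical stand-in for `author_settings`. Only the columns and the unique key named in this comment are modelled; the real OJS table and its database engine may differ.

```python
import sqlite3

# Hypothetical, simplified stand-in for the author_settings table described above.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE author_settings (
        author_id     INTEGER NOT NULL,
        locale        TEXT    NOT NULL,
        setting_name  TEXT    NOT NULL,
        setting_value TEXT,
        UNIQUE (author_id, locale, setting_name)
    )
    """
)


def add_affiliation(author_id: int, locale: str, value: str) -> bool:
    """Try to store one affiliation row; return False if the unique key rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO author_settings VALUES (?, ?, 'affiliation', ?)",
                (author_id, locale, value),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = add_affiliation(1, "en_US", "University of Bern")
second = add_affiliation(1, "en_US", "University of Pisa")  # same author, locale, setting_name
print(first, second)
```

The second insert is rejected because it repeats the same (author_id, locale, setting_name) triple, which is why a second affiliation can only be squeezed into `setting_value` itself.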

As I mentioned before, the use of multiple affiliations is already an extremely common practice. The listing of affiliations has multiple purposes, including citation analysis to rank institutions, which strongly affects their funding. The attachment of an affiliation to a paper also strongly influences the reputation of the paper itself, and the inability to list multiple affiliations contributes to a "winner-take-all" attribution of credit, which is damaging to second-tier institutions and their authors.

Accuracy of affiliations is also important for identifying potential conflict of interest among reviewers.

Multiple affiliations are already supported by the following:

  1. the schema for the native plugin XSD allows multiple affiliation tags per author, but when you import to OJS it appears to discard this information. Obviously export fails to report multiple affiliations since the database only holds one.
  2. the pubmed/medline XML format
  3. the doaj xml format
  4. the datacite xml format
  5. Dublin Core supports multiple affiliations, but Recommendation 6 strangely says they should be associated with papers rather than authors. Dublin Core is lagging in other things like ORCID IDs.
  6. the crossref api supports them (see their schema)
  7. the DOAJ schema supports it, but the DOAJ export plugin is limited to a single affiliation. They are now compatible with crossref.
  8. medra/onix but the medra/ONIX export plugin cannot supply them.

Obviously other publishers have already embraced multiple affiliations. ACM has started capturing structured representations in their LaTeX class:

\affiliation{%
\institution{University of New South Wales}
\department{School of Biomedical Engineering}
\streetaddress{Samuels Building (F25), Kensington Campus}
\city{Sydney}
\state{NSW}
\postcode{2052}
\country{Australia}}

This can be useful in case the affiliation does not have a ROR ID, or the author wishes to define it as within a department or institute of a ROR entity (ROR does not catalogue these).

Unfortunately the schema of having a single affiliation is buried deeply in the codebase for OJS. Obviously the core developers of OJS are best able to understand a path forward for addressing this. One possible interim solution is to define a new field in author_settings with setting_name of affiliationList, then populate it with a JSON encoding that can carry version information inside it. The code that uses $author->getAffiliation() can over time be migrated to $author->getAffiliationList() to return a list of affiliations (perhaps with different locales!). An alternative is to allow author_settings to have multiple values for a given setting_name.
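A rough sketch of that interim idea, written in Python rather than PHP purely for illustration. The payload layout, the `version` key, and every function name here are assumptions and not existing OJS code; the settings dictionary stands in for one author's rows in author_settings.

```python
import json

SCHEMA_VERSION = 1  # hypothetical version marker stored inside the JSON payload


def encode_affiliation_list(affiliations: list) -> str:
    """Serialize affiliations into a versioned JSON string for setting_value."""
    return json.dumps({"version": SCHEMA_VERSION, "affiliations": affiliations})


def get_affiliation_list(settings: dict) -> list:
    """Prefer the structured affiliationList; fall back to the legacy string."""
    raw = settings.get("affiliationList")
    if raw:
        payload = json.loads(raw)
        if payload.get("version") == SCHEMA_VERSION:
            return payload["affiliations"]
    legacy = settings.get("affiliation", "").strip()
    return [{"name": legacy}] if legacy else []


def get_affiliation(settings: dict) -> str:
    """Legacy accessor: callers that expect a single string keep working."""
    return "; ".join(a["name"] for a in get_affiliation_list(settings))


settings = {
    "affiliationList": encode_affiliation_list(
        [{"name": "University of Bern"}, {"name": "University of Pisa"}]
    )
}
print(get_affiliation_list(settings))
print(get_affiliation(settings))
```

Keeping a string-returning accessor next to the list-returning one is what would let existing callers migrate gradually instead of all at once.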

asmecher commented 3 years ago

(Somewhat cross-posted: https://github.com/pkp/pkp-lib/issues/5912#issuecomment-860452420)

@kmccurley, we often support a "dual-track" toolset for features in OJS:

For example, the built-in OJS search engine works out of the box but is not very feature-rich and has limited scalability, while the Lucene/Solr plugin is available for those who need the additional tools and have the capacity to run the necessary service.

I'm hesitant to make the current "affiliation" field any smarter than it currently is because of some inherent constraints:

I think machine-readability is a precursor to the work you propose, thus working with RORs directly (where you mentioned this before) or possibly via the ORCID API and the author's ORCID record.

For your own use case, are RORs or author ORCIDs a workable approach?

kmccurley commented 3 years ago

Let's stay focused on the issue at hand: multiple affiliations per author. That's a critical deficiency of the OJS schema for what is stored about an author.

The reason I mentioned the other issues is partly because OJS is falling behind, and when you make a schema change, you should anticipate all of the requirements for publishing. Machine-readable metadata is absolutely necessary for any serious publishing platform, because all reputation scores are based upon it, and research funding agencies are increasingly demanding it. As I mentioned earlier, almost every metadata format for publishing now supports multiple affiliations.

NateWr commented 3 years ago

As @asmecher says, we are likely to pursue support for multiple affiliations through extensions to the ROR plugin and/or the ORCID plugin. That's because these approaches offer the possibility to support affiliation disambiguation and machine-readability.

Our schema is extensible using plugins, which means that we can store a single affiliation record by default:

University of Bern, University of Pisa

And plugins can enrich that by storing additional data alongside author records:

[
  {
    "ror": "03rjyp183",
    "name": {
        "en_US": "University of Bern",
        ".._..": "..."
    }
  },
  {
    "ror": "03y4dt428",
    "name": {
        "en_US": "University of Pisa",
        ".._..": "..."
    }
  }
]

That can then be used to enrich records sent to downstream consumers, like Crossref, Datacite, etc. Keeping the plain text field as a base provides flexibility that's important to satisfy all of the different use cases of our community, as you can see in this example.
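As a hedged illustration of how such enrichment might feed an export, the Python sketch below prefers the structured records and falls back to the plain-text field when a plugin has stored nothing. The function name and the output shape are hypothetical; only the record layout and the ROR IDs come from the example above, and `https://ror.org/<id>` is the usual URL form of a ROR identifier.

```python
ROR_BASE_URL = "https://ror.org/"


def build_affiliations(plain_text: str, enriched: list, locale: str = "en_US") -> list:
    """Return one entry per affiliation for a downstream consumer (hypothetical shape)."""
    if not enriched:
        # No plugin data: the base text field is all there is to send.
        return [{"name": plain_text, "ror": None}]
    result = []
    for record in enriched:
        names = record["name"]
        # Use the requested locale, else any available name.
        name = names.get(locale) or next(iter(names.values()), "")
        result.append({"name": name, "ror": ROR_BASE_URL + record["ror"]})
    return result


enriched = [
    {"ror": "03rjyp183", "name": {"en_US": "University of Bern"}},
    {"ror": "03y4dt428", "name": {"en_US": "University of Pisa"}},
]
print(build_affiliations("University of Bern, University of Pisa", enriched))
```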

kmccurley commented 3 years ago

I've spent some time reading the plugin documentation, looking at other plugins, and reading the core code. I've concluded that it's probably not trivial to write a plugin, and may not even be possible without modifying the core code. The problem is that there are many parts of the code that depend upon $author->getAffiliation() returning a string. The whole point would be to capture more sensible metadata, but that means every other plugin that exports metadata would need to be modified. That includes doaj, native, googleScholar, users, ROR, and perhaps others. This would introduce far too many dependencies between existing plugins.

I'm surprised that nobody has flagged this before, given the reality of publishing practice. It sounds like it won't happen soon and I should look toward developing our own alternative.

asmecher commented 1 year ago

Heads-up that we're likely going to be implementing multiple affiliation support in the core (OJS, OMP, OPS) as part of the work to integrate ROR support into the applications. The thinking goes like this:

kmccurley commented 1 year ago

This is good news. While you're making a change, it's worth thinking about why you collect affiliations at all. Possible reasons are:

  1. for reporting to institutions and funding agencies who want to track their publications. Crossref provides search on this. This is where ROR becomes important, because free text search can have too many false positives and negatives (e.g., what is USC?).
  2. for readers to see the affiliation information on the website. This is where free-text becomes important, because there are a large number of relationships to institutions that can only be expressed in text (e.g., "work started while visiting" or joint appointments in two departments, etc).
  3. for alignment to other schemas that might be required in a publisher's workflow. Notable examples include JATS and crossref.

I'd particularly recommend looking over the JATS aff tag and the crossref institution tag to see how they structure affiliations and their identifier. They have thought carefully about how to structure this information.

This is also related to how authors express their funding relationships, which is different than an affiliation. crossref has announced that they will be transitioning from using their Open funder registry to ROR, so ROR identifiers will be useful there.

Devika008 commented 3 months ago

Hello,

Here's my proposal for adding multiple affiliations for authors and users in OJS. This includes support for both ROR and non-ROR affiliations.

You can view the workflow here: https://youtu.be/FHwF4yBwzEA

Some considerations:

  1. As the user types the name of the institute, suggestions from ROR-affiliated institutes will appear. ROR-affiliated institutes will be marked with the ROR logo. A URL symbol in the dropdown will link to the institute's page for verification.
  2. When a ROR-affiliated institute is selected, the multilingual/translations section will indicate that all translations are complete, as they will be pulled from the ROR database. These fields will be non-editable.
  3. After clicking "Add," the institute will be added, and a new row will appear, allowing the user to input another affiliation.
  4. If no suggestions appear while typing, the institute is non-ROR affiliated. The user can still input the institute, but they will need to provide translations for the multilingual section.
  5. Users can add as many affiliations as needed.

@asmecher @GaziYucel @bozana please add more considerations if I have missed any

kmccurley commented 3 months ago

I'm not a member of the OJS development team, but I think your UI looks quite nice.

The goal of any UI is to help the user complete the task at hand with minimal fuss. There are several side constraints to consider here:

  1. how much is preloaded in the browser? The ROR database is quite large (47MB for our JSON version). We started preloading a snapshot of minimal information, but even that proved to be too much so we switched to a server-side search index that responds very quickly (50ms or so) so we can hit it with every keystroke (and kill previous attempts that are still in flight). It has to respond fast (<100ms) or else it will annoy the user.
  2. We use a variant where our dropdown also shows acronyms and alt names for an organization. This can help the user to disambiguate the alternatives. If someone types UCSD, the response has the full name and acronym. A query like USC is really ambiguous so it helps to show more information to the user. Ideally you would select the language of the user, but in our case we only deal with English.
  3. The ROR database has relationships in it (children, parents, and related) and a variety of "alt names". It's not easy to show all of these in a dropdown, so people may choose the lowest common denominator. If you take the example of UCSD, it has six child organizations, so they could be considered a match for the query UCSD.
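The first two points can be sketched roughly as follows, in Python for illustration only. The sample records are hand-written placeholders with made-up ids (not real ROR data), the 50 ms sleep stands in for a server round trip, and the class and function names are hypothetical.

```python
import asyncio
from typing import Optional

# Hand-written sample records with placeholder ids; not taken from the ROR dataset.
ORGS = [
    {"id": "org-1", "name": "University of California, San Diego",
     "acronyms": ["UCSD"], "aliases": ["UC San Diego"]},
    {"id": "org-2", "name": "University of Southern California",
     "acronyms": ["USC"], "aliases": []},
    {"id": "org-3", "name": "University of South Carolina",
     "acronyms": ["USC"], "aliases": []},
]


def search(query: str, limit: int = 10) -> list:
    """Case-insensitive substring match against name, acronyms and aliases."""
    q = query.strip().lower()
    if not q:
        return []
    hits = []
    for org in ORGS:
        labels = [org["name"], *org["acronyms"], *org["aliases"]]
        if any(q in label.lower() for label in labels):
            hits.append(org)
    return hits[:limit]


class LatestQueryOnly:
    """Run one lookup at a time, cancelling any lookup still in flight."""

    def __init__(self) -> None:
        self._task: Optional[asyncio.Task] = None

    async def _lookup(self, query: str) -> list:
        await asyncio.sleep(0.05)  # stands in for a ~50 ms server round trip
        return search(query)

    def submit(self, query: str) -> asyncio.Task:
        if self._task is not None and not self._task.done():
            self._task.cancel()  # kill the previous attempt
        self._task = asyncio.create_task(self._lookup(query))
        return self._task


async def demo():
    box = LatestQueryOnly()
    stale = box.submit("US")
    await asyncio.sleep(0)     # let the first lookup start
    fresh = box.submit("USC")  # the next keystroke cancels the stale lookup
    results = await fresh
    return stale.cancelled(), [org["name"] for org in results]


cancelled, names = asyncio.run(demo())
print(cancelled, names)
```

Both USC records come back for the ambiguous query, which is the case where showing the full name next to the acronym helps the author choose.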

One of our sites uses a dropdown but encourages users to click on something to refine their query. We opted for this because we wanted to tell users how to encode the most accurate information possible into their LaTeX document. I think you could figure out how to merge this into the OJS workflow, but perhaps it's too complicated to drill down on the relationships of an organization. I think it just illustrates how complex the choice is for an author to select an affiliation from the ROR database.

mpbraendle commented 2 weeks ago

It's now three years since this issue was opened, and there is still no progress. As @kmccurley pointed out in his OP, this is a clear requirement of publishing practice, especially for journals in the medical and science fields. We have many example articles where we need a specific separator such as "; " to distinguish multiple affiliations, and we then split on that separator, e.g. for PubMed export.
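For reference, the separator workaround amounts to something like the sketch below (Python for illustration; the function name is hypothetical and this is not the actual export code).

```python
def split_affiliations(value: str, separator: str = ";") -> list:
    """Split a free-text affiliation field on a separator, dropping empty parts."""
    parts = (part.strip() for part in value.split(separator))
    return [part for part in parts if part]


print(split_affiliations("University of Bern; University of Pisa"))
```

The approach is fragile: any institution name that itself contains the separator is split incorrectly, which is one reason a structured field is preferable.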

This issue needs to be relabelled as a major enhancement, and a milestone should be set.

As long as this is not solved, I can't recommend to my teams to install the ROR plugin (although ROR is a fine solution for organization disambiguation).

asmecher commented 2 weeks ago

@mpbraendle, development on this is currently underway as part of the ROR integration into OJS 3.5. @GaziYucel, maybe you can share a couple of quick details?

GaziYucel commented 2 weeks ago

Hi @mpbraendle, thank you for your interest.

As @asmecher pointed out above, I am currently working on this. The plan is to ship it with the OJS 3.5 release.

You can view the workflow here: https://youtu.be/FHwF4yBwzEA

The PR where I referenced this issue is part of the ROR / multiple affiliations integration into the core. This PR is solely to get the ROR data dump into the OJS database. It will be used for lookups, because using the ROR API for lookups seems too slow. This will add approximately 40MB to the database, refreshed bi-weekly.
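A minimal sketch of the general idea of a local lookup table, using Python and SQLite for illustration. The table name, columns, and functions are assumptions and do not reflect the schema in the PR; the two sample rows reuse the ROR IDs quoted earlier in this thread.

```python
import sqlite3


def load_dump(conn: sqlite3.Connection, records: list) -> None:
    """Create the (hypothetical) lookup table if needed and upsert dump records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rors (ror TEXT PRIMARY KEY, name TEXT NOT NULL)"
    )
    with conn:  # one transaction per refresh
        conn.executemany(
            "INSERT OR REPLACE INTO rors (ror, name) VALUES (:ror, :name)", records
        )


def lookup(conn: sqlite3.Connection, fragment: str, limit: int = 10) -> list:
    """Local name lookup; note that % and _ in the fragment are not escaped here."""
    rows = conn.execute(
        "SELECT ror, name FROM rors WHERE name LIKE ? ORDER BY name LIMIT ?",
        (f"%{fragment}%", limit),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
load_dump(conn, [
    {"ror": "03rjyp183", "name": "University of Bern"},
    {"ror": "03y4dt428", "name": "University of Pisa"},
])
print(lookup(conn, "bern"))
```

Because the load is an upsert keyed on the ROR ID, a periodic refresh can rerun it with a newer dump without duplicating rows.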

If you are interested in the development flow, this is the branch I am working on https://github.com/GaziYucel/pkp-lib/tree/multiple-author-affiliations

We decided to implement the new UI shown in the video, which we think is much better than before. This will make the interface more future-proof and more accessible.