pkp / pkp-lib

The library used by PKP's applications OJS, OMP and OPS, open source software for scholarly publishing.
https://pkp.sfu.ca
GNU General Public License v3.0

Article metadata should reflect journal and issue information when the article was published #7527

Open AhemNason opened 2 years ago

AhemNason commented 2 years ago

Describe the problem you would like to solve

As discussed in this issue, OJS does a poor job of handling historical name changes in article-level metadata. For example, if I changed the name of my journal in OJS settings, all generated citations for my older issues would display the new name of my journal, not the name of the journal as it existed at publication.

This is kind of a big deal!

Describe the solution you'd like

Metadata related to the journal and issue should be saved to the article when the article is published. This includes the journal title and ISSN, and the issue volume, number, year, publication date, and title.
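A minimal sketch of the "stamp on publication" idea, under stated assumptions: the class names, field names, and the `publish` and `citation_journal_title` helpers below are hypothetical illustrations, not part of the OJS codebase, and the ISSN is a placeholder. It only shows that a copy taken at publication time keeps its values after the journal setting is changed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Journal:
    """Live journal settings (hypothetical model); these can change over time."""
    title: str
    issn: str


@dataclass
class Issue:
    """Issue-level values (hypothetical model)."""
    volume: str
    number: str
    year: int
    date_published: str
    title: str


@dataclass(frozen=True)
class StampedMetadata:
    """Immutable copy of journal and issue values taken at publication."""
    journal_title: str
    issn: str
    volume: str
    number: str
    year: int
    date_published: str
    issue_title: str


@dataclass
class Article:
    title: str
    stamped: Optional[StampedMetadata] = None


def publish(article: Article, journal: Journal, issue: Issue) -> Article:
    """Stamp the current journal and issue values onto the article, once."""
    if article.stamped is None:
        article.stamped = StampedMetadata(
            journal_title=journal.title,
            issn=journal.issn,
            volume=issue.volume,
            number=issue.number,
            year=issue.year,
            date_published=issue.date_published,
            issue_title=issue.title,
        )
    return article


def citation_journal_title(article: Article, journal: Journal) -> str:
    """Prefer the stamped title; fall back to the live setting if unstamped."""
    return article.stamped.journal_title if article.stamped else journal.title


journal = Journal(title="JOURNAL_NAME_1900", issn="0000-0000")  # placeholder ISSN
issue = Issue(volume="1", number="1", year=1900,
              date_published="1900-01-01", title="First issue")
article = publish(Article(title="An example article"), journal, issue)

journal.title = "JOURNAL_NAME_2021"  # later rename in the journal settings
print(citation_journal_title(article, journal))  # prints JOURNAL_NAME_1900
```

In this sketch an unstamped article still falls back to the live journal title, which roughly mirrors current behaviour for unpublished work.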

Who is asking for this feature?

Service providers, librarians

Additional information

This issue is split from a discussion here: https://github.com/pkp/pkp-lib/issues/2505

What precise pieces of metadata need to be tracked at the article level which are not being tracked now? (ie - journal title and ISSN, what else?)

Journal title and ISSN are the big ones. Title-level metadata is very important. Any issue-level metadata would ideally be tied to an article as well. Like volume, issue, year, publication date.

For each piece of metadata, when should this data be "stamped" in the article?

On publication.

When this "stamped" data needs to be changed (due to a mistake or any other reason), should it require a special operation (like creating a new version) or can any editor edit it whenever they want?

I'd like to make a case for the ability to edit the publication record for an article from the article's metadata and only as a new version of the article. It's certainly possible someone could publish, say, their first issue with a typo in the journal title and want to fix it. I think that's ok. But, once something has been published, changing my journal-level metadata should not change the metadata for my already-published works. If I need to change them, I could do it from the article metadata page in the publishing workflow.

How will the system know when changes to stamped metadata are made, so that it can redistribute or resync metadata with third parties?

I'm less clear on this. I know @ewhanson has been working on "stale" status for existing Crossref deposits for example. If you require a new version of an article to publish it under a different title, at least you can flag on the publication of a new version.

How will a journal recover from a history of bad data? (ie - when a journal has already published lots of articles with the wrong title or ISSN)

Well, that really depends on where they've sent it. I can use Crossref as an example. Right now, if I change my journal title and ISSN but then move my journal to a new server or edit my file path, I need to update the URLs that those DOIs point to. But if I update them with the new metadata in the journal and ISSN fields, I'm now overwriting the old record. This could be felt all over the place. A ton of services pull metadata from the Crossref API. ORCID, CRIS systems, Altmetrics... it's kind of an enormous list.

I would say it's far more likely that people erase good metadata with new, bad metadata as a result of title-level fields being tied to this setting. Either way, it's a big mess to clean up. Getting the right metadata to a key piece of open scholarly infrastructure like DataCite or Crossref would be the best way to have those changes pushed to the systems that pull from their APIs.

It is worth noting that I suggested to Crossref that users should probably be warned when they try to update the ISSN for an already-registered DOI. Journal articles do not commonly swap publications entirely, and a warning would probably be useful. Their support lead is going to write up a proposal and let me know how it goes.

NateWr commented 2 years ago

Thanks @AhemNason. Requiring a new version to make changes (or the unpublish -> change -> republish workaround) makes it easy on our end to identify a stale record for downstream deposits.
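A rough sketch of why versioning makes stale records easy to spot, assuming a hypothetical model in which each submission remembers which publication version was last deposited. The names `Publication`, `Submission`, `deposited_version`, and the DOI value are illustrative assumptions, not the OJS data model or the Crossref plugin's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Publication:
    """One published version with its own stamped journal metadata."""
    version: int
    journal_title: str
    issn: str


@dataclass
class Submission:
    doi: str
    publications: List[Publication] = field(default_factory=list)
    deposited_version: Optional[int] = None  # version last sent downstream

    def current(self) -> Optional[Publication]:
        """Return the highest-numbered version, or None if nothing is published."""
        if not self.publications:
            return None
        return max(self.publications, key=lambda p: p.version)

    def publish_new_version(self, journal_title: str, issn: str) -> Publication:
        """Metadata changes only enter the record through a new version."""
        latest = self.current()
        next_version = latest.version + 1 if latest else 1
        publication = Publication(next_version, journal_title, issn)
        self.publications.append(publication)
        return publication

    def mark_deposited(self) -> None:
        """Record that the current version has been deposited."""
        latest = self.current()
        self.deposited_version = latest.version if latest else None

    def is_stale(self) -> bool:
        """A deposit is stale when a newer version exists than the deposited one."""
        latest = self.current()
        return latest is not None and latest.version != self.deposited_version


submission = Submission(doi="10.0000/example.1")  # placeholder DOI
submission.publish_new_version("JOURNAL_NAME_2021", "0000-0000")
submission.mark_deposited()
submission.publish_new_version("JOURNAL_NAME_1900", "0000-0000")  # correction
print(submission.is_stale())  # prints True
```

Because corrections can only arrive as a new version, the staleness check reduces to comparing two integers rather than diffing metadata fields.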

Let's think a bit more about issue-level metadata. My sense is that maybe the issue-level metadata shouldn't be saved to the article like journal-level metadata. A journal name might change in the OJS system, but would a journal change issue-level metadata after publishing an issue? And in such cases, shouldn't changes to issue-level data lead to changes in article metadata? Unlike journal-level metadata, I think there's no risk of this information falling out of sync over time due to legitimate historical changes at the journal.

How will a journal recover from a history of bad data?

I'm thinking more from the perspective of OJS. If I have hundreds of articles in my system published when the journal operated as JOURNAL_NAME_1900 and all of their metadata records are deposited to Crossref with JOURNAL_NAME_2021, how do I get the records corrected in OJS, so that my "how to cite" citation shows the correct name and I can redeposit my data with Crossref?

AhemNason commented 2 years ago

Right now, the only way to right the ship is to put your old title and ISSN into the settings and update the deposits that would match. Then you swap back to your new title and ISSN. It's definitely doable, and it doesn't require creating new versions of galleys. The issue is more like... knowing you've done it in the first place.

But as it stands now, it's a very easy thing to overlook.

As for the issue-level stuff, I don't totally disagree. But right now it's easier to change title- and issue-level metadata than it is to change article metadata.

lpanebr commented 2 years ago

I feel the key to solving all those requirements would be to allow a journal to have multiple journal-meta records.[1]

Each issue record would be required to link to exactly one journal-meta record.

Each article record would be required to link to exactly one issue record.

So, one possible fix for the JOURNAL_NAME_1900, JOURNAL_NAME_2021 problem would be:

  1. Create two new journal-meta records with correct information
  2. Flag the original journal-meta as cancelled or something
  3. Update all issue records linking each with the correct journal-meta record

This operation (edit issue) should be able to run a trigger that would allow for checking and updating stamped article information downstream.
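The linking scheme and the three steps above can be sketched roughly as follows. This is an illustrative in-memory model only: `JournalMeta`, `Issue`, `Article`, `Catalog`, and `relink_issue` are hypothetical names, not existing OJS classes, and the ISSNs are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class JournalMeta:
    id: int
    title: str
    issn: str
    cancelled: bool = False


@dataclass
class Issue:
    id: int
    journal_meta_id: int  # each issue links to exactly one journal-meta record


@dataclass
class Article:
    id: int
    issue_id: int  # each article links to exactly one issue
    stamped_title: str = ""
    stamped_issn: str = ""


class Catalog:
    def __init__(self) -> None:
        self.journal_metas: Dict[int, JournalMeta] = {}
        self.issues: Dict[int, Issue] = {}
        self.articles: Dict[int, Article] = {}

    def relink_issue(self, issue_id: int, journal_meta_id: int) -> List[int]:
        """Point an issue at another journal-meta record and restamp its articles.

        Returns the ids of articles whose stamped values changed, so a caller
        could queue them for redeposit downstream.
        """
        meta = self.journal_metas[journal_meta_id]
        if meta.cancelled:
            raise ValueError("cannot link an issue to a cancelled journal-meta")
        self.issues[issue_id].journal_meta_id = journal_meta_id
        changed = []
        for article in self.articles.values():
            if article.issue_id != issue_id:
                continue
            if (article.stamped_title, article.stamped_issn) != (meta.title, meta.issn):
                article.stamped_title, article.stamped_issn = meta.title, meta.issn
                changed.append(article.id)
        return changed


catalog = Catalog()
catalog.journal_metas[1] = JournalMeta(1, "JOURNAL_NAME_2021", "0000-0001")
catalog.issues[10] = Issue(10, journal_meta_id=1)   # actually published in 1900
catalog.issues[11] = Issue(11, journal_meta_id=1)   # published in 2021
catalog.articles[100] = Article(100, 10, "JOURNAL_NAME_2021", "0000-0001")
catalog.articles[101] = Article(101, 11, "JOURNAL_NAME_2021", "0000-0001")

# Step 1: create two new journal-meta records with correct information.
catalog.journal_metas[2] = JournalMeta(2, "JOURNAL_NAME_1900", "0000-0000")
catalog.journal_metas[3] = JournalMeta(3, "JOURNAL_NAME_2021", "0000-0001")
# Step 2: flag the original record as cancelled.
catalog.journal_metas[1].cancelled = True
# Step 3: relink each issue; only articles whose values differ are reported.
print(catalog.relink_issue(10, 2))  # prints [100]
print(catalog.relink_issue(11, 3))  # prints []
```

Returning the list of changed article ids is one way to model the "trigger" on issue edits: only those articles would need their downstream records updated.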

Does that make sense? Could it be developed given the current OJS model without breaking current installations?

[1] https://jats.nlm.nih.gov/publishing/tag-library/1.3/element/journal-meta.html

ifarley commented 2 years ago

Hey @AhemNason - here's that change we discussed to the status message returned for these journal articles that have their titles and ISSNs updated (we call that the journal cite ID in our internal admin system): https://gitlab.com/crossref/user_stories/-/issues/677.

Feel free to take a look at the request and add to the discussion in GitLab.

NateWr commented 2 years ago

@fgnievinski left a useful comment in a related issue. In short, Crossref requires that a change of journal name lead to a change in ISSN. If a journal changes its name but does not change its ISSN, deposits will fail with an error that the ISSN and journal name are not compatible. More details and workarounds are provided in the linked comment.

lpanebr commented 2 years ago

Great to see this moving forward. The issue description in the Crossref GitLab is a delight.

nils-stefan-weiher commented 2 years ago

Hi everyone,

This would also be relevant for OMP, especially for long-standing instances with older publications. At the moment, changing the press name would lead to changes in all citations of older publications and in DOI metadata.

We are in the process of changing the name of an OMP press and were already discussing the impact, so we were thinking along the lines of the proposal from @lpanebr:

So, one possible fix for the JOURNAL_NAME_1900, JOURNAL_NAME_2021 problem would be:

  1. Create two new journal-meta records with correct information
  2. Flag the original journal-meta as cancelled or something
  3. Update all issue records linking each with the correct journal-meta record

Except that, for OMP, it would be press and book publication.

Did you consider these changes for OMP as well?

Thanks and best regards,

Nils Weiher

NateWr commented 2 years ago

Did you consider these changes for OMP as well?

We haven't considered OMP carefully yet, but I suspect whatever approach we adopt for OJS will be rolled out to OMP as well.

nils-stefan-weiher commented 2 years ago

Thanks for the quick clarification @NateWr !

AhemNason commented 2 years ago

Just a note here as an update: in a conversation with Geoffrey Bilder at Crossref, he identified this as a major issue with metadata coming from OJS in general. I can give a small example of where this could be rampant, because I see it a lot in hosting.

We get a journal migrating to our service from another provider. That journal has had two titles in its full run. In the migration, I need to update all the DOIs to the new location or they won't resolve. The registered URL needs to be updated. But, in updating all their records for the purpose of fixing the URL, I will also (unless I know to change the journal title and ISSN in OJS) change all registrations so that they have the same title. It's an easy thing to overlook when running updates. This is why I asked for at least the warning from Crossref, so that users would get some feedback that this is happening.

It's possible I'm writing this specifically because I'm about to manually update 30 years of DOIs for a hosted client. :)

ajnyga commented 1 year ago

I think we really need to have this solved, since most likely many journals with past titles do not even realize they are creating bad metadata.

I think the best long-term solution in OJS would be to change the whole architecture so that the Context level in OJS is the Publisher, and a Context allows creating several Series/Journals, as you can in an OMP Context. This would solve the title/ISSN problem easily, since you would just have two series with different names and ISSNs, each with its own metadata, and it would also align the codebase between the PKP applications. But since this approach has not gained much support, I think we need as light a solution as possible.

My suggestion is:

What this approach does not take into account (and again, the larger architectural change would by default) is how we present this title history to readers in a clear way. Of course the metadata would now work like it should (the highest priority), but the journal would need to manually describe the title changes somewhere.

I also have some doubts whether the batch stamping tool or the open fields for adding data like publisher name or ISSN will lead to problems. We need to make sure the UI has clear instructions for the editors.

nils-stefan-weiher commented 1 year ago

@ajnyga During talks with colleagues at University Library Heidelberg, we also had a similar short-term solution in mind.

We also talked with @withanage about this and my colleague @nongenti is ready to contribute development time for this.

I also agree we need a short-term solution and a long-term architectural change for working with historical press/journal metadata.

We have the same problem with journals that migrate from a print-only journal to a digital one and may have had several name changes over the years of their existence. This also came up during discussion with staff from the German National Library because of long-term archiving.

I am thinking of proposing this topic for the PKP Sprint in Hannover. Will you be there @ajnyga ?

gerwinC commented 1 year ago

It's good to see that this problem is being worked on! We host several journals that are affected by this.

withanage commented 1 year ago

@nils-stefan-weiher It is a good idea to discuss this in the sprint.

I had a chat with @asmecher about this, and he pointed out that concentrating on standards-based support would be the best way to go forward if we are going to add it to the core.

https://github.com/pkp/pkp-lib/issues/2505

ajnyga commented 1 year ago

I agree that this is worth a talk in the sprint. What the goals are during the sprint is of course another question. Maybe just make a clear plan and also consider OMP. In OMP, it might be that just adding a publisher field to Series is enough. In any case, because of the profound architectural difference (a different level is treated as Context), the solution in OMP cannot be the same.

I will come to Hannover!