pkp / pkp-lib

The library used by PKP's applications OJS, OMP and OPS, open source software for scholarly publishing.
https://pkp.sfu.ca
GNU General Public License v3.0

Multilingual articles (not just multilingual metadata) could receive multiple DOIs #7335

Open fgnievinski opened 2 years ago

fgnievinski commented 2 years ago

Describe the problem you would like to solve

Sometimes journals publish a given article in multiple languages, i.e., with a separate PDF for each language.

Describe the solution you'd like

In this case, CrossRef recommends depositing multiple DOIs, one for each language:

"When a single journal article is published in two languages, each should be assigned its own DOI. (...) A good way to remember our best practice is to note that DOIs are “citation identifiers” not “work identifiers.”"

Optionally, the secondary or alternative languages might point to the primary or original language:

"The original language instance has metadata that contains no indication of the translation instance. The alternative language instance includes in its metadata a relation to the original language instance."

Who is asking for this feature?

Journal editor

Additional information

This is not simply metadata in multiple languages, with a single PDF for the primary language. For that different scenario, OJS doesn't have much to do, because CrossRef still doesn't support multilingual metadata for a given DOI. (Although they're working on it: https://trello.com/c/DgHbXntr )

ajnyga commented 2 years ago

The DOI we register is one that points to a landing page, not to any particular translation. You can add DOIs to representations as well, meaning files attached to a particular article, but that is probably a different case from their point of view.

If we had a DOI for each translation, we would need a landing page with a unique URL for each translation. See https://github.com/pkp/pkp-lib/issues/699

Related issue: https://github.com/pkp/pkp-lib/issues/6172

fgnievinski commented 2 years ago

But when the metadata is exported for deposit in CrossRef, the DOI registered by OJS adopts the submission's primary language.

The proposal would allow exposing multilingual metadata for a scenario currently supported by CrossRef: multilingual works galleys.

Granted, this is probably more of a feature request for the OJS Crossref plugin than OJS itself.

ajnyga commented 2 years ago

Sure, but if there were a landing page for each translation, OJS could register a DOI for all languages, not just the submission primary language. And as was mentioned in your link, there would be an intra_work relation between those works.

Galleys and Galley DOIs are handled as Components in Crossref schema. An article DOI should always point to a landing page. I suspect that the Crossref representative is not talking about Components in his post, but probably something that @AhemNason can discuss with them.

edit: of course if defining translations as Galley files in Components is enough for them, then OJS pretty much already supports this. You can already create DOIs for Galleys and these are included in the deposit. Not sure how well the relations, language params etc. are defined.
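To make the galley-as-component idea concrete, here is a minimal sketch that builds a Crossref `component_list` with one component per translated galley. The element and attribute names (`component_list`, `component`, `parent_relation`, `format`, `doi_data`) are given as commonly used in Crossref deposits and should be checked against the schema; the `galley_components` helper, the galley DOIs, titles and URLs are hypothetical and this is not the OJS plugin's actual code.

```python
import xml.etree.ElementTree as ET


def galley_components(galleys):
    """Build a <component_list> with one <component> per galley.

    `galleys` is a list of dicts with hypothetical keys:
    title, mime_type, doi, url.
    """
    component_list = ET.Element("component_list")
    for galley in galleys:
        # Each galley is registered as a part of the parent article.
        component = ET.SubElement(
            component_list, "component", parent_relation="isPartOf"
        )
        titles = ET.SubElement(component, "titles")
        ET.SubElement(titles, "title").text = galley["title"]
        ET.SubElement(component, "format", mime_type=galley["mime_type"])
        doi_data = ET.SubElement(component, "doi_data")
        ET.SubElement(doi_data, "doi").text = galley["doi"]
        ET.SubElement(doi_data, "resource").text = galley["url"]
    return component_list


# Hypothetical galleys for the English original and a Finnish translation.
base_url = "http://localhost/ojs-master/index.php/publicknowledge/article/view/5"
galleys = [
    {"title": "PDF (English)", "mime_type": "application/pdf",
     "doi": "10.1234/jpkjpk.5.g1", "url": base_url + "/1"},
    {"title": "PDF (Suomi)", "mime_type": "application/pdf",
     "doi": "10.1234/jpkjpk.5.g2", "url": base_url + "/2"},
]
component_list = galley_components(galleys)
print(ET.tostring(component_list, encoding="unicode"))
```

Note that the sketch attaches only a title, a format and a DOI to each component; nothing in it marks the language of the galley or relates one galley to another, which is the open question raised above.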

fgnievinski commented 2 years ago

"OJS could register a DOI for all languages, not just the submission primary language"

except CrossRef doesn't support multilingual metadata yet, it's still in their "research & planning" stage, we're probably a few years away from it, as per second link above.

the present proposal would for the first time ever allow multilingual publications to be exposed for indexing outside of OJS. and it's for a specific use case which CrossRef already supports and endorses, as per first link above: when the full text content is multilingual, not just the multilingual metadata of monolingual content.

currently all the secondary language metadata entered in OJS apparently is ignored by machines, it's only for human consumption. editors are translating and managing multilingual metadata with hopes it'd bring more visibility to their publications.

ajnyga commented 2 years ago

"except CrossRef doesn't support multilingual metadata yet, it's still in their 'research & planning' stage, we're probably a few years away from it, as per second link above."

Sure and having a language specific landing page in OJS is probably as far away.

As I mentioned above, Galleys are treated as Components in the Crossref Schema. If registering DOIs for Galleys and adding an isTranslationOf relation is enough then I think that would be fairly easily achieved. But the Component element to my knowledge does not include much metadata itself. I have been under the impression that something more would be needed, but I am fairly sure that PKP has already discussed this issue with Crossref.

In general I am interested in solving this since we have some journals that publish in two languages. Some of them simply publish the same article as two individual submissions in OJS and this gives them language specific DOIs.

ajnyga commented 2 years ago

This is how I have been thinking different language versions of articles would get DOIs, but it requires OJS to have language-specific landing pages. This example has two DOIs being deposited for the same article (English as primary language and Finnish as the translated version). In OJS this would be one Submission, but one where both the full text and the metadata are in two languages. There are of course open UI questions about how to deal with this on the OJS landing page.

<?xml version="1.0" encoding="utf-8"?>
<doi_batch xmlns="http://www.crossref.org/schema/4.4.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:jats="http://www.ncbi.nlm.nih.gov/JATS1" xmlns:ai="http://www.crossref.org/AccessIndicators.xsd" xmlns:rel="http://www.crossref.org/relations.xsd" version="4.4.0" xsi:schemaLocation="http://www.crossref.org/schema/4.4.0 https://www.crossref.org/schemas/crossref4.4.0.xsd">
  <head>
    <doi_batch_id>_1632587050</doi_batch_id>
    <timestamp>1632587050</timestamp>
    <depositor>
      <depositor_name>te</depositor_name>
      <email_address>te@tst.fi</email_address>
    </depositor>
    <registrant>Public Knowledge Project</registrant>
  </head>
  <body>
    <journal>
      <journal_metadata>
        <full_title>Journal of Public Knowledge</full_title>
        <abbrev_title>publicknowledgeJ Pub Know</abbrev_title>
        <issn media_type="electronic">0378-5955</issn>
        <issn media_type="print">0378-5955</issn>
      </journal_metadata>
      <journal_issue>
        <publication_date media_type="online">
          <month>09</month>
          <day>23</day>
          <year>2021</year>
        </publication_date>
        <journal_volume>
          <volume>1</volume>
        </journal_volume>
        <issue>2</issue>
      </journal_issue>

      <journal_article language="en" xmlns:jats="http://www.ncbi.nlm.nih.gov/JATS1" publication_type="full_text" metadata_distribution_opts="any">
        <titles>
          <title>Genetic transformation of forest trees</title>
        </titles>
        <contributors>
          <person_name contributor_role="author" sequence="first" language="en">
            <given_name>Diaga</given_name>
            <surname>Diouf</surname>
          </person_name>
        </contributors>
        <publication_date media_type="online">
          <month>09</month>
          <day>25</day>
          <year>2021</year>
        </publication_date>
        <doi_data>
          <doi>10.1234/jpkjpk.5</doi>
          <resource>http://localhost/ojs-master/index.php/publicknowledge/article/view/5</resource>
        </doi_data>
      </journal_article>

      <journal_article language="fi" xmlns:jats="http://www.ncbi.nlm.nih.gov/JATS1" publication_type="full_text" metadata_distribution_opts="any">
        <titles>
          <title>Puiden geneettinen muutos</title>
        </titles>
        <contributors>
          <person_name contributor_role="author" sequence="first" language="fi">
            <given_name>Diaga</given_name>
            <surname>Diouf</surname>
          </person_name>
        </contributors>
        <publication_date media_type="online">
          <month>09</month>
          <day>25</day>
          <year>2021</year>
        </publication_date>
        <rel:program name="relations">
           <rel:related_item>
             <rel:description>Finnish translation of an article</rel:description>
             <rel:intra_work_relation relationship-type="isTranslationOf" identifier-type="doi">10.1234/jpkjpk.5</rel:intra_work_relation>
           </rel:related_item>
        </rel:program>
        <doi_data>
          <doi>10.1234/jpkjpk.5.fi</doi>
          <resource>http://localhost/ojs-master/index.php/publicknowledge/article/view/5/fi</resource>
        </doi_data>
      </journal_article>

    </journal>
  </body>
</doi_batch>

fgnievinski commented 2 years ago

"we have some journals that publish in two languages. Some of them simply publish the same article as two individual submissions in OJS and this gives them language specific DOIs."

The above might deserve mention in the DIG's guide to Google Scholar Indexing as recommended practice for publishing multilingual articles in OJS:

https://docs.pkp.sfu.ca/google-scholar/en/#adding-multilingual-metadata-in-ojs-32

The justification is that most editors would probably be surprised to learn that secondary language articles in OJS are not indexed externally.

AhemNason commented 2 years ago

Hey everyone!

This is a lot to read on a Monday morning and I do have some thoughts about it. But, I agree with @ajnyga that this is likely a lot more work than it might seem.

Multilingual Metadata

Multilingual metadata is definitely complicated. From a Google Scholar perspective, their recommendation is that metadata should only be recorded in the language the work was written in. If you have a translation of a work, that's an additional record. In their implementation/recommendation, any version of a work will only have one metadata record whose language matches that of the work. They do not recommend adding translations of metadata for works that themselves aren't translated. This is the first hurdle. Multilingual metadata implementations that don't follow these recommendations from Google Scholar may end up unindexed by Google Scholar.

On one hand, I kind of agree with this. Metadata describes the work. Translating a description of a work happens irrespective of the work itself. But, on the other hand, we know that there are use cases for multilingual metadata for journal-, issue-, and article-level use. I've heard users say that an abstract in a language that differs from the work allows them to know whether or not the effort to have the work translated is worth it for them. Fair enough.

The question is where that goes downstream, and how. If Crossref cannot accept multilingual metadata at the moment (noting that this is an extremely complicated challenge for the folks working on their schema), there will be no way to register that metadata with them in an elegant or useful way. So OJS has a "primary language" option and we register the primary language metadata with Crossref. Everyone is right that we need a solution here, but I don't think it's the job of OJS/PKP to trailblaze (I should say I have no like... authority in what OJS/PKP does in this space). I think we should follow international standards created by the most-used organizations. If we already know that some indexing (Google Scholar) isn't yet supporting multilingual metadata, we should be cautious with implementation.

Multilingual Works

Now, for translations, we're not talking about the metadata of the original work. We're talking about the metadata of a translation of that original work. This article would have fully different metadata because there is another contributor to the work in that it was translated. A translated work is a new, derivative work (depending on your nation's copyright law, at least).

I actually don't think galleys are the answer here. I think additional "articles" for each translation are, because the translation metadata could change for each article. It's best not to think of your table of contents as hosting multiple versions of each article, but instead as listing each discrete work. If you put each translation in its own record, with its own landing page, with a corresponding translated galley, you would be able to get your unique DOI and the indexing you're looking for. You're also representing each work uniquely.

I do think this requires transparency from the journal. Basically, even if we could write to a fully multilingual metadata schema today, there are very few places that could read it/parse it. But if the argument is that each translation is its own work then there are no issues minting a DOI for each version and recording metadata in the language that matches.

fgnievinski commented 2 years ago

pagination is yet another reason for discouraging multilingual galleys in favor of separate works, one for each language. page numbers won't be exactly the same for translations and original work; normally the translation sequentially follows the original work in a given journal issue. so the "how to cite" text will always display the primary language's page numbers, even though the article title changes with the requested language. of course, in principle the pagination field could be made internationalized and localized, but that direction doesn't seem fruitful. in view of the guidance from CrossRef giving preference to separate DOIs for original works and translations, plus the invisibility of translated metadata and galleys for web indexing, we should consider alerting journal editors/admins about the limitations of multilingual galleys. at least fixing this wouldn't require any major coding, it seems more a matter of documentation.

ajnyga commented 2 years ago

I personally would like to see this solved within the context of a single Submission. We just need to have separate landing pages for translations, and the Google Scholar plugin, for example, has to take this into account. The GS plugin cannot present multilingual metadata on a single landing page, but it can show different metadata when you visit the URL of the translated version.

Within the OJS context we would still have one Submission. But from the Crossref and GS point of view we would have two separate Works, the original and the translation. How this is handled in the TOC is of course a matter of opinion.

The reason I would not want to go in the direction of separate Submissions in OJS is that I am not sure how we would handle the relations between the Submissions. OJS by default would just see two separate Submissions that have nothing in common, so editors would have to define those relations by hand.

What makes the approach I suggest problematic is versioning. We might end up in a situation where an article has, for example, three versions and each version has three translations. It could be difficult to maintain this.
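A rough sketch of how quickly that grows: the snippet below enumerates one DOI per (version, language) pair for three versions, each with an original and three translations. The suffix scheme is hypothetical; it only extends the "10.1234/jpkjpk.5.fi" pattern from the example deposit with an invented ".vN" version marker, and `enumerate_dois` is not an OJS function.

```python
from itertools import product


def enumerate_dois(base_doi, versions, locales, primary_locale):
    """List one DOI per (version, locale) pair.

    Hypothetical suffix scheme: ".vN" for versions after the first,
    ".<locale>" for every language other than the primary one.
    """
    dois = []
    for version, locale in product(versions, locales):
        doi = base_doi
        if version > 1:
            doi += f".v{version}"
        if locale != primary_locale:
            doi += f".{locale}"
        dois.append(doi)
    return dois


# Three versions, each in the original language plus three translations.
dois = enumerate_dois("10.1234/jpkjpk.5", [1, 2, 3], ["en", "fi", "sv", "de"], "en")
print(len(dois))  # 12
for doi in dois:
    print(doi)
```

Each of those twelve records would need its own landing page, and every translation would need a relation back to the matching version of the original.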

fgnievinski commented 2 years ago

just to mention some documentation about CrossRef's hasTranslation and isTranslationOf fields:

Relationships between different research objects - Example: translated article https://www.crossref.org/documentation/content-registration/structural-metadata/relationships/#00047

the above fields would also benefit journals currently publishing multilingual articles in OJS simply as two independent submissions (which is facilitated by the "quick submit" plugin) and who might prefer to continue using this existing approach to obtain language-specific DOIs.

PS: maybe this simpler existing alternative, of independent submissions, should be split in its own issue ticket? I ask because its full support could happen much sooner than the "ideal" solution of a unified multilingual submission. it'd only require the new hasTranslation/isTranslationOf fields (including their export by the crossref-ojs plugin), plus minor documentation improvements.

PPS: cool factoid: "if you register two DOIs ... and then link those DOIs together using the ‘isTranslationOf’ relation ... you will only be charged for one DOI."
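As a minimal sketch of what exporting such a relation could look like for two independent submissions, the snippet below builds the `rel:program` fragment that the translation's record would carry. The element and attribute names are taken from the example deposit earlier in this thread; the `translation_relation` helper is hypothetical and is not part of the Crossref plugin.

```python
import xml.etree.ElementTree as ET

REL_NS = "http://www.crossref.org/relations.xsd"
ET.register_namespace("rel", REL_NS)


def translation_relation(original_doi, description,
                         relationship_type="isTranslationOf"):
    """Build a rel:program element pointing at the original article's DOI."""
    program = ET.Element(f"{{{REL_NS}}}program", name="relations")
    item = ET.SubElement(program, f"{{{REL_NS}}}related_item")
    ET.SubElement(item, f"{{{REL_NS}}}description").text = description
    relation = ET.SubElement(
        item,
        f"{{{REL_NS}}}intra_work_relation",
        {"relationship-type": relationship_type, "identifier-type": "doi"},
    )
    relation.text = original_doi
    return program


# The Finnish submission points at the DOI of the English submission.
program = translation_relation(
    "10.1234/jpkjpk.5", "Finnish translation of an article"
)
print(ET.tostring(program, encoding="unicode"))
```

Following the Crossref guidance quoted at the top of this issue, only the translation carries the relation; the original's record is deposited unchanged. Passing `relationship_type="hasTranslation"` would produce the reverse link if that turns out to be wanted as well.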

AhemNason commented 2 years ago

I'm not sure, but I think writing in relational metadata between two published articles in OJS is actually a non-trivial undertaking. Especially since there's no interface by which to link articles as it stands.

lpanebr commented 2 years ago

Not directly related, but solving it in a single submission is ideal and is the way Scielo and PMC handle it.

More specifically, they use a <sub-article> [1] for each translation in multilingual articles [2,3].

[1] https://jats.nlm.nih.gov/publishing/tag-library/1.3/element/sub-article.html
[2] https://www.ncbi.nlm.nih.gov/pmc/pmcdoc/tagging-guidelines/article/dobs.html#dob-multi-lang
[3] https://scielo.readthedocs.io/projects/scielo-publishing-schema/pt_BR/latest/tagset/elemento-sub-article.html?highlight=sub-article

fgnievinski commented 2 years ago

JATS' "sub-articles" can be abused in ways that render metadata incompatible with CrossRef and downstream citation tools. For example, the tagged sample given in [1] is a super-article with four sub-articles, each completely independent (different authors, body, floats, references): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC27377

The metadata in CrossRef ends up lumping all the sub-articles' titles together in a messy single line, without punctuation:

"Controversy in primary care: Should asymptomatic haemochromatosis be treated? Treatment can be onerous for patient and doctor Commentary: False certainty of clinical guidance Commentary: Early treatment is essential"

https://doi.crossref.org/servlet/query?pid=a@b.c&id=10.1136/bmj.320.7245.1314&format=info

I also note the PubMed Central Tagging Guidelines offer two options for multi-language articles [2]: "Article and Translation", based on sub-articles, and "Publication in Multiple Languages", based on parallel articles; the latter is more aligned with CrossRef's recommended practice:

"If an article is simultaneously published in multiple languages where none is identified as a translation, the article in each language should be tagged in an independent <article> with @xml:lang. Each <article> must have a <related-article> element with @related-article-type="alt-language" identifying the article in the other languages."
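A minimal sketch of that "parallel articles" tagging, using the titles and DOIs from the example deposit earlier in this thread. It produces only a skeleton (no journal metadata, authors or body) and has not been validated against the JATS DTD; the `parallel_article` helper and the choice of linking by DOI are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK_NS)


def parallel_article(lang, title, doi, other_dois):
    """Build a skeleton <article> that links to its other-language versions."""
    article = ET.Element("article", {f"{{{XML_NS}}}lang": lang})
    front = ET.SubElement(article, "front")
    meta = ET.SubElement(front, "article-meta")
    ET.SubElement(meta, "article-id", {"pub-id-type": "doi"}).text = doi
    title_group = ET.SubElement(meta, "title-group")
    ET.SubElement(title_group, "article-title").text = title
    # One <related-article> per other-language version of the same article.
    for other in other_dois:
        ET.SubElement(meta, "related-article", {
            "related-article-type": "alt-language",
            "ext-link-type": "doi",
            f"{{{XLINK_NS}}}href": other,
        })
    return article


en = parallel_article("en", "Genetic transformation of forest trees",
                      "10.1234/jpkjpk.5", ["10.1234/jpkjpk.5.fi"])
fi = parallel_article("fi", "Puiden geneettinen muutos",
                      "10.1234/jpkjpk.5.fi", ["10.1234/jpkjpk.5"])
print(ET.tostring(en, encoding="unicode"))
print(ET.tostring(fi, encoding="unicode"))
```

Unlike the sub-article approach, each language here is a complete, separately identified article, so each can map to its own DOI and landing page.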

So either a separate work or a separate landing page for each language (with their own DOI) seems the only way for improving discoverability of multilingual content published in OJS, given the limitations of CrossRef, GScholar, etc.

PS: I'd suggest keeping the discussion about multilingual metadata of single-language works separate in issue #7272.

fgnievinski commented 2 years ago

related issue:

"Fix submission language/languages metadata" https://github.com/pkp/pkp-lib/issues/5000

fgnievinski commented 2 years ago

Meanwhile, I've submitted a documentation patch describing Crossref's recommendations on multilingual articles to section "Adding multilingual metadata in OJS" of the "best practices for indexing" document: https://github.com/pkp/pkp-docs/pull/907

"If you publish multilingual content and use Digital Object Identifiers (DOI) issued by Crossref, please note their recommendation: “When a single journal article is published in two languages, each should be assigned its own DOI.” Currently, OJS supports DOI-compatible multilingual content only if each monolingual version is published as a separate work. If the manuscript submission involved multiple languages originally, the QuickSubmit plugin can assist in splitting the various single-language works."

diegoabadan commented 1 month ago

Now we have a plugin to help with this process: https://github.com/lepidus/DoiForTranslation

I notified the community in a related discussion on the forum.

We will soon have it in the gallery.

Thanks to REGEPE - Entrepreneurship and Small Business Journal for funding the development by Lepidus.

lpanebr commented 1 month ago

"Now we have a plugin to help with this process: https://github.com/lepidus/DoiForTranslation"

Looks very good! Congrats!