pkp / pkp-lib

The library used by PKP's applications OJS, OMP and OPS, open source software for scholarly publishing.
https://pkp.sfu.ca
GNU General Public License v3.0

Add metadata for article number #4695

Open fgnievinski opened 5 years ago

fgnievinski commented 5 years ago

Describe the problem you would like to solve

Many journals assign each article an article number that is used in citations and other metadata, often instead of page numbers. This data is not captured in OJS and, as a result, journals often try to put the article number into the pages field. This inaccurate metadata is then distributed.

Describe the solution you'd like

Add a new metadata field for article number and send it to Crossref and any other indexing services which support it. The following steps are required to implement this:

Who is asking for this feature?

Journals in disciplines that use article numbers as part of the citation.

Additional information

The comments below questioned the necessity of this metadata, but sufficient evidence was found for its wide use (see comment, the original comment below, and other comments).

Crossref has a separate placeholder for article number:

<publisher_item>
   <item_number item_number_type="article_number">e12345</item_number>
</publisher_item>

The original proposal can be found below:

Publishers such as APS, AGU, eLife, Scielo have adopted article numbers. Existing OJS deployments are using the metadata field for page number to store it. Crossref recently (2017/2018) created a separate placeholder for that information:

<publisher_item>
   <item_number item_number_type="article_number">e12345</item_number>
</publisher_item>

Ideally OJS would offer a separate field for article numbers and the Crossref plugin would use the proper XML item to export it.
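
A minimal sketch of the export side, using only Python's standard library. The helper name `build_publisher_item` is hypothetical and is not part of OJS or the Crossref plugin; the element and attribute names are the ones quoted above. It only illustrates wrapping a free-text article number in the `publisher_item` fragment.

```python
import xml.etree.ElementTree as ET


def build_publisher_item(article_number: str) -> ET.Element:
    """Return a <publisher_item> element carrying the article number.

    Hypothetical helper for illustration only. The value is kept as a
    string because article numbers such as "e12345" are not numeric.
    """
    number = article_number.strip()
    if not number:
        raise ValueError("article number must not be empty")
    publisher_item = ET.Element("publisher_item")
    item_number = ET.SubElement(
        publisher_item, "item_number", {"item_number_type": "article_number"}
    )
    item_number.text = number
    return publisher_item


if __name__ == "__main__":
    print(ET.tostring(build_publisher_item("e12345"), encoding="unicode"))
```

Keeping the value as a string avoids any assumption that article numbers are sequential or numeric.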

This issue has been discussed in the forum:

https://forum.pkp.sfu.ca/t/ojs-3-using-article-numbers-instead-of-page-numbers/47853

asmecher commented 5 years ago

This has a lot of overlap with the Public URL Identifier. I wonder whether broadening that field into an article number would hit this requirement as well, without introducing a new/competing field?

Tagging @AhemNason, as rep for the metadata working group :)

AhemNason commented 4 years ago

Hey @asmecher, sorry for missing this. My email notifications were out of control and I missed a heap of these pings for feedback here.

I'll say that the Public URL identifier is definitely something users misunderstand a lot. I've seen a surprising number of folks put DOI suffixes there. I think the term "identifier" is problematic because people conflate its use with a PID.

I don't have strong feelings either way for an article number. I see many users ascribe article numbers to their DOI suffixes but for the purposes of a DOI, I think it's meaningless. I usually find that it's an attempt to make a URL or a DOI human-readable and I don't think that's necessary. That said, if Scielo are adopting it and there's a Crossref element for it, it certainly couldn't hurt and would likely be an improvement over the public URL field.

I guess the other take is that we already kind of have these, and they're in the DOIs to begin with. An article ID in OJS is basically the same thing and we can guarantee that they're unique within an installation. I'll forward this to the metadata working group to see if anyone has strong opinions.

fgnievinski commented 4 years ago

Nature is also depositing article numbers in CrossRef, e.g.:

<item_number item_number_type="article-number">15181</item_number>

For example:

https://doi.org/10.1038/s41598-019-51802-9

http://doi.crossref.org/servlet/query?pid=fgnievinski@gmail.com&id=10.1038/s41598-019-51802-9&format=info

AhemNason commented 4 years ago

Yeah, there's an article number identifier option. Larger publishers use them. If we were to use them, I think having them match the article ID in the system would be the easiest way to go. If we add another identifier on top of that internally, it's going to be a mess.

Just a note on additional identifiers coming down the pike from Crossref. https://docs.google.com/document/d/1ey8uAafHy4FB-bu-tpZpUVdRM19gR3u90xklZ-H0LXk/edit#heading=h.ql27ykv3w2fx

NateWr commented 4 years ago

Can I clarify: is the issue here just the need to pass the article ID to Crossref? So we can use the existing submission ID, we just need to add it to the XML deposit that's created.

I don't think there's any real need here for a separate input field, unless journal editors have their own article numbers that they want deposited. If that's the case, I think we should really consider whether this creates a positive metadata impact or whether it constitutes an unnecessary administrative burden.

fgnievinski commented 4 years ago

Article numbers are replacing page numbers entirely in many online-only journals, so missing article numbers in the Crossref metadata deposit cause incomplete citations. This is especially significant when users rely on reference managers to pull the metadata automatically for a given DOI.

Reusing the article ID in the system would be fine if it remains constant after a site backup and restore operation.

NateWr commented 4 years ago

> Reusing the article ID in the system would be fine if it remains constant after a site backup and restore operation.

Yes, the submission ID would remain constant in a backup/restore situation. Where it wouldn't remain constant is in a situation where submissions are imported into a site. This is often done during migrations from one platform to another, or when a journal from one OJS instance is migrated into another.

In such circumstances, it may make more sense to save a UID when a submission is published into the publication_settings table, and to never surface it outside of metadata deposits.
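
A minimal sketch of the "assign once, never change" idea, assuming a plain dict stands in for one publication's rows in publication_settings. The key name `depositUid` and the helper name are hypothetical; nothing here reflects actual OJS code.

```python
import uuid


def ensure_publication_uid(settings: dict) -> str:
    """Assign a UID the first time a publication is published, then reuse it.

    `settings` is a stand-in for one publication's settings rows; the
    "depositUid" key is a made-up name for illustration.
    """
    if "depositUid" not in settings:
        # Generated once at publication time and never regenerated.
        settings["depositUid"] = uuid.uuid4().hex
    return settings["depositUid"]


if __name__ == "__main__":
    publication = {}
    first = ensure_publication_uid(publication)
    # Copying the settings (as an import or migration would) keeps the UID.
    migrated = dict(publication)
    assert ensure_publication_uid(migrated) == first
```

Because the value travels with the publication's settings rather than with the database row ID, it would survive an import into another installation.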

This sounds like it would be solid plugin territory to me. The submission/publication data model is easily extensible from v3.2 on. And if the Crossref deposit output is not yet extensible, this would be a good plugin to pioneer making it possible/easier to do.

asmecher commented 4 years ago

> Where [the article ID] wouldn't remain constant is in a situation where submissions are imported into a site.

Just a quick note that the import XML does have placeholder support for using specified article IDs. See https://github.com/pkp/pkp-lib/issues/5132 for documentation on the use case, but just briefly, generally it's intended for updating existing submissions.

bdmckay commented 4 years ago

Having read the discussion so far, I want to comment on some things that I don't think are clear enough.

The permanent bibliographic coordinates of an article might look like one of these:

Style 1. Volume 3, Issue 2, pages 10-20.
Style 2. Volume 3, Issue 2, article 4.

Style 1 is the traditional style inherited from printed volumes, but Style 2 makes more sense for electronic-only journals and is becoming more common. My journal has been using Style 2 since 1994. fgnievinski gave another example above. (Style 2 is incidentally also less work for editors.)

In order to use Style 2, editors need to be able to choose the article number and it needs to be displayed to readers.

Currently there is no place to put the article number (which might not be numeric) so since we joined OJS at the 2.3.6 stage we have put it in the page number field. That gave us no trouble at all until we started Crossref deposits (a problem we hacked around).

Using the submission-id as the article number is a non-starter since (a) editors want to choose what the article number is, (b) the submission-id of published articles isn't contiguous or monotonic, (c) the bibliographic coordinates of already published articles must not be changed.

I don't think the Public URL Identifier should be used either, since (a) tons of links around the www will break if the PUI is changed, (b) the syntax of existing PUIs might not be suitable for article numbers, (c) journals like mine already have lots of articles for which the PUI and article number are different and we can't change either.

I think it would be great if a new metadata field for article number (character string type) was created. A stop-gap measure that is essentially what we have done in our journal would be to have an option in the DOI plugin for whether the page number is exported as a page number or an article number. But it isn't an ideal long-term solution.

Cheers, Brendan.

NateWr commented 4 years ago

Thank you, @bdmckay, for your clear explanation of why re-using existing numbers isn't satisfactory. I'd still like to see this question addressed, which I raised a while back:

> I think we should really consider whether this creates a positive metadata impact or whether it constitutes an unnecessary administrative burden.

What makes the following bibliographic reference:

Volume 3, Issue 2, article 4

More valuable than:

Volume 3, Issue 2, article title

In other words, what is the article number adding to the discovery and identification of articles?

cc @AhemNason on this.

NateWr commented 4 years ago

Also, @bdmckay, I should mention that in v3.2 we separated the Public URL Identifier and the Publisher ID. It is now possible to add a value to the Publisher ID without impacting the URL. This may or may not serve your needs, but it's worth mentioning.

AhemNason commented 4 years ago

I'd like to do a little bit of research on article numbers before I put in a more fully-formed response but I would lead just with this:

> A stop-gap measure that is essentially what we have done in our journal would be to have an option in the DOI plugin for whether the page number is exported as a page number or an article number. But it isn't an ideal long-term solution.

Please no stop-gap measures that modify DOI/Crossref metadata. This is a very bad idea that will get complicated both for Crossref members and our ability to support them. If we're going to do this, we need to do it right and without appropriating existing functions. One of the biggest issues we have with the quality of OJS metadata is when someone wants something for a visual, stylistic, or organizational reason and they "hack" or "work around" metadata by putting things where they weren't intended. The downstream problems this causes can be bad for discoverability and bad for citations. Let's avoid that as much as possible.

I'll review a few pieces. My personal take is that an article number only serves to indicate where an article would have been placed in an issue in a table of contents. I'm dubious about the utility of this for readers. Journals aren't typically required to be read sequentially. I don't think it's at all useful for citations in electronic journals, especially if they're using DOIs. URIs and DOIs aren't meant to be human readable.

I do, though, believe lots of publishers use article numbers or some sort of local identifier for the purposes of their own record keeping and organization.

AhemNason commented 4 years ago

So, the Crossref schema (which I usually turn to because of how many different types of publishers and publications they accommodate) includes two metadata fields for article identifiers.

https://data.crossref.org/reports/help/schema_doc/4.4.2/schema_4_4_2.html#identifier

A public standard identifier that can be used to uniquely identify the entity being registered. This identifier is a publisher-assigned number that uniquely identifies the entity being registered. This element should be used for identifiers based on public standards. Use item_number for a publisher identifier that is based on a publisher's internal systems rather than on a public standard. The supported standards are:

PII - Publisher Item Identifier
SICI - Serial Item and Contribution Identifier
DOI - Digital Object Identifier

So this field's use is very specific to standards-based identifiers. PII, in this case, is: "Publisher Item Identifier, a unique identifier used by a number of scientific journal publishers to identify documents. It uses the pre-existing ISSN or ISBN of the publication in question, and adds a character for source publication type, an item number, and a check digit." My feeling is that most of our journals are unlikely to use these.

But there's also this:

https://data.crossref.org/reports/help/schema_doc/4.4.2/schema_4_4_2.html#item_number

A publisher identifier that can be used to uniquely identify the entity being registered. This identifier is a publisher-assigned number that uniquely identifies the entity being registered. This element should be used for identifiers based on publisher internal standards. Use identifier for a publisher identifier that is based on a public standard such as PII or SICI. If the item_number and identifier are identical, there is no need to submit both. In this case, the preferred element to use is identifier.

Data may be alpha, numeric or a combination.

item_number has an optional attribute, item_number_type. It is assigned by the publisher to provide context for the data in item_number. If item_number contains only a publisher's tracking number, this attribute need not be supplied. If the item_number contains other data, this attribute can be used to define the content. For example, if a journal is published online (i.e. it has no page numbers), and each article on the table of contents is assigned a sequential number, this article number can be placed in item_number, and the item_number_type attribute can be set to "article_number". Although Crossref has not provided a set of enumerated types for this attribute, please check with Crossref before using this attribute to determine if a standard attribute has already been defined for your specific needs.

If a dissertation DAI has been assigned, it should be deposited in the identifier element with the id_type attribute set to "dai".
If an institution has its own numbering system, it should be deposited in item_number, and the item_number_type should be set to "institution".
If the report number of an item follows Z39.23, the number should be deposited in the identifier element with the id_type attribute set to "Z39.23".
If a report number uses its own numbering system, it should be deposited in the identifier element, and the id_type should be set to "report-number".
The designation for a standard should be placed inside the identifier element with the id_type attribute set to "ISO-std-ref" or "std-designation" (more generic label).

So, these are different than public IDs. I still don't feel as though our use of that field is prescribed or well-defined. Lots of folks use it in a variety of ways and almost all of them seem to be ad hoc.

Some other considerations:

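<article-id pub-id-type="publisher-id">1037</article-id>
<article-id pub-id-type="doi">10.1128/JCM.39.7.2634-2636.2001</article-id>
<article-id pub-id-type="pmid">11427581</article-id>
...
<article-title>Molecular Identification of a Dietzia maris Hip Prosthesis Infection Isolate</article-title>
...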


So basically, "publisher ID" as distinct from a system-based article/submission ID is supported by JATS and Crossref. Its use is specifically internal to journals/editorial/publisher. I think something like this is worth supporting.

NateWr commented 4 years ago

Thanks @AhemNason. I think that we already support a publisher ID for internal use. We call it the Publisher ID and it's available in the identifiers area (it used to be the URL path but we've separated them out in 3.2).

It doesn't seem, from what you've described, that there is any particular support for an article number as in the number of the item in an issue or volume. Is that right?

AhemNason commented 4 years ago

No, the Crossref schema does have that.

> For example, if a journal is published online (i.e. it has no page numbers), and each article on the table of contents is assigned a sequential number, this article number can be placed in item_number, and the item_number_type attribute can be set to "article_number".

fgnievinski commented 4 years ago

Many citation styles, such as that of Science, do not include the article title in the list of references.

An article number is an alternative to page numbers: no more, no less useful than that.

Yes, the DOI makes both unnecessary, but it also makes volume number and issue number unnecessary.

Finally, Crossref already fully supports article numbers, which means recognition of the role and importance of this metadata:

https://support.crossref.org/hc/en-us/articles/115000434843

> Journal articles and other scholarly works often have an ID such as an article number, eLocator, or e-location ID instead of a page number. In these cases, do not use the <first_page> tag to capture the ID - instead, use the <item_number> tag with the item_number_type attribute value set to "article_number"…

AhemNason commented 4 years ago

Just for context, the reason we so strongly consider the additions of new metadata fields in OJS is because we know from UI/UX that a lot of the fields aren't used properly or can be confusing to use. So, the assertion here isn't that you don't need the field, but that the addition of the field needs to be carefully considered so that its use is clear-cut.

bdmckay commented 4 years ago

@NateWr (I started typing to your "Thank you..." question above).

Please note that you could ask the same question about (volume, issue, pagenumber). The title is not suitable as part of the primary article coordinates because it is long and may contain foreign characters, mathematical symbols, etc. In practice it would be used with some pattern matching; i.e. it would not be precise at all.

As I wrote earlier, we absolutely must be able to support (volume, issue, article number) for our thousands of published articles since changing the coordinates of published articles is strictly forbidden. But more than that I honestly believe (volume, issue, article number) is a natural coordinate system for electronic-only journals and that issue-wide page numbers are an artefact of printing that in time we will discard.

We discussed this with Crossref earlier this year and were told to submit our article numbers like this: P2.20 . To quote: "You deposited the article number metadata correctly, according to our best practice. Please do continue in that manner. We are aware that the BibTeX output (and therefore the formatted citations) produced by the metadata search do not include article IDs/item numbers. A couple of other members have reported to us that they believe these should be treated like page numbers for the purpose of formatted citations. So, our developers are aware and that's on the list of things they're considering for future improvements...".

Please see fgnievinski's postings above for examples of journals that use article numbers for public navigation (not just internally). Another example (one of the most important in physics) is J. Physics A. A typical citation is "J. Phys. A: Math. Theor. 52 494001" where 52 is the volume number and 494001 is the article number. My journal is https://www.combinatorics.org . Typical citation: "Electron J. Combin. 27 (2020) P2.16".

Re "we separated the Public URL Identifier and the Publisher ID", I read some of the discussion but was confused. Where can I read a precise description of what these mean and how they are used in the system?

Many thanks, Brendan.

NateWr commented 4 years ago

Just catching up and may not be able to reply today but wanted to pass @bdmckay this link which describes the changes to Publisher ID and Public URL Identifier in 3.2: https://github.com/pkp/pkp-lib/issues/5430#issuecomment-587093740

bdmckay commented 4 years ago

@NateWr Thanks for the link. We use a public URL Identifier that encodes (volume,issue,article). Such as v3i2a7 for vol 3, iss 2, article A7. This incidentally allowed us to install server rewriting rules that understand our old URLs from before we adopted OJS. Even URLs that were printed in the 1990s still work.
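
As a sketch of how such an identifier can be decoded, the snippet below parses the `v3i2a7` pattern into its three coordinates. The regular expression and the names `Coordinates` and `parse_public_identifier` are assumptions made for illustration; the journal's actual rewrite rules and the way it renders the article label (e.g. "A7") are not shown in this thread.

```python
import re
from typing import NamedTuple, Optional


class Coordinates(NamedTuple):
    volume: int
    issue: int
    article: int


# Assumed pattern: "v<volume>i<issue>a<article>", digits only.
_PATTERN = re.compile(r"v(\d+)i(\d+)a(\d+)")


def parse_public_identifier(identifier: str) -> Optional[Coordinates]:
    """Decode an identifier such as "v3i2a7"; return None if it does not match."""
    match = _PATTERN.fullmatch(identifier)
    if match is None:
        return None
    volume, issue, article = (int(group) for group in match.groups())
    return Coordinates(volume, issue, article)


if __name__ == "__main__":
    print(parse_public_identifier("v3i2a7"))
```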

Here are some more journals that use public article numbers in place of page numbers.

Some examples have sequential numbering within issues, some have more random-looking article numbers. Also, many "numbers" are not numeric; please allow strings.

Cheers, Brendan.

NateWr commented 4 years ago

Thanks everyone for the discussion. Whatever the merits of an article number, it's clear that this is widely used in some disciplines.

I think that we can support article number in the core application. But like other metadata that is not commonly used (Type, Coverage) it should be added as an opt-in at the journal level, so that it is not presented unless a journal editor enables it. This should be pretty easy to do for 3.2.x and above, because of the new schema format. The needed steps would be:

@bdmckay and @fgnievinski do either of your institutions have developer resources that could implement this? I'd be happy to provide guidance on where and what to change.

(I'm using publicationNumber instead of articleNumber so that this can be re-used in OPS/OMP, in the event that a use-case arises. But of course publicly it would be described as an "Article Number" in OJS.)

sbirAECA commented 3 years ago

Good morning Nate. Could the two systems, the publicationNumber and the page-range, work simultaneously? Imagine a journal that initially started operating with page-range, and then the publishers decide to switch to publicationNumber. There could be a method to detect that only one of the two variables has content. Depending on each article, the citation should be different, and so should the Crossref export. Best regards, Carlos
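
The detection described here could be sketched as follows. The function name `locator_kind` and its return labels are hypothetical; this only classifies which of the two values an article carries, and leaves open what a citation or export should do when both are filled in.

```python
from typing import Optional


def locator_kind(pages: Optional[str], article_number: Optional[str]) -> str:
    """Report which locator an article carries.

    Returns "pages", "article_number", "both", or "none". Whitespace-only
    values count as empty.
    """
    has_pages = bool((pages or "").strip())
    has_number = bool((article_number or "").strip())
    if has_pages and has_number:
        return "both"
    if has_pages:
        return "pages"
    if has_number:
        return "article_number"
    return "none"


if __name__ == "__main__":
    print(locator_kind("159-181", None))   # older article with a page range
    print(locator_kind("", "e298"))        # newer article with an article number
```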

AhemNason commented 3 years ago

Is there precedent for article ID in a citation? I don't understand how that's necessary given all the identifying information is in the citation itself. I thought article IDs were self-referential and specifically used within publishing systems.

AhemNason commented 3 years ago

Just adding this here from Crossref's schema regarding "item number" and its use/scope.

https://data.crossref.org/reports/help/schema_doc/4.4.2/schema_4_4_2.html#item_number

A publisher identifier that can be used to uniquely identify the entity being registered. This identifier is a publisher-assigned number that uniquely identifies the entity being registered. This element should be used for identifiers based on publisher internal standards. Use identifier for a publisher identifier that is based on a public standard such as PII or SICI. If the item_number and identifier are identical, there is no need to submit both. In this case, the preferred element to use is identifier.

Data may be alpha, numeric or a combination.

item_number has an optional attribute, item_number_type. It is assigned by the publisher to provide context for the data in item_number. If item_number contains only a publisher's tracking number, this attribute need not be supplied. If the item_number contains other data, this attribute can be used to define the content. For example, if a journal is published online (i.e. it has no page numbers), and each article on the table of contents is assigned a sequential number, this article number can be placed in item_number, and the item_number_type attribute can be set to "article_number". Although Crossref has not provided a set of enumerated types for this attribute, please check with Crossref before using this attribute to determine if a standard attribute has already been defined for your specific needs.

If a dissertation DAI has been assigned, it should be deposited in the identifier element with the id_type attribute set to "dai".
If an institution has its own numbering system, it should be deposited in item_number, and the item_number_type should be set to "institution".
If the report number of an item follows Z39.23, the number should be deposited in the identifier element with the id_type attribute set to "Z39.23".
If a report number uses its own numbering system, it should be deposited in the identifier element, and the id_type should be set to "report-number".
The designation for a standard should be placed inside the identifier element with the id_type attribute set to "ISO-std-ref" or "std-designation" (more generic label).

NateWr commented 3 years ago

> Is there precedent for article ID in a citation?

@AhemNason I think that @fgnievinski and @bdmckay have provided sufficient evidence that this is used frequently enough in some disciplines. See https://github.com/pkp/pkp-lib/issues/4695#issuecomment-582370489, https://github.com/pkp/pkp-lib/issues/4695#issuecomment-628652689 and https://github.com/pkp/pkp-lib/issues/4695#issuecomment-628663831. My feeling on this from above is: "Whatever the merits of an article number, it's clear that this is widely used in some disciplines."

> Could it be possible that the two systems, the publicationNumber and the page-range, worked simultaneously?

Any implementation in OJS will require us to support both -- even to support both at the same time. Do we know if Crossref's schema defines any restrictions here? For example, do they forbid a deposit with both article id and page number?

sbirAECA commented 3 years ago

> Is there precedent for article ID in a citation?

The new APA7 style (in this link) introduces the label 'Article' followed by the article number. The IEEE style (in this link, page 13, 'Periodical With Article ID') introduces the label 'Art. no.' followed by the article number.

Even citeproc supports this. In the attached JSON file, data_elocator.txt, the only difference between "id": "ITEM-3" and "id": "ITEM-4" is that "page": "159-181" is replaced by "number": "e298". If you test them with $style = StyleSheet::loadStyleSheet("ieee"); and $style = StyleSheet::loadStyleSheet("apa"); the results are:

IEEE

[3] S. J. Cole and R. Moore, “Hydrological modelling using raingauge- and radar-based estimators of areal rainfall”, Journal of Hydrology, vol. 358, no. 3-4, pp. 159-181, 2008, doi: 10.1016/j.jhydrol.2008.05.025.
[4] S. J. Cole and R. Moore, “Hydrological modelling using raingauge- and radar-based estimators of areal rainfall”, Journal of Hydrology, vol. 358, no. 3-4, Art. no. e298, 2008, doi: 10.1016/j.jhydrol.2008.05.025.

APA7

Cole, S. J., & Moore, R. (2008). Hydrological modelling using raingauge- and radar-based estimators of areal rainfall. Journal of Hydrology, 358(3-4), 159-181. https://doi.org/10.1016/j.jhydrol.2008.05.025
Cole, S. J., & Moore, R. (2008). Hydrological modelling using raingauge- and radar-based estimators of areal rainfall. Journal of Hydrology, 358(3-4), Article e298. https://doi.org/10.1016/j.jhydrol.2008.05.025
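
The only part of each citation that changes is the locator. The sketch below reproduces just that fragment for the two styles shown; it is a hand-written illustration of the observed output, not how citeproc or any CSL processor works internally, and `format_locator` is a hypothetical name.

```python
def format_locator(style: str, pages: str = "", number: str = "") -> str:
    """Return the locator fragment of a journal citation.

    Mirrors the sample output above: an article number, when present,
    takes the place of the page range.
    """
    if style == "ieee":
        return f"Art. no. {number}" if number else f"pp. {pages}"
    if style == "apa":
        return f"Article {number}" if number else pages
    raise ValueError(f"unsupported style: {style}")


if __name__ == "__main__":
    for style in ("ieee", "apa"):
        print(style, "|", format_locator(style, pages="159-181"))
        print(style, "|", format_locator(style, number="e298"))
```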

Crossref sent me an email along the same lines as the one they sent to @bdmckay: if an article has a page range, it cannot also have an article number, and vice versa.

fgnievinski commented 2 years ago

Thanks @NateWr for the neat list of steps needed: https://github.com/pkp/pkp-lib/issues/4695#issuecomment-630113888

While I don't have direct access to developer resources at my institution, I've taken the liberty of advertising this enhancement suggestion in some user groups. Hopefully raising awareness about the issue and the solution envisioned will bring us closer to finding someone able to assist.

About the term "publication number" as a generalization of "article number", I'm afraid it could get confused with CrossRef's "publication ID", which actually identifies a whole journal or book:

https://www.crossref.org/documentation/content-registration/administrative-metadata/article-ids/#00033

So maybe "document number" or "work number" would be a better alternative.

NateWr commented 2 years ago

:+1: I think we can use workNumber internally in the code, but use "Article Number" in user-facing UIs. It will be localized across all applications anyway. It's just the internal property we need to keep consistent across all applications.

agmarrugo commented 2 years ago

Does anyone know if this feature will make it to OJS at some time in a future release? I know of many OJS journals placing the "article number" on the "pages" field, which Google Scholar then indexes erroneously as the same value in "citation_firstpage" and "citation_lastpage".

jyhein commented 1 year ago

OJS: https://github.com/pkp/ojs/pull/3802 PKP: https://github.com/pkp/pkp-lib/pull/8711

Added (+) and not yet (-):

[+] Add a workNumber prop to the context schema which allows it to be enabled/disabled.
[+] Add a workNumber field to the MetadataForm when it is enabled.
[-] Add this number to Crossref exports when it is enabled.
[-] Come up with clear guidance about when it should be used, such as that shown with publisher id.
[-] Show the workNumber in citations if it exists.
[+] Show the workNumber in the default theme if it exists.

I am working on this and have done the items marked (+) above. What do you think, @asmecher?

asmecher commented 1 year ago

Sorry for the wait in looking at this, @jyhein!

@bozana, would you be able to have a look over this? It would be for inclusion in 3.5 rather than 3.4.

fgnievinski commented 1 year ago

> [-] Show the workNumber in citations if it exists.

Since CSL is supported in OJS3 #723, I think it's safe to assume the step quoted above could be mostly left up to each citation style to support (or not); e.g., APA 7th already does:

https://forums.zotero.org/discussion/11177/crossref-publisher-item-article-number

Based on Zotero's implementation, the field "first page" should be empty and the article number should be provided in the "extra" field (with content prefixed by "Number: ").