pkp / pkp-lib

The library used by PKP's applications OJS, OMP and OPS, open source software for scholarly publishing.
https://pkp.sfu.ca
GNU General Public License v3.0

OJS should support publications authored by organizations and anonymous authors #5955

Open AhemNason opened 4 years ago

AhemNason commented 4 years ago

This issue describes two problems -- a bug and a feature request.

  1. The feature request is that OJS should support publications without a specific author. Requiring an author leads to bad metadata, and downstream services, like Crossref, support author-less objects.

  2. The section option "omit authors from table of contents" is not respected and the default theme shows authors anyway. This should be addressed in line with how we handle author-less publications.

(Authorless comments are filed/resolved in https://github.com/pkp/pkp-lib/issues/8403.)

Original comment:

In journal section metadata/options, there's a qualifier to omit the author's name from a journal section. This is, by and large, a workaround for journals publishing articles that don't have an "author". Stuff like errata or abstracts collections or any other number of things. These are usually uploaded by managing editors or other journal staff who are also authors of legitimate materials.

All themes, including the default theme, display these author names in the table of contents even if the option not to display them is selected.

You can find the relevant setting in Journal>Sections>Edit>Omit Author from Journal section...

I'd add that this should also omit the author from citations, the article landing page, and any other associated metadata. This whole feature is a workaround due to not being able to zero out an author field, but the downstream impact of that stored metadata is also potentially problematic.
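
The behaviour the section option is supposed to produce can be sketched as follows. This is only an illustration in Python, not OJS or theme code: the `Section`, `Article`, `hide_author`, and `toc_entry` names are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    title: str
    # Hypothetical flag standing in for the "omit authors" section option.
    hide_author: bool = False


@dataclass
class Article:
    title: str
    authors: list[str] = field(default_factory=list)


def toc_entry(article: Article, section: Section) -> str:
    """Return one plain-text table-of-contents line for an article."""
    # Show a byline only when the section allows it and authors exist.
    if section.hide_author or not article.authors:
        return article.title
    return f"{article.title} / {', '.join(article.authors)}"
```

The same check would have to be repeated wherever a byline is produced (citations, landing page, exported metadata), which is why storing no author at all is the cleaner fix.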

Author email addresses should no longer be required. (https://github.com/pkp/pkp-lib/issues/3561)

asmecher commented 4 years ago

@AhemNason, what's your take on authorless submissions in general from a librarian/metadata quality perspective? I'd worry about introducing the potential for downstream breakage with other formats that will require creator metadata.

I don't think "forcing" a single author by e.g. preventing someone from deleting the last author record would be a good UI/UX outcome, so if we deem at least one author to be a requirement, then perhaps make it a pre-publication check so that the submission can't be published unless it meets that condition?
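
A rough sketch of that pre-publication check idea, in Python for illustration only. `Publication`, `prepublication_errors`, and `publish` are hypothetical names, not part of OJS; the point is just that the author list can be emptied freely while editing and is only validated at publish time.

```python
from dataclasses import dataclass, field


@dataclass
class Publication:
    title: str
    authors: list[str] = field(default_factory=list)


def prepublication_errors(publication: Publication) -> list[str]:
    """Collect the reasons a publication cannot be published yet."""
    errors = []
    if not publication.authors:
        errors.append("At least one author is required before publishing.")
    return errors


def publish(publication: Publication) -> bool:
    """Return True only when every pre-publication check passes."""
    return not prepublication_errors(publication)
```

Deleting the last author record never fails here; only `publish` reports the problem.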

AhemNason commented 4 years ago

Alec, I've been asking for authorless submissions since 2008. It's actually the case that more things downstream accommodate "authorless" than not. Requiring an author leads to way more issues with bad metadata.

For example, if I'm a journal manager named "jim john" and I've been uploading an item to my issues called "contributor" for 10 years, and I have ORCID, I'm sucking in ten years' worth of "articles" that I don't want. Not everything in a journal has an author. Errata, for example. We have a journal locally that includes a collection of abstracts from their annual conference in every issue. That collection isn't written by a single author, nor is listing 25 authors sensible.

Crossref allows authorless content. JATS, MODS. Instead of an empty element, there just isn't an author element at all.

Even single author requirement is an overstep. Unless you make it section based. I think having section-based options here like you do for abstracts (for both author and email) would really help a great deal with trash metadata in OJS.

A user should never be required to enter metadata that does not exist.
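
To make the "no author element at all" point concrete, here is a minimal Python sketch that builds a heavily simplified JATS-style `<article-meta>` fragment. The element names (`contrib-group`, `contrib`, `name`, `surname`, `given-names`) are JATS elements; the `article_meta` function and its arguments are assumptions made for the example, and a real export needs far more than this.

```python
import xml.etree.ElementTree as ET


def article_meta(title: str, authors: list[tuple[str, str]]) -> str:
    """Build a simplified JATS-style <article-meta> fragment.

    authors is a list of (surname, given_names) pairs and may be empty.
    """
    meta = ET.Element("article-meta")
    title_group = ET.SubElement(meta, "title-group")
    ET.SubElement(title_group, "article-title").text = title

    # Only create <contrib-group> when there is someone to put in it,
    # rather than emitting an empty or placeholder author element.
    if authors:
        group = ET.SubElement(meta, "contrib-group")
        for surname, given in authors:
            contrib = ET.SubElement(group, "contrib", {"contrib-type": "author"})
            name = ET.SubElement(contrib, "name")
            ET.SubElement(name, "surname").text = surname
            ET.SubElement(name, "given-names").text = given

    return ET.tostring(meta, encoding="unicode")
```

Calling `article_meta("Errata", [])` yields a fragment with a title and no `contrib-group`, which is the shape described above.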

AhemNason commented 4 years ago

Just a note here too that the "omit section title from the table of contents" is also broken on the default theme.

asmecher commented 4 years ago

Sorry, I could've phrased that in a way that poked fewer bears. Also the author email requirement, when a record is appropriate (https://github.com/pkp/pkp-lib/issues/3561). Actually I think we're well situated to remove these requirements, having gradually moved away from conflating the first author with the corresponding user.

> Crossref allows authorless content. JATS, MODS. Instead of an empty element, there just isn't an author element at all.

OK, good, likewise DC. Off the top of your head, do you know about METS, ORCID, Google Scholar?

AhemNason commented 4 years ago

I can look into it but I'd suggest ORCID won't matter if there's no author. Google Scholar might. I can look into those two. And get back to you.

AhemNason commented 4 years ago

So, Google Scholar is reporting that it can't index something without an author. But almost always these things wouldn't really benefit from indexing anyway. They are almost always not research articles but other things... like errata or other smaller pieces. But, more annoyingly, they are also reporting that:

If there are widespread cases of papers without authors listed, the indexing system will eventually drop the site.

and...

3 articles in an issue of 10 articles that don't have authors would definitely be a red flag for the indexing system.

For what it's worth, I don't think this is reasonable. I'm not confident that we have numbers on the prevalence of these things. It's probably not as high as 3 articles per issue, but it's hard to say. Either way, bad practice is bad practice! Here's the MODS recommendation on names, and it's more or less where I'd sit:

https://www.loc.gov/standards/mods/userguide/name.html

It's not sexy reading, but the gist is this:

The DLF/Aquifer Implementation Guidelines for Shareable MODS Records recommend the use of at least one element to describe the creator of the intellectual content of the resource, if available.

A name may be linked to a uniform title in the record using the nameTitleGroup attribute. A name may be designated as the citation or "main" entry name using the usage attribute.

and also...

Aggregator information: Aggregators commonly use the field as a target for author or subject searching. Even the simplest interfaces offer an author/creator search. In cases of unknown or anonymous creators of resources, aggregators generally remove values indicating this and rely on institutions' local records to convey this information if necessary.

We don't currently support a type attribute or drop-down for names. If something like "the editors" is the ask for a fake name, it needs to not be coded as a person. For MODS requirements, name is "Recommended if applicable" but not required.

MODS is based on AACR2 Cataloging Standards so... this means that the vast majority of libraries respect an empty author field. And it's not great cataloging practice to make descriptive metadata where it doesn't exist (although, in the interest of the moment, historically they've done some wild things like recommended assuming gender when it wasn't obvious and that sucks).

NateWr commented 4 years ago

Apologies for hijacking your issue, @AhemNason, but I've renamed it to better describe the need for author-less publications as you've described.

I think there's a valid case here for publications without a contributor as OJS understands it now. A couple of questions:

  1. In such cases, shouldn't the journal be described as the contributor/author? This could be done with support for corporate names or by supporting author-less publications but automatically assigning them to the journal under-the-hood.

  2. Asking as an outsider to this: should these things be published as articles in OJS? Do we need a different model to describe errata, for instance, than just calling it an author-less publication? Does JATS/Crossref really want this kind of content described as an article?

AhemNason commented 4 years ago

> In such cases, shouldn't the journal be described as the contributor/author? This could be done with support for corporate names or by supporting author-less publications but automatically assigning them to the journal under-the-hood.

  1. This is what Google Scholar would prefer. It's still making up metadata. It's placeholders for empty fields. Google recommended putting in "the editors", but it's worth asking if you want your citation generation for these saying "the editors" in it. Or, if you actually would cite these at all. I bet if you look in Google Scholar under author "the editors" you'll see a wild amount of stuff.

https://scholar.google.com/scholar?as_q=&as_epq=&as_oq=&as_eq=&as_occt=any&as_sauthors=%22The+Editors%22&as_publication=&as_ylo=&as_yhi=&hl=en&as_sdt=0%2C5

[Screenshot, 2020-06-30 12:59:21]

This is problematic. "T. Editors" isn't any one editor. It doesn't tell me anything useful. Especially if I'm not recording the editors for every issue in my issue-level metadata (which we also do not do). A placeholder in this section is symbolic. Symbolic metadata is a bad idea. It also muddies the waters far more than just not having an author attributed because now it matches a million other made-up authors.

> Asking as an outsider to this: should these things be published as articles in OJS? Do we need a different model to describe errata, for instance, than just calling it an author-less publication? Does JATS/Crossref really want this kind of content described as an article?

  2. JATS and Crossref both support authorless in their schema. "Want" isn't really much of an issue when you have over 100 million records. Crossref in particular probably just hopes that user-generated metadata gets better broadly.

If you talk to someone in cataloguing, it's clear that metadata generally shouldn't be assumed, it should be specific. And it should describe what's in the file. For example... if I were to receive an article to upload into OJS and the PDF didn't include author information, I wouldn't go looking for it and have it only appear in the metadata. You want to describe the article/galley as best as you can. You don't want to create fake information out of necessity to get through a submission form.

I'm cautious of the can of worms we open if we decide to not call the publications in OJS "articles". It's true that not all things in OJS are articles. If we open the can of worms, we'd be also opening it to having specific metadata fields for book reviews (everyone mangles book reviews in OJS metadata... it's astonishing). Forms per content type. I'm not against this. I think it's probably better than all content having the same form.

Either way, an empty author field for content that has no specific contributor is more accurate than a placeholder.

I do appreciate this being blown out into its own issue but the original bug remains that the section-by-section settings for author or journal section display on the TOC are still broken.

AhemNason commented 4 years ago

I do very much think that this doesn't have to be a journal wide option for all publications in a journal. But it should be ok to have an empty author field via section-by-section settings.

NateWr commented 4 years ago

> I do appreciate this being blown out into its own issue but the original bug remains that the section-by-section settings for author or journal section display on the TOC are still broken.

I've kept that bug listed in the original issue description. My interest here is to resolve that in line with our metadata recommendation. If we decide that author-less submissions is the way to go, we should remove the option rather than fix it (for example).

> I'm cautious of the can of worms we open if we decide to not call the publications in OJS "articles".

I wasn't clear enough in my original question. I'm not proposing that we don't call all publications in OJS articles. My question really comes back to what you said here:

> it's worth asking if you want your citation generation for these saying "the editors" in it. Or, if you actually would cite these at all.

If we're talking about material that is fundamentally different from articles in OJS, should we be treating them as articles in OJS? Some things, like editorials, probably should have an attribution -- though that should not be T Editors... maybe the journal name itself, put into a corporate name field? And they probably should at least have the option of going through a form of peer review / editorial sign-off (ie - the submission workflow).

Maybe other things, like errata, shouldn't be published as articles, deposited into Crossref as articles, be given a suggested citation, or be put through the submission workflow? Maybe this content shouldn't be described and treated by OJS as an "article" at all?

Does OJS need an alternate content type for things that aren't a "research output"?

AhemNason commented 4 years ago

Hey Nate, I've been thinking about this a lot and I think it's worth pulling in some answers from hosting services. My gut feeling is yes, but my other gut feeling is that it's a great way to complicate a process that already feels too complex for some. I'd like to get some feedback from the hosting support folks. Chiefly, @jmacgreg @amandastevens and @mfelczak . James is on vacation until the 13th but I've emailed it to him to follow up on.

eocarragain commented 3 years ago

I can confirm what @AhemNason says about the "Omit author names for section items from issues' table of contents." section flag being ignored in default and manuscript themes in OJS 3.2.1-1.

@asmecher @NateWr, should this bug be broken out into its own issue? Getting this fixed would help hide the bad author metadata, at least in the UI.

On the more general issue, I fully agree with @AhemNason : it should not be a requirement to include authors, but should be configurable by journal managers on a section-by-section basis (similar to abstracts etc.).

Thanks Eoghan

NateWr commented 3 years ago

> should this bug be broken out into its own issue

Yes, thanks, can you file that for us @eocarragain?

CreationTribe commented 3 years ago

I'm assuming this is either a work in progress or the issue simply hasn't been closed yet since the last comment was in Nov of 2020.

Either way, I'm extremely interested in this topic. I'm in the midst of setting up the online presence for a scientific journal with a bit of a twist. A call for papers will be put up as soon as everything is working to the journal's satisfaction; but the concern is that there will likely be a large number of publications for which the authors and researchers may want to remain anonymous for fear of accidentally damaging their academic reputation (which is completely understandable).

Now, I know that anonymity and authorless are not technically the same thing; however, as I was searching for how to submit anonymous publications with OJS3, I came across this thread.

I'm new to the technical aspects of OJS and there have been a few parts in the comments above where the contextual use of different terms kind of lost me (though, I've got the basic gist of it). So I'm not 100% sure we're talking about the same thing.

Is this something I need to be concerned about? Our editors and reviewers will need access to author information for quality control reasons; but it's the general assumption that the desire to submit anonymous publications may be much higher for our particular platform than for the traditional scientific journal.

Am I barking up the wrong tree, or is this the tree I should keep an eye on?

asmecher commented 3 years ago

@CreationTribe, that's interesting, but also a bit of a niche (it would be hard for the dev team to prioritize), and it is distinct from authorless submissions. Simply hiding the authorship in the reader front end would be relatively easy, but you'll need to consider all the downstream ingests of metadata (the OAI-PMH interface, Google Scholar, CrossRef, etc) -- will each of these accept authorless submissions, or do they have another mechanism for anonymous authors, or do you actually want to provide the real data to these tools?

CreationTribe commented 3 years ago

@asmecher That's kind of the trick, isn't it? All the same, I think we'll try to start with OJS and just see how many authors actually want the safety of anonymity. We've thought about WP - but it's not really built for publication - even with all of the plugins, it feels gimmicky. Granted, we could probably take a poll. If we developed the feature, can we submit a pull request? Would that be something you guys would entertain?

asmecher commented 3 years ago

@CreationTribe, pull requests are always welcome! You've proposed two approaches -- authorless publications and "anonymous" authors -- and of the two, authorless publications (as described in this issue) already has community interest. If it would suit your needs, it would be the best investment of effort, I think.

pmangahis commented 3 years ago

+1 from PKP Hosting

AhemNason commented 3 years ago

Just a note to @CreationTribe and @asmecher that I'm pulling together some more well-formed thoughts about best practice for name metadata here. Shouldn't be much longer.

AhemNason commented 3 years ago

Hello! So, my general recommendation here is for "contributor types" that cover the number of types of authors we might see. Those would be broadly described as:

I wrote a little report about it here where you can see how these situations are handled in JATS and Crossref respectively.

https://www.notion.so/ahemnason/Proposing-name-types-as-an-alternative-to-abusable-name-fields-in-OJS-d006df44f92e4571ad32701cb66dfdab
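
As a rough illustration of what typed contributors could look like on export, the Python sketch below maps three kinds of contributor named elsewhere in this thread (person, organization, anonymous) onto JATS `<contrib>` content. JATS does define `<collab>` for group names and `<anonymous>` for undisclosed names; the `Contributor` class, its `kind` values, and the `jats_contrib` function are hypothetical and are not taken from the linked report or from OJS.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Contributor:
    # Hypothetical type values: "person", "organization", or "anonymous".
    kind: str
    surname: str = ""
    given_names: str = ""
    organization: str = ""


def jats_contrib(c: Contributor) -> ET.Element:
    """Build a JATS <contrib> element appropriate to the contributor type."""
    contrib = ET.Element("contrib", {"contrib-type": "author"})
    if c.kind == "person":
        name = ET.SubElement(contrib, "name")
        ET.SubElement(name, "surname").text = c.surname
        ET.SubElement(name, "given-names").text = c.given_names
    elif c.kind == "organization":
        # A group or corporate author is not coded as a person.
        ET.SubElement(contrib, "collab").text = c.organization
    elif c.kind == "anonymous":
        # The name exists but is withheld; no fake name is stored.
        ET.SubElement(contrib, "anonymous")
    else:
        raise ValueError(f"unknown contributor kind: {c.kind!r}")
    return contrib
```

A publication with no contributor at all would simply produce no `<contrib>` elements, so it needs no fourth type here.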

NateWr commented 3 years ago

Thanks for the helpful document, @AhemNason. It looks like downstream consumers can handle this fine. The only hiccup is a publication with no author for Google Scholar. The three use cases you provide for no authors ("Announcement" or "Addendum" or "Errata") probably shouldn't be included as publications in Google Scholar anyway, so I think we're ok there.

twakeford commented 3 years ago

A point on the above is that I would argue that Errata/corrigenda (or Corrections in simple terms) should have an author - they are either the author or the editor/journal formally correcting the publication record, so this needs to have clear communication on who is issuing such a notice. Corrections to the publication record are a serious business so the formality and authority of who is issuing a correction is an important bit of data for the reader to know. I think I'm right that COPE recommends that the author list is kept exactly the same as the original publication if errata/corrigenda are published. If the authors completely object then the editor/journal can be listed instead.

Equally, it is very important that these types of articles are picked up by indexes such as Google Scholar, otherwise the chance of researchers finding only the original publication and not the corrected material is higher and thus the risk of citing erroneous content is higher. Ideally, if coming from the journal it does come from the EiC rather than a generic 'The Editors', but I get that they wouldn't want these in their own CV, as they are acting as the voice of the journal rather than their own publication record. Finding a way to show that these article types are from the journal rather than a specific person seems far more preferable.

NateWr commented 3 years ago

It's probably worth understanding "Errata" better. It seems like maybe there are two types of changes here? One would be a change to a VOR, which should be handled by our versioning system. The other is something like a notice of a change, where editors are publishing information about a change that has been made, separate from the item that was changed?

Am I right in thinking that this is what is meant by "Errata"?

twakeford commented 3 years ago

Yeah, a new version is generally only for small updates that aren't changing the factual nature of the publication. Essentially something that is too small to change the interpretation or use of the publication.

An erratum is issued when something more factual is corrected. This could be metadata that affects the citation (e.g. an author was missed off the original publication or the title needs to change) or content related (e.g. data in a table is incorrect, a quote is misattributed). The erratum should describe what the correction is, why it is needed and who is making it, but it would always be a separate notice so that the important differences are very clear.

nils-stefan-weiher commented 1 year ago

Bump!

We really also need this for OMP.

asmecher commented 1 year ago

@nils-stefan-weiher, can you clarify? This issue describes a bunch of different additions to author capabilities -- zero authors, anonymous authors, and organizational authors. Are any of these of particular interest?

AhemNason commented 1 year ago

A reminder: https://github.com/pkp/pkp-lib/issues/5955#issuecomment-876621515

This is standard in publication metadata use downstream by libraries, services, repositories, indexing services... exceedingly typical.

The interest is in accurate metadata instead of munged forms to pass a form requirement. Please. Oh, pretty please, let this occur before I die.

asmecher commented 1 year ago

@AhemNason, what's your timeline? :upside_down_face:

ajnyga commented 1 year ago

This would fit our work on Contributor Roles in CRAFT-OA very well and will be part of our suggestions.

I added this to the CRAFT-OA project and after we get a review on our suggestions from PKP next month will remove it if needed.

Also just an initial thought to the problem with Google Scholar: we could just remove GS tags for submissions that do not have an author. I can see no harm in not having things like "News" indexed there, on the contrary, we should not even be trying to push that kind of content to GS.
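
A minimal sketch of that idea in Python, for illustration only. The `citation_*` names are the meta tags Google Scholar documents for indexing; the `scholar_meta_tags` function and its parameters are assumptions made for the example and do not reflect how OJS or its Google Scholar plugin is actually written.

```python
from html import escape


def scholar_meta_tags(title: str, authors: list[str], journal: str, date: str) -> list[str]:
    """Return Google Scholar <meta> tags, or none at all for author-less items."""
    # Assumption: an item without authors is not meant for Scholar indexing,
    # so no citation_* tags are emitted for it.
    if not authors:
        return []

    pairs = [
        ("citation_title", title),
        ("citation_journal_title", journal),
        ("citation_publication_date", date),
    ]
    pairs += [("citation_author", author) for author in authors]
    return [f'<meta name="{name}" content="{escape(value, quote=True)}">' for name, value in pairs]
```

Other exports (Crossref, OAI-PMH) would be unaffected by this, since only the Scholar-specific tags are dropped.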

withanage commented 2 months ago

@ajnyga, what would you suggest current OJS users (e.g. on 3.3) do to record an institute as an author, so that it gets migrated correctly into OJS 3.6?

Should the full name of the institute go in the author given name field, since that field is mandatory by default?