pkp / pkp-lib

The library used by PKP's applications OJS, OMP and OPS, open source software for scholarly publishing.
https://pkp.sfu.ca
GNU General Public License v3.0

ORCID Plugin enhancement #2818

Closed withanage closed 5 years ago

withanage commented 6 years ago

Current Status

README.md

Previous plugins

Issue Status/ Explanation   area res.
user-id, author-id Manual insertion of ID  (no-validation)
Button on registration form : Autofill firstname, lastname after validation
Input button on profile: validate and automatic-id adding


orcIdProfile PKP
Login ORCID Login for registered-ojs users pkp-auth ULS
Synchronize Synchronize articles into ORCID HD
Notify send e-mail/notification to authors HD
authenticate authentication for an existing user
Display orcid id under the author on the monograph's view page

pkp-orcid OMP HIRMEOS

References

ajnyga commented 6 years ago

Looking good!

IMHO the visible and editable form fields for the ORCID ID should be removed altogether and only allow properly authenticated ID's like ORCID guidelines suggest. This means that the only way of inserting ORCID ID's to OJS would be via authentication using the ORCID API.

If a user account has an authenticated ID, that could be used when creating the initial author profile. For the secondary authors, the collection should happen via emails with links leading to a handler that would show the suggested article and a means to connect the author's ID to the article metadata.

In any case, I think it would be important to discuss any plan with ORCID before doing too much work.

ajnyga commented 6 years ago

Another nice feature, which I already mentioned to @bozana, would be to enable the collection of ORCID ID's for old articles. This would be a plugin (or a plugin feature) which would send an email to all (or a selected scope of) authors (meaning the authors in the article metadata). The email would again contain a link leading to a handler which would show the submission in question and would enable the author to add an authenticated ORCID id to the article metadata. This could potentially boost the amount of ORCID id's in article metadata a lot.
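One way to picture the emailed link described above is as a URL carrying a signed token that ties it to one author record and one submission. The sketch below is only an illustration: the HMAC signing, the function names and the query parameter names are assumptions and do not come from the plugin. The link only identifies which author record is being claimed; the actual ORCID authentication would still happen on the ORCID side.

```python
import hashlib
import hmac


def make_claim_token(secret: bytes, author_id: int, submission_id: int) -> str:
    """Sign the (author, submission) pair so the link cannot be altered."""
    message = f"{author_id}:{submission_id}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_claim_token(secret: bytes, author_id: int, submission_id: int, token: str) -> bool:
    """Check a token from an incoming link using a constant-time comparison."""
    expected = make_claim_token(secret, author_id, submission_id)
    return hmac.compare_digest(expected, token)


def claim_link(base_url: str, secret: bytes, author_id: int, submission_id: int) -> str:
    """Build the link for the email (hypothetical handler URL and parameter names)."""
    token = make_claim_token(secret, author_id, submission_id)
    return f"{base_url}?authorId={author_id}&submissionId={submission_id}&token={token}"
```

The handler would call `verify_claim_token` before showing the submission, and only then send the author on to ORCID to authenticate.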

defstat commented 6 years ago

Hi all. Regarding the HIRMEOS project and the ORCID implementation, EKT has implemented the following for OMP:

ajnyga commented 6 years ago

Hi @defstat! I actually suggested the exact same functionality that you have in that third bullet point to a representative of ORCID when he visited Helsinki to meet the Finnish ORCID consortium.

According to the ORCID representative, even that is regarded as a wrong way of adding ORCIDs, and he insisted that the only way should be by proper authentication. So I strongly suggest that before writing any larger contributions here you contact ORCID and talk with them about the integration.

defstat commented 6 years ago

Hi @ajnyga! The "validation" process is supposed to be just a helper for the submission editor, as OJS and OMP allow the submitter to add an ORCID to another author.

ajnyga commented 6 years ago

Yes, that is exactly what I was thinking as well. But the thing is that those editable ORCID form fields should not be there at all, if we follow the instructions on how to implement ORCIDs in publishing systems.

ajnyga commented 6 years ago

A nice video of handling the co-authors problem here: https://www.ariessys.com/views-and-press/resources/video-library/orcid-co-author-verification/

defstat commented 6 years ago

@ajnyga my thought exactly about the ORCID field. But as long as it's there....

ajnyga commented 6 years ago

Yes, I do agree that if there is an open field, your addition will make it more secure. But what if the ORCID id the editor is testing is for the wrong person with the same name? I think that the whole purpose of ORCID is to provide 100% accurate author data that systems like OJS can use with confidence. Only allowing user authenticated id's is really the only way to achieve this.

asmecher commented 6 years ago

@ajnyga, we could allow manually-entered ORCIDs and validated ORCIDs both, if we flag in the DB (e.g. the user_settings table) whether the ORCID was authenticated. I believe the CrossRef service, for example, accepts such a flag.
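As a rough illustration of the flag idea, the sketch below keeps a boolean next to each stored ORCID and carries it into whatever an export writes. The field name `orcid_authenticated` and the helper are hypothetical; how the flag would actually be stored in the settings tables or serialized in a given export format is not specified in this thread.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AuthorOrcid:
    author_id: int
    orcid: Optional[str] = None
    # Hypothetical flag: True only if the iD came from an ORCID authentication.
    orcid_authenticated: bool = False


def orcid_export_fields(author: AuthorOrcid) -> Optional[Dict[str, str]]:
    """Return the ORCID plus its flag for an export, or None if no ORCID is stored."""
    if not author.orcid:
        return None
    return {
        "orcid": author.orcid,
        "authenticated": "true" if author.orcid_authenticated else "false",
    }
```

An export format that cannot express the flag would have to decide separately whether to drop unauthenticated iDs, which is the concern raised in the replies.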

ajnyga commented 6 years ago

They seem to have this approach also here: https://www.ariessys.com/views-and-press/resources/video-library/orcid-co-author-verification/. I think that JATS XML also has that flag. But how does that show for example in OAI-PMH or DOAJ exports? I mean whether an ORCID id is validated or not? So potentially wrong ORCID id's could end up in a lot of places, which in turn will weaken the idea of the whole system. Also, I am not sure what exactly you would do with an unvalidated ORCID id?

I think that this is such a central question that discussing with ORCID would be important. I am not 100% sure I have understood the guidelines correctly.

Adding a link to this as well: ORCID iD Throughput in Publishing Workflows, https://www.ncbi.nlm.nih.gov/books/NBK350150/. quote: "When an incorrect iDs becomes part of sources that are considered authoritative, the entire community puts in doubt the validity of the iDs association to an individual, greatly negating the benefit of having these unique iDs."

edit: regarding CrossRef having the authenticated flag in their xml format, they do also state for example here that the ORCID id's should always be authenticated and not entered to a type-in field: https://www.crossref.org/blog/auto-update-has-arrived-orcid-records-move-to-the-next-level/

asmecher commented 6 years ago

When we're ready to formulate a proposal or have a few specific questions I'm happy to find a contact at ORCID to run it past.

Generally I think we have a slight mismatch with ORCID: they're very focused on new publications and active authors, whereas we want a general ID to identify people with, active or not, potentially including old/external data. ORCID doesn't want to support IDs for dead people, for example, so that remains our problem. But I do think we'll have people converting journals between systems where the old system did have good ORCIDs, for example, and in that case I think it's fair to support unvalidated ORCIDs. Other systems support using an "authenticated" flag for this, so that seems like a good balance.

asmecher commented 6 years ago

@ajnyga, my opinion is that we shouldn't restrict editors from working with ORCIDs in cases where they can't have the author authenticate. We generally trust editors with the tools to make comprehensive metadata, and with that comes the risk that they'll add bad quality stuff, but it's up to them. I think OJS should continue to link to ORCID accounts when it has that data available, regardless of whether that was authenticated or not, for example. But that's just my take -- I also wish there was an "ORCID for dead people" but that's a whole other can of worms :)

ajnyga commented 6 years ago

It probably does not matter what system we have, not all of the authors are going to have a unique and authenticated id unless one is given from above (like a social security number or to some extent ISNI) and maybe not even then.

I really do not see the point of having unvalidated unique id's if you can not trust them. The whole point of having ORCIDs is to enable us to work with data automatically. If you can not be sure that the id is the correct one, then what use does it have? I would not compare ORCID to other metadata. If the editor misspells the author's last name it will not matter as long as the ORCID id is correct and validated. But if the editor makes a mistake with the ORCID id, it does not matter what other author metadata she has given.

I would encourage you to read the article I linked, especially the chapter "Collecting ORCID iDs from authors". One of the authors is the executive director of ORCID so it's straight from the horse's mouth, suoraan hevosen suusta :-D

asmecher commented 6 years ago

All good points, @ajnyga. I'm content if we proceed as you propose. We may find that imports and conversions are a problem, but perhaps we should consider them the special case that they are rather than trying to make the whole system accommodate them.

ajnyga commented 6 years ago

The problem with this "ORCID purism" is of course that most of the time the editors would get the id's right, but the guidelines are probably like they are because of that 1%. But yeah, maybe my message quota is full for today...

withanage commented 6 years ago

I also agree with the point of having only authenticated ORCID iDs. The support of export formats is crucial and we have to clearly define it. We will come with a suggestion soon for JATS, BITS etc.

ajnyga commented 6 years ago

An additional thing I started to think about: Could we have a plugin setting (on/off) for enforcing the use of ORCID id's in the workflow, at least for the submitting author? I do not have an actual use case, but I think that many journals would like to have the ORCID id as required metadata.

asmecher commented 6 years ago

@ajnyga, regarding making ORCIDs required, I think this would make sense as a new row in our existing metadata grid (that was extended recently to include a "required" setting): https://github.com/pkp/pkp-lib/issues/2684

No need to make this part of the plugin, I think, since the ORCID fields exist in the core code already.

ajnyga commented 6 years ago

sounds like a good idea!

nils-stefan-weiher commented 6 years ago

Hi everyone, in a sprint during the OJS-de Workshop last week we had a discussion and planning session for further ORCID plugin development. Leonhard Maylein (@lmaylein) and I discussed, with contributions from Daniela Wolf (@dastewo) and @bozana, specific changes to the current https://github.com/pkp/orcidProfile plugin.

The discussion and results were documented in a Google Doc, but only in German. I will try to translate and summarize the conclusions and the open questions. We would also like input concerning the open questions.

Aims for plugin enhancements

Discussion results

Open questions

Thanks for reading if you have come this far. In Heidelberg we want to go on and test the ORCID API with the sandbox and update the orcidProfile plugin. What do you think about the steps we outlined above? Do you have any further ideas or contributions to the open questions?

ajnyga commented 6 years ago

Hi,

Great stuff and so happy to see things move forward with this!

A couple of notes that came to mind:

nils-stefan-weiher commented 6 years ago

@ajnyga, you wrote:

In my opinion the whole point of ORCID is to not try to "umgehen" (circumvent) the problem described with OJS, but to make the author validate each article she has authored on a case-by-case basis. This should not be automatic in any way.

That would make the development of the plugin easier and may be a good way, if this only means that the contributor has to click a link and verify the information.

The part where an ORCID Member API is required would definitely be optional; the plugin has to distinguish which API key has been entered in the plugin settings and only trigger the profile updates with a member API key.
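A minimal sketch of that distinction, assuming the plugin decides from the configured API URL rather than from the key itself (the thread does not say how the plugin tells the two apart). The host names are ORCID's public and member API hosts for production and sandbox; the function names are hypothetical.

```python
from urllib.parse import urlparse

# ORCID API hosts: "pub" is the public API, "api" is the member API.
MEMBER_API_HOSTS = {"api.orcid.org", "api.sandbox.orcid.org"}
PUBLIC_API_HOSTS = {"pub.orcid.org", "pub.sandbox.orcid.org"}


def is_member_api(api_url: str) -> bool:
    """True if the configured API URL points at an ORCID member API host."""
    return urlparse(api_url).hostname in MEMBER_API_HOSTS


def may_update_profiles(api_url: str) -> bool:
    """Profile updates are only attempted when a member API is configured."""
    return is_member_api(api_url)
```

With a public API URL the plugin would still collect authenticated iDs and simply skip the profile update step.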

ajnyga commented 6 years ago

Hi @isgrim agreed: we have to make the validation process as smooth as possible for the author. But hopefully others will comment as well.

Again, great to see you working on this!

nils-stefan-weiher commented 6 years ago

Another point which was discussed, and which we may also need in the future, is links to name authority data for authors (for Germany the GND; VIAF and LOC for other countries).

This could be bundled in an "authorities" plugin for contributor metadata, which stores the authority links and maybe additionally provides a search of authority data. But for this I first want to do some research on existing plugins and issues that discuss authority data for contributors, and maybe open a new issue. I will post a link here when that happens.

Maybe it is possible that ORCIDs which have not been validated by the ORCID server can be entered in this way as a link to a name authority with this plugin. But while writing and reading it again, I feel that this may not be the right way.

With the link to name authorities we could tackle the de-duplication and correctness of contributor data from a different angle. In Heidelberg there may be many contributors who don't have an ORCID, or there may be a retro-digitisation of an old journal where the authors are no longer alive and so can't have an ORCID.

mtub commented 6 years ago

I agree with @ajnyga :

I am still pushing an "ultra purist" view on saving ORCIDs (XD). We should not save manually entered ORCIDs at all, because they can not be trusted and in many cases (DOAJ exports I think) you can not flag ORCIDs as "unvalidated". Also, how would we show unvalidated/validated ORCIDs on the article pages? Of course if flagging ORCIDs means that the unvalidated ones do not show up or are not used anywhere, then why not. But even then, why save them then anyway?

any new version of the ORCID plugin should only feature authenticated ORCID iD.

withanage commented 6 years ago

We have two issues to clarify before we go ahead with the implementation.

Administration scenario for ORCID ClientIds and ClientSecrets: A journal manager may see the ORCID ClientId and the ClientSecret in a general OJS setup. But in a large multi-journal installation, you may need to use the same ClientId and ClientSecret for a set of journals and hide them from the journal manager, because ClientIds are limited for an organization. Our idea is: at plugin installation we define whether the admin or the journal manager can configure the plugin for the particular journal. Would that make sense?

Connecting authors and users to ORCID profiles: The current status is that authors and users can authorize access to ORCID profiles, and for each new submission the author has to authenticate. We are thinking it may be useful for an authenticated user to disseminate his ORCID access token to all the publications he has submitted or contributed to in the journals in the same installation.
The plugin could do this by matching the email addresses in the database. But this could also lead to false matches if journal managers update or change the author email addresses. If we could save in the author_settings whether the author was generated by an authenticated user during the submission process, we would be able to copy the access tokens accordingly. Would that make sense, or lead to security vulnerabilities?
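The difference between the two matching strategies in this proposal can be sketched as follows. This is only an illustration of the question being asked: the `created_by_user_id` field stands in for the suggested author_settings entry and is a hypothetical name, as are the two helpers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AuthorRecord:
    author_id: int
    email: str
    # Hypothetical marker: the user account that created this author record
    # while submitting as an authenticated user, or None if unknown.
    created_by_user_id: Optional[int] = None


def match_by_email(records: List[AuthorRecord], email: str) -> List[int]:
    """Email matching: follows whatever address is currently stored."""
    return [r.author_id for r in records if r.email.lower() == email.lower()]


def match_by_creator(records: List[AuthorRecord], user_id: int) -> List[int]:
    """Flag matching: only records created by that user during submission."""
    return [r.author_id for r in records if r.created_by_user_id == user_id]
```

If a journal manager edits an author's email so that it equals the user's address, `match_by_email` would include that record even though the user never created it, while `match_by_creator` would not.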

asmecher commented 6 years ago

On the first issue:

Journal manager may see the ORCID ClientId and the ClientSecret in a general OJS setup. But in a large multi-journal installation, ...

In cases like this, here's an approach we've used before:

The system administrator can optionally define settings in the config.inc.php file. When these are present, they take precedence over the Journal Manager's UI-based settings, and those controls are replaced with a note that the admin has taken care of it.
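The precedence rule described here can be sketched as a small lookup. This is an illustration only: the dictionaries stand in for values read from config.inc.php and from the journal's plugin settings, and the setting name in the usage note is hypothetical.

```python
from typing import Dict, Optional, Tuple


def resolve_setting(
    name: str,
    site_config: Dict[str, str],
    journal_settings: Dict[str, str],
) -> Tuple[Optional[str], bool]:
    """Return (value, locked).

    A non-empty site-wide value wins and locks the field, so the settings
    form can show a note instead of an editable control.
    """
    site_value = site_config.get(name)
    if site_value:
        return site_value, True
    return journal_settings.get(name), False
```

For example, `resolve_setting("client_id", {"client_id": "APP-SITE"}, {"client_id": "APP-JOURNAL"})` returns `("APP-SITE", True)`, and with an empty site configuration the journal's own value is returned unlocked.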

On the second issue:

We are thinking it may be useful for an authenticated user to disseminate his ORCID access token to all the publications he has submitted or contributed to in the journals in the same installation.

If you mean batch-claiming old submissions, I don't think email-based claiming is going to be a good establishment of trust. This is one of the reasons I wasn't a strong advocate of forcing the authentication of all ORCiDs -- we'll have a lot of editors who want ORCiD coverage for their journals but won't be able to depend on their authors to jump through hoops. If we have first-class (authenticated) and second-class (unauthenticated) ORCiDs, then journals will be able to use ORCiDs on both the "ideal" and "practical" levels, with good metadata to distinguish between them. I still think that's the best way forward but could reasonably be alone in that :)

However, as ORCiDs are to be our main link between author and user records, I think emails are the best of a bad list of potential disambiguators. I can't think of a good alternative.

ajnyga commented 6 years ago

I think the core of the problem here is that ORCID is designed for claiming authorship - not so much to work as an identifier inside a single program.

If we are to automatically attach ORCID's to records with matching emails, then we should be sure of two things:

I still think that the only solid way of claiming article authorship in OJS is to follow the ORCID instructions on how to do it.

If we would have "second-class ORCIDs" in OJS, what would be the exact use cases? As I said above, if they are not included in metadata exports or in the frontend, then I do not see a problem.

But ORCID does not know the concept of a "second-class ORCID", so any editor trying to find out what it means is not going to find much information. So if we use them in OJS, we need to make sure that the editors know the difference.

mtub commented 6 years ago

@asmecher , I don't think this

we'll have a lot of editors who want ORCiD coverage for their journals but won't be able to depend on their authors to jump through hoops

is a valid approach. ORCID is all about claiming your work and having a unique, universal identifier. If authors already have ORCID iDs, it shouldn't be much of a problem to have them authenticate their ORCID iD once per publication (or: once per installation/journal, but by an action of the author, not based on email addresses).

There are lots of services that let authors enter ORCID iDs without authenticating them (e.g. Zenodo), and maybe authors will ask for something like that when entering metadata on co-authors, but I'd still prefer a stricter approach where no-one but the author makes the connection to ORCID.

I think there sometimes is a little misconception of what ORCID can be used for and for what it's not that suited. I'd emphasize (in that order)

mtub commented 6 years ago

Also, "ORCID coverage for a journal" just doesn't sound right. It's authors that are covered by ORCID, and publication venues, journals, repositories etc. should support this by making claiming etc. as easy as possible. But it's not information that editors can or should maintain.

ajnyga commented 6 years ago

I fully agree with everything @mtub wrote above. Especially with "it's not information that editors can or should maintain".

I think that PKP has a big responsibility to get this right because there are so many journals using OJS.

asmecher commented 6 years ago

I think the core of the problem here is that ORCID is designed for claiming authorship - not so much to work as an identifier inside a single program.

This kind of nails it, I think. We have a problem that I had been hoping ORCID would solve for us, but I think I'm going to have to give that up as fantasy...

I've filed this at https://github.com/pkp/pkp-lib/issues/2986 -- having it filed elsewhere will help me stop jamming considerations onto ORCiDs that it's not designed to handle :)

ajnyga commented 6 years ago

https://forum.pkp.sfu.ca/t/orcid-profile-plugin-setting-is-not-working-sitewide-in-ojs3-1/35410

withanage commented 6 years ago

Nils has already implemented some of the specifications discussed before. We have some questions and need some advice on how to proceed.

Notifications

ORCID synchronization

Next steps


Shall we already send a pull request with the current status or is it better to wait?

Here is an example of an updated profile with an OJS 3 submission

https://sandbox.orcid.org/0000-0002-0955-2683

ajnyga commented 6 years ago

Great job! These are purely my views:

  1. I like the idea of having the checkbox there, but how does the editor do the request for ORCIDs later? I mean if the author chooses to deselect those (maybe does not know what it means), then how does the editor ask for the ORCIDs afterwards? Maybe you could just have an "Ask for co-author ORCIDs" button somewhere available just for the editor. The editor could then initiate the process for example when the submission has been received. Just thinking out loud here.

  2. I think that DOIs are stored without the domain name at the moment. Would it make sense to do the same thing with ORCIDs as well and then add the domain to the beginning when showing the ORCID? Because now we are not accepting any written ORCIDs anymore, right?

  3. Should the sync happen after the article is published?

Again, great job, thank you!

nils-stefan-weiher commented 6 years ago

Hi @ajnyga , you wrote

I mean if the author chooses to deselect those (maybe does not know what it means), then how does the editor ask for the ORCIDs afterwards? Maybe you could just have an "Ask for co-author ORCIDs" button somewhere available just for the editor. The editor could then initiate the process for example when the submission has been received. Just thinking out loud here.

The form from the screenshot is the form (AuthorForm class in the code) the editor is presented when editing the Submission Metadata, the checkbox there is exactly for this purpose. Any time the checkbox is set while editing Author data an E-Mail would be generated with a new link to ORCID initiating the authorisation process.

And as an answer to your point 3:

Should the sync happen after the article is published?

The sync only happens after the article is published, there is an option of adding works to an ORCID profile as an on-going work, but for now only published Articles will be posted to the ORCID profiles of contributors, if they authorised it by clicking the link before or after the publication.
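The rule stated here (published articles only, and only for contributors who authorised access) can be sketched as a filter. The class and field names are hypothetical stand-ins for the plugin's data, not its actual API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Contributor:
    name: str
    # Hypothetical field: set once the contributor has authorised access at ORCID.
    orcid_access_token: Optional[str] = None


def contributors_to_sync(is_published: bool, contributors: List[Contributor]) -> List[Contributor]:
    """Return the contributors whose ORCID profile should receive the work."""
    if not is_published:
        return []  # unpublished submissions are never posted
    return [c for c in contributors if c.orcid_access_token]
```

Because authorisation may arrive before or after publication, the same check would need to run both when the article is published and when a contributor later clicks the link.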

ajnyga commented 6 years ago

Thanks @isgrim !

The form from the screenshot is the form (AuthorForm class in the code) the editor is presented when editing the Submission Metadata, the checkbox there is exactly for this purpose. Any time the checkbox is set while editing Author data an E-Mail would be generated with a new link to ORCID initiating the authorisation process.

So the author does not see the checkbox, when she is submitting the article? The same form is used there.

The sync only happens after the article is published, there is an option of adding works to an ORCID profile as an on-going work, but for now only published Articles will be posted to the ORCID profiles of contributors, if they authorised it by clicking the link before or after the publication.

Ok!

nils-stefan-weiher commented 6 years ago

@ajnyga

So the author does not see the checkbox, when she is submitting the article? The same form is used there.

At the moment: The author sees the same checkbox, but it is not working for a newly created Author, because the plugin currently only sends the e-mail if the author is already created. This is due to a limitation in how the Form#execute hook is placed in the AuthorForm class. I would have to change the AuthorForm#execute method and submit a separate pull request for pkp-lib, but I was not yet ready to go that far without confirmation from PKP.

ajnyga commented 6 years ago

Ok, maybe I am thinking of the editor workload of doing the request separately for each author vs. having a single button which sends the request to all authors without an ORCID. There are of course many cases where there are no secondary authors, but also those extreme cases where there are something like 10 authors in a single article.

asmecher commented 6 years ago

^ Looking very good so far, @isgrim and @withanage. @ajnyga's feedback on this suits me just fine.

nils-stefan-weiher commented 6 years ago

@ajnyga you wrote earlier:

I think that DOIs are stored without the domain name at the moment. Would it make sense to do the same thing with ORCIDs as well and then add the domain to the beginning when showing the ORCID? Because now we are not accepting any written ORCIDs anymore, right?

I would also prefer to store only the ID, the host can be determined by the API Url setting in the Plugin, but that would differ from the current validation in the Author and User Forms. I don't know if it is good practice to change the semantics of a core OJS Field by the plugin. So maybe this has to be changed in pkp-lib? @asmecher, what do you think?

EDIT: In hindsight the host probably has to be also stored, because if an ORCID from the Sandbox server was stored and later the Plugin is switched to using the Production API the link to the ORCID Profile would not be valid on the Production server.
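To make the host question concrete, the sketch below splits a stored ORCID URL into its host and the bare 16-character iD, and checks the iD's final check character (ISO 7064 MOD 11-2, as used by ORCID). The function names are hypothetical and this is not the validation used by the core forms.

```python
import re
from typing import Tuple
from urllib.parse import urlparse

ORCID_ID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def checksum_ok(orcid_id: str) -> bool:
    """Verify the last character of an ORCID iD (ISO 7064 MOD 11-2)."""
    chars = orcid_id.replace("-", "")
    total = 0
    for ch in chars[:-1]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return chars[-1] == expected


def split_orcid_url(value: str) -> Tuple[str, str]:
    """Split e.g. 'https://sandbox.orcid.org/0000-...' into (host, iD)."""
    parsed = urlparse(value)
    orcid_id = parsed.path.lstrip("/")
    if not parsed.hostname or not ORCID_ID_PATTERN.fullmatch(orcid_id):
        raise ValueError(f"not an ORCID URL: {value!r}")
    if not checksum_ok(orcid_id):
        raise ValueError(f"bad ORCID checksum: {orcid_id!r}")
    return parsed.hostname, orcid_id
```

An iD stored with `sandbox.orcid.org` as its host can then be recognised as a sandbox iD after the plugin is switched to production, instead of being turned into a production link that may point nowhere.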

asmecher commented 6 years ago

DOIs are a little different because there might be other service providers, and I don't think ORCID has that in mind. Just for the sake of keeping it simple, I'd favour storing the host as well; if a use case comes up later, we could split it apart in the upgrade process without too much work.

ajnyga commented 6 years ago

Are you sure that different service providers for DOIs use different domain names for the resolver links? I mean, I think they all use https://www.doi.org/ (used to be dx.doi.org)

lmaylein commented 6 years ago

The official resolver url is https://doi.org (deprecated: http://dx.doi.org)

lmaylein commented 6 years ago

To display DOIs without the resolver urls is also outdated: https://www.crossref.org/display-guidelines/

ajnyga commented 6 years ago

Sure, the display guidelines require it, but OJS stores just the prefix/suffix and adds the domain name when it is displayed. I have no strong opinion on whether we should store ORCIDs with the domain name, I just figured that it would make sense to use the same approach as with the DOIs.
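The store-bare, display-as-URL approach described for DOIs can be sketched like this. The helper names are hypothetical; the resolver URLs are the ones mentioned in this thread.

```python
RESOLVER = "https://doi.org/"
# Older or alternative resolver prefixes that may appear in pasted values.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
)


def strip_resolver(value: str) -> str:
    """Reduce a DOI or DOI URL to the bare prefix/suffix form for storage."""
    for prefix in KNOWN_PREFIXES:
        if value.lower().startswith(prefix):
            return value[len(prefix):]
    return value


def doi_display_url(stored_doi: str) -> str:
    """Prepend the resolver when the DOI is displayed."""
    return RESOLVER + stored_doi
```

The same pattern would work for ORCIDs only if a single host could be assumed at display time, which is the point the sandbox versus production discussion above calls into question.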

asmecher commented 6 years ago

@ajnyga and @lmaylein, I stand corrected! I don't hear a strong preference for approach and don't have one myself, so I'd recommend continuing with the current storage habits until there's a need to change.

nils-stefan-weiher commented 6 years ago

We added another functionality:

Automatically send emails during the process of publishing an issue to the article authors who don't have a valid ORCID access token stored. If the author then authorizes the ORCID profile access (by clicking the link in the email), the article will be added to the ORCID profile. This is the same email which will be sent by ticking the checkbox on the author metadata form. If the author denies access, this will be recorded with a timestamp and can be displayed or considered for future requests.
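A small sketch of the selection step described above, under assumed field names: pick the authors who still need a request email when an issue is published, and record a denial with a timestamp. Whether earlier denials should suppress later emails is left open in the thread, so it is a parameter here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class AuthorOrcidState:
    email: str
    access_token: Optional[str] = None           # hypothetical field name
    access_denied_at: Optional[datetime] = None  # hypothetical field name


def authors_to_email(authors: List[AuthorOrcidState], skip_denied: bool = True) -> List[AuthorOrcidState]:
    """Authors without a stored token, optionally excluding earlier denials."""
    selected = []
    for author in authors:
        if author.access_token:
            continue
        if skip_denied and author.access_denied_at is not None:
            continue
        selected.append(author)
    return selected


def record_denial(author: AuthorOrcidState, now: Optional[datetime] = None) -> None:
    """Store when the author declined, so later requests can take it into account."""
    author.access_denied_at = now or datetime.now(timezone.utc)
```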

Screenshot of the plugin settings form: ORCID plugin settings form

The other setting that is displayed there is the requested ORCID profile access scope (for OAuth); the interface for this is not yet final. We need to decide if this would be a selection or checkboxes for the different scopes. The article metadata updates only work with the "/activities/update" scope, so by disabling this scope the update functionality of the plugin is disabled.
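For orientation, the scope setting ends up in the OAuth authorization link that the author is sent to. The sketch below builds such a link following ORCID's authorization-code flow; the function names and the example client id are hypothetical, and the plugin's real settings form may model scopes differently.

```python
from typing import List
from urllib.parse import urlencode

UPDATE_SCOPE = "/activities/update"


def authorize_url(orcid_host: str, client_id: str, redirect_uri: str, scopes: List[str]) -> str:
    """Build the ORCID OAuth authorization URL for the selected scopes."""
    query = urlencode({
        "client_id": client_id,
        "response_type": "code",
        "scope": " ".join(scopes),
        "redirect_uri": redirect_uri,
    })
    return f"https://{orcid_host}/oauth/authorize?{query}"


def can_update_works(scopes: List[str]) -> bool:
    """Work updates are only possible when the update scope was requested."""
    return UPDATE_SCOPE in scopes
```

With only `/authenticate` selected, the link still yields an authenticated iD, but `can_update_works` is false and the plugin would skip pushing article metadata.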

@ajnyga Do you have any input on this feature and the scope setting?

ajnyga commented 6 years ago

Hi!

Great to see you working on this, thank you!

About sending the emails when the issue is published. Would this lead to a situation where the journals need to send metadata to places like Crossref and DOAJ twice to get the stored ORCIDs included in the metadata? I mean when you publish the article and register the DOI you probably won't have the ORCID data stored yet and the automated registration scripts usually run within one day after publishing. It will probably be available (if ever) within a couple of days when the authors click the links they got. So basically in these cases the editor should register the DOI again manually to send the data forward to Crossref or manually re-register data to DOAJ?

But I do like the idea of an automated message at some point of the workflow. Would this be possible for example when the article moves to production or when "schedule for publication" is selected? This way the ORCID data would probably be ready at the time of publishing when the metadata is sent forward.

Another question about the link. When you say clicking it will add the article data to the ORCID profile you are probably talking about using the member API to push the article data, right? But at the same time you are also storing the author ORCID to the OJS database, right? So just to make sure, if the journal does not have the member API, the emails are still getting sent and the ORCIDs added to the OJS database - only the data being pushed to the ORCID database will be missing, right?

Selecting the scope is probably the same thing as selecting between the free API and the member API, right? If so, then it is a great idea to have a selection for that. I do not have a ready opinion what the selection should be like. I guess the simplest solution would be just having two options there to choose between the limited and full features.