inveniosoftware / invenio-app-rdm

Turn-key research data management platform.
https://inveniordm.docs.cern.ch
MIT License

files: don't lock file edition for external DOIs #2259

Closed kpsherva closed 1 year ago

kpsherva commented 1 year ago

Expected result:

Currently, editing files is available only via a new version. Editing files directly should be available only when explicitly set in the config.

fenekku commented 1 year ago

Mmmh just saw this and commenting based on our limited experience:

I guess I could see:

anyway my 2 cents

slint commented 1 year ago
  • the FAIR principles encourage a different version for different files - my vote is for InvenioRDM to be bullish on these principles and push for them whenever we can.

Fully agree on this, but unfortunately this can only be enforced for records whose DOIs we actually manage. I know that for users that's a technical detail that they might not fully understand.

  • relatively sure our users will be confused, because they don't keep track in their heads of what DOI prefix we manage versus DOI prefixes we don't. So they will think something is off with records we manage ("why can't I change the files in this one and not the other without making a new version?"). Maybe a clear help text would be enough - but first point still holds.

Also agree here, the way this works in Zenodo nowadays is undocumented actually and most users are not aware of it (i.e. we get tickets on our support line for changing files of external DOI records, and kindly reply that they can do so themselves :) ). But at the same time, I can't recall any confused user asking why they can change things in some records and not others.

IMO, having clear visual indicators and explanations for the "forbidden" actions is sufficient in these cases (e.g. "files are locked because they are attached to a registered DOI managed by the repository"). The policy that a repository adopts is also about "educating" users on DOIs and what it means for them to be permanent, persistent, citable, reproducible, etc.
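A minimal sketch of how such an indicator could be derived, assuming the repository knows which DOI prefixes it manages. The function `file_lock_reason`, the `MANAGED_PREFIXES` set, the example prefix `10.5555` and the rule that drafts stay editable are all hypothetical and are not taken from InvenioRDM:

```python
from typing import Optional

# Hypothetical set of DOI prefixes this repository mints and manages.
MANAGED_PREFIXES = {"10.5555"}


def file_lock_reason(doi: Optional[str], is_published: bool) -> Optional[str]:
    """Return a user-facing explanation if files are locked, else None."""
    if not is_published or not doi:
        # Assumption: drafts and records without a DOI stay editable.
        return None
    prefix = doi.split("/", 1)[0]
    if prefix in MANAGED_PREFIXES:
        return (
            "Files are locked because they are attached to a registered DOI "
            "managed by the repository. Create a new version to change them."
        )
    # External DOI: not locked in this sketch.
    return None


print(file_lock_reason("10.5555/abc123", is_published=True))
print(file_lock_reason("10.1234/journal.42", is_published=True))
```

Returning the reason, rather than a bare boolean, lets the UI show the explanation right next to the disabled action.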

  • Allowing new version, but then managing record from then on - minting a new DOI. A user facing warning that new version will be managed would be nice.
  • Keeping with the same functionality, but providing a help text explaining why new version is needed even for external DOI

That is something that was discussed a lot in InvenioRDM telecons and requirements gathering, i.e. "versioning with mixed DOIs". The issue I see here is that users may start seeing this as a way to take ownership of externally managed objects into their instance... I can come up with a few use cases, and what I would expect to be the right™ way to go:


Story A

  1. I publish my dataset on the generalist data repository ABC
  2. I later on find out that there is domain-specific repository DEF, much more suitable for my dataset
  3. I re-upload my dataset in DEF, using the external DOI from ABC.
  4. After a month, I have updates to my dataset, so I publish a new version in DEF.

This is somewhat fine... What would make it great is if the user also updates the original dataset in ABC with an IsPreviousVersionOf/IsContinuedBy/IsObsoletedBy DataCite related identifier pointing to the DEF records. That might not be possible though; it really depends on the capabilities and ingestion workflows of the ABC data repository.
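As a rough illustration, the related identifier mentioned above could look like the following fragment of DataCite JSON metadata. The DOI is made up and the helper `previous_version_link` is hypothetical; whether ABC lets a user submit such an update at all is exactly the open question:

```python
import json

# Hypothetical DOI of the newer version published in DEF.
DEF_NEW_VERSION_DOI = "10.5678/def.2"


def previous_version_link(newer_doi: str) -> dict:
    """Build a DataCite relatedIdentifier entry to put on the older record."""
    return {
        "relatedIdentifier": newer_doi,
        "relatedIdentifierType": "DOI",
        "relationType": "IsPreviousVersionOf",
    }


# Fragment that would be added to the metadata of the original ABC record.
abc_metadata_patch = {
    "relatedIdentifiers": [previous_version_link(DEF_NEW_VERSION_DOI)]
}
print(json.dumps(abc_metadata_patch, indent=2))
```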


Story B

  1. Journal PQR published my article.
  2. I publish a copy of it in my institutional repository using the journal article DOI.
  3. I found some mistakes in the tables/figures. I'll fix them and publish them on my institutional repo as a new version of the existing record.

This is bad now, because the journal doesn't know about these changes, which are traditionally published as errata in follow-up issues and are clearly displayed on the journal article's landing page. It's very likely that the journal also doesn't have a way to point to the "corrected" version at the institutional repository.


Both of the above cases suffer from the possibility that there might not be human and/or machine-readable metadata describing the relationships between these objects, either on the hosting repositories/journal landing pages or the metadata registered for the DOIs.

fenekku commented 1 year ago

tldr; Yes for a nicer explainer especially if it's the way InvenioRDM will do it.

What I don't quite understand is that it seems like InvenioRDM is trying to overcome a shortcoming of the journal at the cost of clear versioning and I don't know if it does overcome it in the process. In both scenarios the problem seems to be that

1) "it really depends on the capabilities and ingestion workflows of the ABC data repository."
2) "the journal also doesn't have a way to point to the "corrected" version at the institutional repository." Or, most probably: the journal doesn't have a way to point to the record on the institutional repository at all (if the journal could link to one version on InvenioRDM we would be fine, since from then on we provide the relationship links).

In other words for both: the journal can't link to this other entry.

Just at OpenRepositories I was trying to explain DOI management to a librarian from a different institution using 2 systems with the same DOI prefix. "InvenioRDM can't prevent you from 'managing' the same DOI from 2 different services. It doesn't make them communicate with each other," I was telling her. As far as I can tell, DataCite's REST API doesn't even do conflict resolution (one system can override the DOI entry without the other knowing it before it updates it again).
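To make the overwrite scenario concrete, here is a toy model of two services updating the same DOI entry. It is an in-memory stand-in written for illustration, not the DataCite API; the class `ToyDoiRegistry`, the DOI and the URLs are all invented:

```python
class ToyDoiRegistry:
    """In-memory stand-in for a DOI registry with no conflict detection."""

    def __init__(self):
        self._entries = {}

    def put(self, doi, url, updated_by):
        # Unconditional overwrite: nothing is compared with the previous state.
        self._entries[doi] = {"url": url, "updated_by": updated_by}

    def get(self, doi):
        return self._entries[doi]


registry = ToyDoiRegistry()
doi = "10.5555/example"  # hypothetical DOI under a prefix shared by two systems

registry.put(doi, "https://repo-a.example.org/records/1", updated_by="system-a")
registry.put(doi, "https://repo-b.example.org/records/9", updated_by="system-b")

# System A is never told that its target URL was replaced.
print(registry.get(doi))
```

In this model the last writer wins silently, which is the behavior described above.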

Fully agree on this, but unfortunately this can only be enforced for records whose DOIs we actually manage.

Can you expand on that one? In InvenioRDM, users can't deposit multiple uploads with the same external (non managed) DOI. Would that qualify as enforcement? And the functionality now of only allowing new files in new version is enforcement too, right? Agreed, we can't enforce it in another system though.

(I am quite jetlagged so if I am missing something obvious or just being weird that's why 😄)

slint commented 1 year ago

What I don't quite understand is that it seems like InvenioRDM is trying to overcome a shortcoming of the journal at the cost of clear versioning and I don't know if it does overcome it in the process

I wouldn't say we're actively trying to overcome these shortcomings at all. We might be allowing workarounds though via what we implement, but I would never e.g. recommend to a user to go with the actions of "Story B" (it also very likely violates Journal PQR's T&C).

  1. [...] Or most probably: the journal doesn't have a way to point to the record on the institutional repository at all. (if the journal could link to one version on InvenioRDM, we would be fine since, from then on, we provide the relationships links) [...] In other words for both: the journal can't link to this other entry.

IMO, the journal shouldn't do this and probably also never will, since:

  1. there's no incentive on their side to "lose traffic" and delegate publishing to institutional/generalist repositories
    • The institutional repository on the other hand is the one that has the incentive to keep a "copy" of the journal article (using the external DOI), if that is something that also complies with the journal's policies.
  2. it might be actually a "good" thing since the journal already has proper peer-review post-publishing workflows in place to deal with these situations.

But now we're going a bit in the direction of a rant/opinion-piece on journals so I'll stop :)

Just at OpenRepositories I was trying to explain DOI management to a librarian from a different institution using 2 systems with the same DOI prefix. "InvenioRDM can't prevent you from 'managing' the same DOI from 2 different services. It doesn't make them communicate with each other." I was telling her. As far as I can tell, Datacite's REST API doesn't even do conflict resolution (one system can override the DOI entry without the other knowing it before it updates it again).

Correct, an InvenioRDM instance might have multiple prefixes configured (this is not implemented anywhere AFAIK, but was discussed). IMO, a service/website/repository that manages DOI prefixes, should also be the one and only controlling and serving the resolving URLs for DOIs registered under these prefixes. Otherwise, as you said, there's no way to deal with conflicts.

Fully agree on this, but unfortunately this can only be enforced for records whose DOIs we actually manage.

Can you expand on that one? In InvenioRDM, users can't deposit multiple uploads with the same external (non managed) DOI. Would that qualify as enforcement? And the functionality now of only allowing new files in new version is enforcement too, right? Agreed, we can't enforce it in another system though.

Basically, a repository can only be responsible for the accompanying files of a DOI that it manages. By "enforce" I mean that we're responsible, and thus if changing files is not a feature of RDM, nobody can change them.

Of course, as part of support operations for a repository, an admin should be able to handle exceptional cases to fix obvious mistakes and modify files (e.g. a user accidentally uploaded an empty file, or included Thumbs.db/_MACOSX files) within a grace period. But that is again a "policy" of the repository, and also reflects that the world is messy and we have to be pragmatic sometimes.

(I am quite jetlagged so if I am missing something obvious or just being weird that's why 😄)

Hehe, unfortunately it also doesn't help that proper DOI versioning is not a solved issue and there's no best practice either AFAIK 🤷‍♂️

fenekku commented 1 year ago

Ha, ha, I find myself nodding along but not really changing my mind - probably because "DOI versioning is not a solved issue". To wrap this up here is my condensed understanding:

We are considering the use case of updating files for a non-managed DOI record.

Either

A- we allow files to be changed directly without a new version. This differs from managed-DOI records, where files can only be changed with a new version (in the normal flow of things). It's fewer steps (so that's good), but it's not explicit about the change (no referenceable versions). The record is not linked to from the journal anyway, so from that angle the lack of explicit versioning doesn't matter too much. If the record is linked from other places, though, the lack of versioning is a shame. Because it is "exceptional" behavior, more help text / explanations are needed.

B- we only allow files to be changed with a new version. The flip of the above arguments applies.

I favor B over A at 85-15 because it's less exceptional/"edge-casey" and more explicit. I think those are good principles for the user and for the implementation: because the functionality is the same across managed and non-managed DOIs, it's easier to implement and explain.


Because being able to edit a record's files irrespective of DOI managed state is 100% legitimate and not an edge case, I see it as a completely separate feature unrelated to the above.

I would see it as a separate button on the edit page ("Correct files" next to "New version") with one of these 2 behaviors:


Thanks for reading through and the replies! If that's representative of the situation, then I've said my piece. I will be able to refer to this discussion to explain the approach to stakeholders on our side.

tmorrell commented 1 year ago

I'm generally in favor of relaxing the file restrictions as described in this issue, but the removal of versioning for externally managed DOI records as implemented in https://github.com/inveniosoftware/invenio-app-rdm/pull/2264 is a big issue for us. There are a lot of use cases where users might want to create versions, and the fact that the content has an external DOI shouldn't block them. But one really clear requirement for versioning is as follows:

At a minimum, I think disabling versions and allowing file modifications should be disentangled. Different repositories might want to make different decisions, and bundling them into a single external DOI flag is going to make it really confusing to explain.

kpsherva commented 1 year ago

Please note that use of this feature will not be forced on anyone using InvenioRDM: it is a configurable feature, available on demand for repository managers who need a solution for the cases explained by Alex above. I would therefore like to highlight that it will not cause issues with the existing DOI and versioning workflows.

tmorrell commented 1 year ago

Yes, but the default in the PRs is to not lock files and disable versioning. So it's a change to the default behavior of RDM.

I think the easiest solution is to split up the config with:

RDM_LOCK_FILES_FOR_EXTERNAL_DOI = False
RDM_ENABLE_VERSIONS_FOR_EXTERNAL_DOI = False

We can also argue about what the defaults should be, but this would be more flexible and understandable.
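A minimal sketch of how the two flags proposed above could be read independently. The flag names come from the proposal; the helper `allowed_actions`, the defaults and the handling of managed DOIs are assumptions made for illustration, not code from the PR:

```python
# Flag names are the ones proposed above; the defaults here are placeholders.
DEFAULT_CONFIG = {
    "RDM_LOCK_FILES_FOR_EXTERNAL_DOI": False,
    "RDM_ENABLE_VERSIONS_FOR_EXTERNAL_DOI": False,
}


def allowed_actions(has_external_doi: bool, config: dict) -> dict:
    """Hypothetical helper: file-related actions allowed on a published record."""
    if not has_external_doi:
        # Managed DOI: files change only through a new version.
        return {"edit_files": False, "new_version": True}
    return {
        "edit_files": not config["RDM_LOCK_FILES_FOR_EXTERNAL_DOI"],
        "new_version": config["RDM_ENABLE_VERSIONS_FOR_EXTERNAL_DOI"],
    }


# A repository that wants both editable files and versions for external DOIs.
custom = {**DEFAULT_CONFIG, "RDM_ENABLE_VERSIONS_FOR_EXTERNAL_DOI": True}
print(allowed_actions(True, DEFAULT_CONFIG))
print(allowed_actions(True, custom))
```

With two flags, each of the four combinations is reachable, so a repository can keep versioning for external DOIs regardless of whether it locks their files.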

kpsherva commented 1 year ago

Yes, but the default in the PRs is to not lock files and disable versioning. So it's a change to the default behavior of RDM.

I think the easiest solution is to split up the config with:

RDM_LOCK_FILES_FOR_EXTERNAL_DOI = False
RDM_ENABLE_VERSIONS_FOR_EXTERNAL_DOI = False

We can also argue about what the defaults should be, but this would be more flexible and understandable.

Not breaking backwards compatibility was the initial goal. Since the PR hasn't yet gone fully through the review process, it might not be fully implemented. Since you've noticed that the compatibility part might be at risk, could you point it out in the review? I think suggesting it directly in the review will facilitate communication.