Open danielbreves opened 3 weeks ago
Thanks for this report!
There are currently two main DOI configurations in InvenioRDM:
DATACITE_ENABLED = True - All records have a DOI
DATACITE_ENABLED = False - No records have a DOI
So then in the migration script https://github.com/inveniosoftware/invenio-app-rdm/blob/810befded43dc5ea49198bf992d6e2393c75ff1e/invenio_app_rdm/upgrade_scripts/migrate_11_0_to_12_0.py#L85 the parent DOI block will skip registering DOIs when DATACITE is not enabled.
In v12 we also added an "is_enabled" setting, which makes it possible to decide whether DOIs are minted based on set criteria. This is also checked in the migration script.
It sounds like you have records that don't have DOIs but you have DATACITE enabled. Do you really want parent DOIs minted for these records? That seems like an unusual choice.
I suspect you'll either want to disable parent DOIs with RDM_PARENT_PERSISTENT_IDENTIFIERS={}, or develop an is_enabled setting to exclude those records without DOIs. But please reply with your use case if neither of those options sounds appropriate.
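For reference, the first option would be a small change in invenio.cfg. This is only a sketch: it assumes you want to keep DOIs on individual record versions while turning parent DOIs off entirely, and it uses only the two settings mentioned above.

```python
# invenio.cfg (sketch, not a complete configuration)

# Keep DataCite DOI minting enabled for record versions.
DATACITE_ENABLED = True

# Assumption: an empty mapping means no parent PIDs are configured,
# so the migration has no parent DOI to mint.
RDM_PARENT_PERSISTENT_IDENTIFIERS = {}
```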
I think the underlying issue here is that at the time register_or_update is called (https://github.com/inveniosoftware/invenio-app-rdm/blob/master/invenio_app_rdm/upgrade_scripts/migrate_11_0_to_12_0.py#L104), the records with "pids": {"doi": ...} have not been indexed yet (the migration script only commits them to the DB, not to the document engine). The manual call that updates the records in the document engine (pipenv run invenio rdm rebuild-all-indices) is only made after the script has run. But https://github.com/inveniosoftware/invenio-rdm-records/blob/master/invenio_rdm_records/resources/serializers/datacite/schema.py#L415 relies on that indexing having already occurred to get the versions that are serialized as related identifiers for DataCite.
The migration script should be split in two, with an instruction in between telling the reader to run pipenv run invenio rdm rebuild-all-indices before running the second script, which would contain the register_or_update code. @danielbreves can use that approach right away too.
@fenekku I don't think that's the issue. The child records should already have DOIs before the migration script runs. The migration script is only adding the parent DOI, not the child DOI.
Hmm :thinking: ... Here is the situation: I had a similar issue with the migration script when minting the parent DOI, and splitting the script into 2 passes is what solved it for me. I think that despite the "child" records already having "pids": {"doi": ...}, they are not indexed in the new index at this point. They were only in the old one, and the new one was consulted (because v12 is running)...
Nevertheless, it may not be quite what the original problem was. @danielbreves How did you create a record with a DOI and a record without a DOI in the same instance? Did you change the datacite settings @tmorrell mentioned above (in-between or otherwise)? Understanding that would help us identify the underlying issue.
@tmorrell @fenekku thanks for the fast reply! Our InvenioRDM instance started with DATACITE_ENABLED = False and had some records created without DOIs, then we enabled DATACITE and created new versions of records with DOIs.
> It sounds like you have records that don't have DOIs but you have DATACITE enabled. Do you really want parent DOIs minted for these records? That seems like an unusual choice.
We would like parent DOIs minted, since all future records should have DOIs.
As a side note, I had to add the lines below to the migration script to make it print the stack trace, since KeyError('doi') on its own wasn't very helpful.
import traceback
# Placed in the handler where the exception is caught as `e`
traceback_str = ''.join(traceback.format_tb(e.__traceback__))
secho(traceback_str, fg="red")
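In case it helps anyone else debugging this, here is a standalone sketch of the same idea using only the standard library. The helper name format_exception_trace and the toy serialize function are made up for illustration; the real script reports errors with secho rather than print.

```python
import traceback


def format_exception_trace(exc):
    """Return the exception type, message, and full traceback as one string."""
    return "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )


def serialize(version):
    # Hypothetical stand-in for the serializer line that raised KeyError('doi').
    return version["pids"]["doi"]


try:
    serialize({"pids": {}})
except Exception as e:
    trace = format_exception_trace(e)
    print(trace)
```

Unlike format_tb, format_exception also includes the exception type and message, so the output shows both where the error was raised and what it was.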
> Our InvenioRDM instance started with DATACITE_ENABLED = False and had some records created without DOIs, then we enabled DATACITE and created new versions of records with DOIs.
Ah, got it. Then yes, I think it'd be reasonable for you to make a PR with that change from ["doi"] to .get("doi"), with a comment explaining that DOI minting may have been enabled after the fact. That way, when a version has no "doi", the serializer will skip that entry, which is fine.
Thanks!
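To illustrate the intended behaviour, here is a minimal sketch, not the actual serializer code. The function name related_version_identifiers, the sample records, the DOI value, and the "HasVersion" relation type are all assumptions made for the example; only the switch from ["doi"] to .get("doi") comes from this thread.

```python
def related_version_identifiers(versions):
    """Build related-identifier entries, skipping versions without a DOI."""
    related = []
    for version in versions:
        # .get("doi") returns None instead of raising KeyError('doi') for
        # versions created before DOI minting was enabled.
        doi = version.get("pids", {}).get("doi")
        if doi is None:
            continue
        related.append(
            {
                "relatedIdentifier": doi["identifier"],
                "relatedIdentifierType": "DOI",
                "relationType": "HasVersion",  # assumed relation type
            }
        )
    return related


# Hypothetical data: the first version predates DATACITE_ENABLED = True.
versions = [
    {"id": "v1", "pids": {"oai": {"identifier": "oai:example.org:v1"}}},
    {
        "id": "v2",
        "pids": {"doi": {"identifier": "10.1234/example.v2", "provider": "datacite"}},
    },
]

result = related_version_identifiers(versions)
print(result)
```

With this data, only the second version yields an entry, and the first is silently skipped rather than aborting the migration.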
Package version (if known): 10.8.6
Describe the bug
I got the following exception when running https://github.com/inveniosoftware/invenio-app-rdm/blob/master/invenio_app_rdm/upgrade_scripts/migrate_11_0_to_12_0.py:
This happened because a child record didn't have a DOI. Could that line be version["pids"].get("doi")?
Steps to Reproduce
Expected behavior
The migration should succeed since DOIs can be optional