Closed hjsyoo closed 7 years ago
moving to sprint 11
@lsitu - @hweng noted this change would need to be made in damsrepo, and she ran out of time last Sprint to work on it. Could you please take a look in this Sprint when you can? Thanks.
@mcritchlow Sure. I am looking into it now.
@hjsyoo Do you have an example of the converted XML that will be sent when minting the DOI, which includes the elements we need to change?
Per the description, it seems that the corresponding elements should be in the following format:
Language code error (https://library.ucsd.edu/dc/collection/bb4473712z) <titles> <title xml:lang="en-US"> Southern Ocean Carbon and Climate Observations and Modeling (SOCCOM) Float Data Archive </title> </titles>
Default setting for Resource Type (https://library.ucsd.edu/dc/collection/bb4473712z) <resourceType resourceTypeGeneral="Dataset"/>
Subjects (https://library.ucsd.edu/dc/collection/bb4473712z) <subjects> ... <subject>Southern Ocean</subject> </subjects>
Subjects (https://library.ucsd.edu/dc/object/bb66239018) <subjects> <subject>Biogeochemistry</subject> <subject>Hydrography</subject> <subject>Southern Ocean carbon</subject> <subject>Southern Ocean</subject> </subjects>
Title for objects What's the format of the data to be sent? It looks like we are sending it in three separate elements for objects: [CollectionTitle], [Creators], and [ObjectTitle].
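The target fragments listed above for titles, resource type, and subjects can be sketched in Python as follows. This is only an illustration of the expected output shape, not the damsrepo stylesheet itself; the function name `build_fragments` and its parameters are hypothetical, and the sample values combine the collection title and the object subjects quoted in this thread.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespace with the reserved "xml" prefix,
# so the attribute is written out as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_fragments(title, subjects, lang="en-US", resource_type_general="Dataset"):
    """Return the <titles>, <resourceType>, and <subjects> elements as strings.

    Hypothetical helper for illustration only.
    """
    titles = ET.Element("titles")
    title_el = ET.SubElement(titles, "title", {XML_LANG: lang})
    title_el.text = title

    # Resource Type General defaults to "Dataset"; the element has no text.
    resource_type = ET.Element("resourceType", {"resourceTypeGeneral": resource_type_general})

    # Every subject is emitted, including geographic ones.
    subjects_el = ET.Element("subjects")
    for name in subjects:
        ET.SubElement(subjects_el, "subject").text = name

    return [ET.tostring(el, encoding="unicode") for el in (titles, resource_type, subjects_el)]


fragments = build_fragments(
    "Southern Ocean Carbon and Climate Observations and Modeling (SOCCOM) Float Data Archive",
    ["Biogeochemistry", "Hydrography", "Southern Ocean carbon", "Southern Ocean"],
)
for fragment in fragments:
    print(fragment)
```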
@lsitu I'm not sure what would help you most. Is it an XML that EZID (I think) exports to DataCite? That might be this, for the SOCCOM example: https://ezid.cdlib.org/manage/display_xml/doi:10.6075/J0ZK5DMX. Although, this one is weird. It lists the Title twice, and it's missing the Geographic Subject, "Southern Ocean", which I must have failed to add to the record. On second thought, this record was mistakenly minted twice, and may not be the best example.
Here's one I just minted (using the button), then edited in the EZID UI: https://ezid.cdlib.org/manage/display_xml/doi:10.6075/J0P26W1T. It looks correct, although I should mention that I changed Title Language to "eng". EZID didn't complain when I submitted the change. I did have to add the Description and most of the subjects (7 of 9).
I'm uncertain how to answer the last question. I would like the Title in the EZID record to be formatted as: [ObjectTitle]. In [CollectionTitle]. Is there additional information you need?
@hjsyoo For the first issue, "Language code error", could you check whether changing the Title Language to "en-US" works?
For the last issue, I don't know why you currently get "LastName, FirstName MI; LastName, FirstName MI (Year): ObjectTitle", since we submit the [ObjectTitle] and [Creators] as independent elements. Could it be just a display issue in DataCite? I can change the title and submit it to DataCite in the following format for an object like https://library.ucsd.edu/dc/object/bb66239018:
Is it what you want?
@mcritchlow I've created the branch feature/doi_datacite without lib-camel integration and added a commit to fix the datacite metadata issues. I think we can create a release branch with a deployment ticket and deploy it to staging for testing, as we discussed earlier. Could you review commit https://github.com/ucsdlib/damsrepo/commit/4f4799bc8584e91594bcccec2d3b67dde3f45020? Thanks.
@lsitu - looks good to me based on the info you have so far. Let's get this in staging as you noted for @hjsyoo to review 👍
@mcritchlow We need to tag it as a release branch for the deployment, as we did for damsmanager. What version could we use for this deployment?
@lsitu i think the next version of damsrepo is supposed to be 4.33 (per the most recent JIRA ticket I can find)
@lsitu In response to your first question - yes, changing the Title Language to "en-US" can be done in the EZID UI without generating an error message. Regarding the second question, I'm not sure why Creators is involved, but yes, the title for https://library.ucsd.edu/dc/object/bb66239018 should be as you indicated. The EZID record is currently correct: https://ezid.cdlib.org/id/doi:10.6075/J09021PC.
@hjsyoo It's good that changing the Title Language to "en-US" works, since this could be coming from the xml:lang attribute in the RDF element. It seems that the issue with the Creators being prefixed to the title shouldn't be related, since the Titles and the Creators are submitted as different elements. Let's test it once we deploy the code to staging to see whether it works as expected.
@mcritchlow Yeah, the version should be 4.33 now and I've created the release branch for it. But release/4.32 includes lib-camel integration, and I think we can rename it later if that causes any confusion.
Sounds good!
@hjsyoo I think it's ready for you to test the Mint DOI function on staging https://librarytest.ucsd.edu/dc now. Thanks @jhriv for manually deploying it to staging.
Hi @lsitu, I did some testing. Here are some actual and potential issues. https://ezid.cdlib.org/id/doi:10.5072/FK21C21G9N (CalCOFI coll):
https://ezid.cdlib.org/id/doi:10.5072/FK2CN75144 (SOAS object):
As a side note, when testing on staging, there were some records (i.e., http://librarytest.ucsd.edu/dc/object/bb6213224w, http://librarytest.ucsd.edu/dc/collection/bb7305352v) which gave me the error message: Record aleady has a DOI assigned. But others didn't give me the error message, and allowed me to mint a second DOI (i.e., http://librarytest.ucsd.edu/dc/collection/bb87730652, http://librarytest.ucsd.edu/dc/object/bb7886108t).
@hjsyoo Thank you very much for testing it out. It looks like you've brought up some new issues. I am not sure whether it's better to continue work on this ticket or just open a new ticket so that we can wrap up this ticket quickly. But either way will be fine with me.
Here are some questions for the issues above: https://ezid.cdlib.org/id/doi:10.5072/FK21C21G9N (CalCOFI coll):
Collections (as for objects) should be assigned the value, "Dataset", for Resource Type General, not "Collection". *Question: Are we going to use "Dataset" for all collections, or RCI collections only?
The Subject, Scripps Institution of Oceanography, has "[naf]" appended to it in the EZID record. This string isn't in the original collection record, and shouldn't be there. I can't edit it thru the EZID UI, because [naf] doesn't appear when in edit mode, but it appears when in viewing mode. *Question: I think "[naf]" is the mads:MADSScheme of the subject "Scripps Institution of Oceanography", which is attached to the subjectScheme attribute of the subject when submitting it to datacite. Do you just want to ignore the subjectScheme attribute for all subjects? If not, I think we may have to correct it in the subject authority record "Scripps Institution of Oceanography" itself. What do you think?
https://ezid.cdlib.org/id/doi:10.5072/FK2CN75144 (SOAS object):
This is a minor issue, but the Formats field seems overly populated. The object has 11 images, 2 ZIPs, and 1 empty component (serves as a header only, in the components list). The EZID record lists the following: image This formats list was slightly different when I minted from production: https://ezid.cdlib.org/id/doi:10.6075/J0P26W1T. *Question: I am not sure what happened on prod, since the PDF and those two ZIP files seem to be missing. Could it be an edited version? The one on staging looks close, with several service derivative files attached, and I think we can fix that.
I believe the trailing period in the Title needs to be removed. When a citation is formatted by EZID (and maybe by DataCite downstream), a period seems to be inserted automatically after the Title. I noticed that another record whose title ends in "?" had a trailing period appended to it. So, in "ObjectTitle. In CollectionTitle.", the last period should be removed. *Answer: Yes, the last period can be simply removed.
Description is missing. It may be because only Description [Abstract] is automatically pushed to EZID. (This object only has a Methods.) Is it possible to have a rule where only Abstract is pushed, but if there is no Abstract present, then Methods gets pushed? If this is too complicated, then I can add Methods manually, as these aren't as common as Abstracts.
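*Question: We are pushing dams:Note[dams:type='description'] as element to datacite at this time. Do you mean you want to push dams:Note[dams:type='methods'] as an alternative description?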
@lsitu, here are some responses. I'll need @arwenhutt's input for some of them.
https://ezid.cdlib.org/id/doi:10.5072/FK21C21G9N (CalCOFI coll):
Collections (as for objects) should be assigned the value, "Dataset", for Resource Type General, not "Collection". Question: Are we going to use "Dataset" for all collections, or RCI collections only? HJ: It's probably safe to use for all collections, as only RDCP mints DOIs at present.
The Subject, Scripps Institution of Oceanography, has "[naf]" appended to it in the EZID record. This string isn't in the original collection record, and shouldn't be there. I can't edit it thru the EZID UI, because [naf] doesn't appear when in edit mode, but it appears when in viewing mode. Question: I think "[naf]" is the mads:MADSScheme of the subject "Scripps Institution of Oceanography", which is attached to the subjectScheme attribute of the subject when submitting it to datacite. Do you just want to ignore the subjectScheme attribute for all subjects? If not, I think we may have to correct it in the subject authority record "Scripps Institution of Oceanography" itself. What do you think? HJ: I don't think the [naf] text is propagating downstream to DataCite in an improper way, so it's probably best to leave it alone. I don't want to create new problems if it isn't causing problems now. I just didn't understand what its purpose is, but with your help, I understand it better now. @arwenhutt, would you agree that it can be left alone?
https://ezid.cdlib.org/id/doi:10.5072/FK2CN75144 (SOAS object):
This is a minor issue, but the Formats field seems overly populated. The object has 11 images, 2 ZIPs, and 1 empty component (serves as a header only, in the components list). The EZID record lists the following: image This formats list was slightly different when I minted from production: https://ezid.cdlib.org/id/doi:10.6075/J0P26W1T. Question: I am not sure what happened on prod, since the PDF and those two ZIP files seem to be missing. Could it be an edited version? The one on staging looks close, with several service derivative files attached, and I think we can fix that. HJ: It's possible I edited it, but I don't think I would've touched the Formats field.
I believe the trailing period in the Title needs to be removed. When a citation is formatted by EZID (and maybe by DataCite downstream), a period seems to be inserted automatically after the Title. I noticed that another record whose title ends in "?" had a trailing period appended to it. So, in "ObjectTitle. In CollectionTitle.", the last period should be removed. *Answer: Yes, the last period can be simply removed.
Description is missing. It may be because only Description [Abstract] is automatically pushed to EZID. (This object only has a Methods.) Is it possible to have a rule where only Abstract is pushed, but if there is no Abstract present, then Methods gets pushed? If this is too complicated, then I can add Methods manually, as these aren't as common as Abstracts. Question: We are pushing dams:Note[dams:type='description'] as element to datacite at this time. Do you mean you want to push dams:Note[dams:type='methods'] as an alternative description? HJ: I think the answer is yes, but would like @arwenhutt's confirmation on this one.
dams:Note[dams:type='description'] and dams:Note[dams:type='methods'] to datacite
Thanks @arwenhutt and @hjsyoo. @hjsyoo Do you have an example for #5 that we can use for testing?
@mcritchlow I've added a commit to update the stylesheet to address those five new issues that @hjsyoo brought up in #issuecomment-306983031 above. It's ready for review now. See commit https://github.com/ucsdlib/damsrepo/commit/2a7846ebd6a6d3752a204729ffa9ba4550f8cac6 We need @jhriv to deploy it to staging for @hjsyoo to test again once you approve it. Thanks.
@jhriv I've merged it to the release/4.33 branch for damsrepo. Could we deploy damsrepo release/4.33 to staging? Matt is out sick today, and we should have @hjsyoo test it on staging before the end of the sprint today. Thanks.
@lsitu Do you still need an example for #5? If so, the collection, https://library.ucsd.edu/dc/collection/bb6282674b, has the ezid record, https://ezid.cdlib.org/id/doi:10.6075/J0P26W1T. This collection has a Methods note, but no Abstract.
Thanks @hjsyoo.
@lsitu Please note, though, that I added the Methods manually for this record, which is on prod. I should be able to mint a doi for a record on staging, if you prefer.
@hjsyoo Please wait until @jhriv deploys it to staging. Thanks.
@hjsyoo John just deployed it to staging and it's ready for testing now. Thanks.
@lsitu I'm testing the minting now. 6) One thing I've already noticed is that the Title is sent twice to EZID: https://ezid.cdlib.org/id/doi:10.5072/FK2QF8RM4J.
@lsitu Here's another - 7) The formats in the EZID record (https://ezid.cdlib.org/id/doi:10.5072/FK2FX7B075) still don't show a clear, one-to-one correspondence with the actual file formats in the record, http://librarytest.ucsd.edu/dc/object/bb7920789g. I don't have a use case for formats right now, so if you think it's best to put it on a separate ticket, that would be fine with me.
@hjsyoo For the title, I think this is just a metadata issue in the original RDF, which has a duplicate title: http://librarytest.ucsd.edu/dc/collection/bb5940732k/data. For the format, I think we are just sending the formats from all the service files. We won't send any master source files. Does the rule look correct?
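The format rule described above (service files only, no master source files) could be sketched roughly like this. The file records, the `role` and `format` keys, and the function name `formats_to_send` are all hypothetical stand-ins for however DAMS actually models files, and collapsing repeated formats is an extra assumption not stated in the thread.

```python
def formats_to_send(files):
    """Collect distinct formats from service files, preserving first-seen order.

    `files` is a list of dicts with hypothetical "role" and "format" keys.
    """
    seen = []
    for f in files:
        if f["role"] != "service":
            continue  # master source files are not sent
        if f["format"] not in seen:
            seen.append(f["format"])  # assumed de-duplication step
    return seen


# Made-up sample records, for illustration only.
files = [
    {"role": "source", "format": "image/tiff"},
    {"role": "service", "format": "image/jpeg"},
    {"role": "service", "format": "image/jpeg"},
    {"role": "service", "format": "application/zip"},
]
print(formats_to_send(files))  # ['image/jpeg', 'application/zip']
```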
@lsitu Interesting about the duped title. @arwenhutt, I noticed the same thing in CCDB on prod. Looks like it should be deduped? Regarding format, @arwenhutt, I'm not seeing a clear match between the file formats on a landing page (e.g., http://librarytest.ucsd.edu/dc/object/bb7920789g) and the formats that get pushed to EZID (https://ezid.cdlib.org/id/doi:10.5072/FK2FX7B075). There are other examples I can share if needed. Is this something I should just ignore, or is it a concern?
@lsitu, Collections are still getting assigned a Resource type value of "Collection". I'd prefer "Dataset" to be the default value. https://ezid.cdlib.org/id/doi:10.5072/FK26D5WC9R
I'm not sure if this issue was worked on before the deploy, but as an FYI, the Description [Methods] didn't make it into the EZID record: https://ezid.cdlib.org/id/doi:10.5072/FK2B56M63S.
@hjsyoo It seems like there is a gap somewhere. I will ask @jhriv to double-check the deployment status and let you know when we are ready for more tests. Thanks.
@lsitu Ok, the Subjects and Title Language are getting pushed properly. I have to wrap up testing for the day. Feel free to open a new ticket or roll this one over, whichever works best for you. Thanks for all your help with this.
@hjsyoo I think all five new issues above should be addressed by my commit this morning. However, something went wrong with the deployment earlier, and @jhriv just redeployed it. Sorry about that. Could you test it again when you get a chance? Thanks.
@lsitu, The EZID push looks great! Just two issues remain. 1) EZID has multiple Description types. When we push a Methods note, can it get mapped to Description [Methods] in EZID? In https://ezid.cdlib.org/id/doi:10.5072/FK2736T405, the first Description is an Abstract (and is correctly mapped in EZID), but the second Description is a Methods note. 2) The formats mapped for objects (e.g., https://ezid.cdlib.org/id/doi:10.5072/FK21J9F939) are still not human readable, or at least they're hard for this human to understand. Again, I think @arwenhutt can best answer the question of whether this issue needs to be addressed.
@hjsyoo Thanks for testing it out. Yes, we could do #1. For #2, I think we are listing all the formats from the service files at this time. If we want to change the mapping for this, could we open a new ticket so that it won't delay the deployment for damsrepo?
@lsitu Arwen is out sick, and I don't see this as a rush. Can we open the new ticket later, when she's back? It's fine to close this ticket in the meanwhile - let's not delay deployment.
@hjsyoo It sounds good. I've corrected the Description type for the methods note to "Methods" and we are moving forward with the deployment now. @jhriv Could you deploy damsrepo release/4.33 https://github.com/ucsdlib/damsrepo/commits/release/4.33 to staging again? Thanks.
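As a rough sketch of the description mapping discussed above: each pushed note carries a descriptionType, with the Methods note typed as "Methods" rather than "Abstract". The mapping table, the function name `build_descriptions`, and the sample note text are assumptions for illustration; the actual change lives in the damsrepo stylesheet.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from DAMS note types to DataCite descriptionType values.
NOTE_TYPE_TO_DESCRIPTION_TYPE = {
    "description": "Abstract",
    "methods": "Methods",
}


def build_descriptions(notes):
    """Build a <descriptions> element from (note_type, text) pairs.

    Notes whose type is not in the mapping are skipped in this sketch.
    """
    descriptions = ET.Element("descriptions")
    for note_type, text in notes:
        description_type = NOTE_TYPE_TO_DESCRIPTION_TYPE.get(note_type)
        if description_type is None:
            continue
        el = ET.SubElement(descriptions, "description", {"descriptionType": description_type})
        el.text = text
    return descriptions


# Placeholder notes, not taken from any real record.
example = build_descriptions([
    ("description", "Placeholder abstract text."),
    ("methods", "Placeholder methods text."),
    ("custodial history", "Not pushed in this sketch."),
])
print(ET.tostring(example, encoding="unicode"))
```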
@hjsyoo John has deployed it to staging, and damsrepo release 4.33 with the description type change is ready on staging for review now.
@lsitu It looks great! Should I close the ticket now, or wait until deploy to production?
@hjsyoo Just feel free to close it at your convenience. Thanks.
Thanks for all your help, @lsitu!
Sure. I am glad that we worked it out, @hjsyoo!
Descriptive summary
Requesting several adjustments to Mint DOI button functionality under Curator Tools.
Rationale
Our recent need for richer metadata exports to EZID is driven largely by an increasing integration of DataCite records with aggregation services, such as SHARE. Any metadata gaps must currently be edited manually in the EZID UI.
Expected and Actual behavior
Language code error: The Mint DOI button successfully creates a DOI record in EZID, but there is a language code error that becomes apparent only when editing the EZID record (see screenshot, which pertains to the DOI pointing to https://library.ucsd.edu/dc/collection/bb4473712z).
Default setting for Resource Type: Resource Type General for collections (e.g., https://library.ucsd.edu/dc/collection/bb4473712z) and objects is assigned by default to "Collection" and "Other" respectively, in EZID. Change the default setting of Resource Type General for all RDCP content in EZID to "Dataset", and leave the Resource Type field blank.
Subjects: Geographic subjects in both collection (e.g., https://library.ucsd.edu/dc/collection/bb4473712z) and object records do not currently get exported to the EZID record. Include these as Subjects in the export.
Subjects: Not all subjects were exported for https://library.ucsd.edu/dc/object/bb66239018. Only the first Topic in the DAMS record was automatically exported to EZID. I had to manually add the remaining two Topics, plus the Geographic subject.
Title for objects: When a DOI is minted specifically for an object (as opposed to a collection), can we change the title format to include both of the title-related sentences in the citation? The citation format is currently: LastName, FirstName MI; LastName, FirstName MI (Year): ObjectTitle. In CollectionTitle. UC San Diego Library Digital Collections. The full title should be exported to the EZID record as "[ObjectTitle]. In [CollectionTitle]."
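A minimal sketch of the requested object title format, in Python. The comments on this issue later settled on dropping the final period, since EZID appears to append one when it formats the citation, so this version omits it. The function name `format_object_title` is hypothetical and not part of damsrepo.

```python
def format_object_title(object_title, collection_title):
    """Return "ObjectTitle. In CollectionTitle" with no trailing period."""
    # Strip surrounding whitespace and any trailing periods so the
    # separator and the end of the title are not doubled up.
    obj = object_title.strip().rstrip(".")
    coll = collection_title.strip().rstrip(".")
    return f"{obj}. In {coll}"


print(format_object_title("ObjectTitle.", "CollectionTitle."))
```

Titles that end in other punctuation, such as "?", are left as they are here; how those should be joined is not settled in this issue.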
Steps to reproduce the behavior
Related work