ucsdlib / damspas

UC San Diego DAMS Hydra Head

Mint DOI button fix requests #302

Closed hjsyoo closed 7 years ago

hjsyoo commented 7 years ago

Descriptive summary

Requesting several adjustments to Mint DOI button functionality under Curator Tools.

Rationale

Our recent need for richer metadata exports to EZID is driven largely by an increasing integration of DataCite records with aggregation services, such as SHARE. Any metadata gaps must currently be edited manually in the EZID UI.

Expected and Actual behavior

Steps to reproduce the behavior

Related work

mcritchlow commented 7 years ago

moving to sprint 11

mcritchlow commented 7 years ago

@lsitu - @hweng noted this change would need to be made in damsrepo, and she ran out of time last Sprint to work on it. Could you please take a look in this Sprint when you can? Thanks.

lsitu commented 7 years ago

@mcritchlow Sure. I am looking into it now.

lsitu commented 7 years ago

@hjsyoo Do you have an example of the converted XML that will be sent when minting the DOI, which includes the elements we need to change?

Per the description, it seems that the corresponding elements should be in the following format:

hjsyoo commented 7 years ago

@lsitu I'm not sure what would help you most. Is it an XML that EZID (I think) exports to DataCite? That might be this, for the SOCCOM example: https://ezid.cdlib.org/manage/display_xml/doi:10.6075/J0ZK5DMX. Although, this one is weird. It lists the Title twice, and it's missing the Geographic Subject, "Southern Ocean", which I must have failed to add to the record. On second thought, this record was mistakenly minted twice, and may not be the best example.

Here's one I just minted (using the button), then edited in the EZID UI: https://ezid.cdlib.org/manage/display_xml/doi:10.6075/J0P26W1T. It looks correct, although I should mention that I changed Title Language to "eng". EZID didn't complain when I submitted the change. I did have to add the Description and most of the subjects (7 of 9).

I'm uncertain how to answer the last question. I would like the Title in the EZID record to be formatted as: [ObjectTitle]. In [CollectionTitle]. Is there additional information you need?

lsitu commented 7 years ago

@hjsyoo For the first issue, "Language code error", could you try whether changing the Title Language to "en-US" works?

For the last issue, I don't know why you currently get "LastName, FirstName MI; LastName, FirstName MI (Year): ObjectTitle", since we submit the [ObjectTitle] and [Creators] as independent elements. Could it just be a display issue from DataCite? I can change the title and submit it to DataCite in the following format for an object like https://library.ucsd.edu/dc/object/bb66239018:

SOCCOM float data - Snapshot 2017-03-08. In Southern Ocean Carbon and Climate Observations and Modeling (SOCCOM) Float Data Archive.

Is that what you want?

lsitu commented 7 years ago

@mcritchlow I've created the branch feature/doi_datacite without lib-camel integration and added a commit to fix the DataCite metadata issues. I think we can create a release branch with a deployment ticket and deploy it to staging for testing, as we discussed earlier. Could you review commit https://github.com/ucsdlib/damsrepo/commit/4f4799bc8584e91594bcccec2d3b67dde3f45020? Thanks.

mcritchlow commented 7 years ago

@lsitu - looks good to me based on the info you have so far. Let's get this in staging as you noted for @hjsyoo to review 👍

lsitu commented 7 years ago

@mcritchlow We need to tag it as a release branch for the deployment, as we did for damsmanager. What version should we use for this deployment?

mcritchlow commented 7 years ago

@lsitu I think the next version of damsrepo is supposed to be 4.33 (per the most recent JIRA ticket I can find)

hjsyoo commented 7 years ago

@lsitu In response to your first question - yes, changing the Title Language to "en-US" can be done in the EZID UI without generating an error message. Regarding the second question, I'm not sure why Creators is involved, but yes, the title for https://library.ucsd.edu/dc/object/bb66239018 should be as you indicated. The EZID record is currently correct: https://ezid.cdlib.org/id/doi:10.6075/J09021PC.

lsitu commented 7 years ago

@hjsyoo It's good that changing the Title Language to "en-US" works, since this could be coming from the xml:lang attribute in the RDF element. It seems that the issue with the Creators being prefixed to the title shouldn't be related, since the Titles and the Creators are submitted as different elements. Let's test it once we deploy the code to staging to see whether it works as expected.
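
To illustrate the xml:lang point above, here is a minimal sketch of copying a language tag from an RDF literal onto a title element. It is not the damsrepo stylesheet: the `ex:Title` element, the `http://example.org/ns#` namespace, and the function name are placeholders invented for the example.

```python
import xml.etree.ElementTree as ET

# Clark-notation name of the xml:lang attribute as ElementTree exposes it.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical RDF fragment; element and namespace names are placeholders,
# not the real DAMS schema.
SAMPLE_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/ns#">
  <ex:Title xml:lang="en-US">SOCCOM float data - Snapshot 2017-03-08</ex:Title>
</rdf:RDF>"""


def title_with_language(rdf_xml: str) -> ET.Element:
    """Build a <title> element that keeps the source literal's xml:lang."""
    root = ET.fromstring(rdf_xml)
    source = root.find("{http://example.org/ns#}Title")
    title = ET.Element("title")
    title.text = source.text
    lang = source.get(XML_LANG)
    if lang:  # only carry the attribute over when the source has one
        title.set(XML_LANG, lang)
    return title


if __name__ == "__main__":
    print(ET.tostring(title_with_language(SAMPLE_RDF), encoding="unicode"))
```

If the language tag stored in the RDF is one EZID rejects, a sketch like this would pass it through unchanged, which matches the theory that the error originates in the source metadata.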

lsitu commented 7 years ago

@mcritchlow Yeah, the version should be 4.33 now and I've created the release branch for it. But release/4.32 includes lib-camel integration, and I think we can rename it later if that causes any confusion.

hjsyoo commented 7 years ago

Sounds good!

lsitu commented 7 years ago

@hjsyoo I think it's ready for you to test the Mint DOI function on staging https://librarytest.ucsd.edu/dc now. Thanks @jhriv for the manual deployment to staging.

hjsyoo commented 7 years ago

Hi @lsitu, I did some testing. Here are some actual and potential issues. https://ezid.cdlib.org/id/doi:10.5072/FK21C21G9N (CalCOFI coll):

https://ezid.cdlib.org/id/doi:10.5072/FK2CN75144 (SOAS object):

As a side note, when testing on staging, there were some records (i.e., http://librarytest.ucsd.edu/dc/object/bb6213224w, http://librarytest.ucsd.edu/dc/collection/bb7305352v) which gave me the error message: Record aleady has a DOI assigned. But others didn't give me the error message, and allowed me to mint a second DOI (i.e., http://librarytest.ucsd.edu/dc/collection/bb87730652, http://librarytest.ucsd.edu/dc/object/bb7886108t).
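
The inconsistent behavior described above suggests the "already has a DOI" check is not applied uniformly. A minimal sketch of such a guard follows, assuming a record is a plain dictionary with an optional `doi` key and that `mint` is whatever callable talks to EZID. Both are hypothetical stand-ins, not the damspas implementation.

```python
class DoiAlreadyAssigned(Exception):
    """Raised when minting is attempted on a record that already has a DOI."""


def mint_doi(record: dict, mint) -> str:
    """Mint a DOI for `record` unless one is already present.

    `mint` is a hypothetical callable that takes the record and returns
    the new DOI string; it stands in for the real EZID request.
    """
    if record.get("doi"):
        raise DoiAlreadyAssigned(f"record already has DOI {record['doi']}")
    doi = mint(record)
    record["doi"] = doi  # remember it so a second attempt is rejected
    return doi


if __name__ == "__main__":
    record = {"id": "example-record"}
    print(mint_doi(record, lambda r: "doi:10.5072/FK2EXAMPLE"))
    try:
        mint_doi(record, lambda r: "doi:10.5072/FK2OTHER")
    except DoiAlreadyAssigned as err:
        print("rejected:", err)
```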

lsitu commented 7 years ago

@hjsyoo Thank you very much for testing it out. It looks like you've brought up some new issues. I am not sure whether it's better to continue working on this ticket or to open a new ticket so that we can wrap this one up quickly. Either way is fine with me.

Here are some questions for the issues above: https://ezid.cdlib.org/id/doi:10.5072/FK21C21G9N (CalCOFI coll):

hjsyoo commented 7 years ago

@lsitu, here are some responses. I'll need @arwenhutt's input for some of them.

https://ezid.cdlib.org/id/doi:10.5072/FK21C21G9N (CalCOFI coll):

  1. Collections (as for objects) should be assigned the value, "Dataset", for Resource Type General, not "Collection". Question: Are we going to use "Dataset" for all collections, or RCI collections only? HJ: It's probably safe to use for all collections, as only RDCP mints DOIs at present.

  2. The Subject, Scripps Institution of Oceanography, has "[naf]" appended to it in the EZID record. This string isn't in the original collection record, and shouldn't be there. I can't edit it through the EZID UI, because [naf] doesn't appear in edit mode, only in viewing mode. https://ezid.cdlib.org/id/doi:10.5072/FK2CN75144 (SOAS object): Question: I think "[naf]" is the mads:MADSScheme of the subject "Scripps Institution of Oceanography", which is attached to the subjectScheme attribute of the subject when submitting it to DataCite. Do you just want to ignore the subjectScheme attribute for all subjects? If not, I think we may have to correct it in the subject authority record "Scripps Institution of Oceanography" itself. What do you think? HJ: I don't think the [naf] text is propagating downstream to DataCite in an improper way, so it's probably best to leave it alone. I don't want to create new problems if it isn't causing problems now. I just didn't understand what its purpose is, but with your help, I understand it better now. @arwenhutt, would you agree that it can be left alone?

  3. This is a minor issue, but the Formats field seems overly populated. The object has 11 images, 2 ZIPs, and 1 empty component (serves as a header only, in the components list). The EZID record lists the following: [image]. This formats list was slightly different when I minted from production: https://ezid.cdlib.org/id/doi:10.6075/J0P26W1T. Question: I am not sure what happened on prod, since the PDF and those two ZIP files seem to be missing. Could it be an edited version? The one on staging looks close, with several service derivative files attached, and I think we can fix that. HJ: It's possible I edited it, but I don't think I would've touched the Formats field.

  4. I believe the trailing period in the Title needs to be removed. When a citation is formatted by EZID (and maybe by DataCite downstream), a period seems to be inserted automatically after the Title. I noticed that another record whose title ends in "?" had a trailing period appended to it. So, in "ObjectTitle. In CollectionTitle.", the last period should be removed. Answer: Yes, the last period can simply be removed.

  5. Description is missing. It may be because only Description [Abstract] is automatically pushed to EZID. (This object only has a Methods.) Is it possible to have a rule where only Abstract is pushed, but if there is no Abstract present, then Methods gets pushed? If this is too complicated, then I can add Methods manually, as these aren't as common as Abstracts. Question: We are pushing dams:Note[dams:type='description'] as element to datacite at this time. Do you mean you want to push dams:Note[dams:type='methods'] as an alternative description? HJ: I think the answer is yes, but would like @arwenhutt's confirmation on this one.
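
Items 4 and 5 describe two small rules that can be sketched in a few lines. This is only an illustration under stated assumptions, not the actual stylesheet: the function names are invented, and notes are assumed to arrive as `(type, text)` pairs using the `description` and `methods` type values mentioned in item 5.

```python
from typing import Iterable, Optional, Tuple


def format_title(object_title: str, collection_title: str) -> str:
    """Return 'ObjectTitle. In CollectionTitle' with no trailing period.

    The final period is dropped because the citation formatter appears
    to add its own after the title (item 4).
    """
    title = f"{object_title.strip()}. In {collection_title.strip()}"
    return title.rstrip(".")


def pick_description(notes: Iterable[Tuple[str, str]]) -> Optional[Tuple[str, str]]:
    """Prefer an abstract-style 'description' note, else fall back to 'methods' (item 5)."""
    notes = list(notes)
    for wanted in ("description", "methods"):
        for note_type, text in notes:
            if note_type == wanted:
                return note_type, text
    return None  # neither kind of note is present


if __name__ == "__main__":
    print(format_title(
        "SOCCOM float data - Snapshot 2017-03-08",
        "Southern Ocean Carbon and Climate Observations and Modeling (SOCCOM) Float Data Archive.",
    ))
    print(pick_description([("methods", "How the data were collected.")]))
```

The fallback only matters for records without an abstract; when both note types exist, the sketch returns the abstract and ignores the methods note.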

arwenhutt commented 7 years ago
lsitu commented 7 years ago

Thanks @arwenhutt and @hjsyoo. @hjsyoo Do you have an example for # 5 that we can use for testing?

lsitu commented 7 years ago

@mcritchlow I've added a commit to update the stylesheet to address those five new issues that @hjsyoo brought up in #issuecomment-306983031 above. It's ready for review now. See commit https://github.com/ucsdlib/damsrepo/commit/2a7846ebd6a6d3752a204729ffa9ba4550f8cac6 We need @jhriv to deploy it to staging for @hjsyoo to test again once you approve it. Thanks.

lsitu commented 7 years ago

@jhriv I've merged it into the release/4.33 branch for damsrepo. Could we deploy damsrepo release/4.33 to staging? Matt is out sick today, and we had better have @hjsyoo test it on staging before the end of the sprint today. Thanks.

hjsyoo commented 7 years ago

@lsitu Do you still need an example for #5? If so, the collection, https://library.ucsd.edu/dc/collection/bb6282674b, has the ezid record, https://ezid.cdlib.org/id/doi:10.6075/J0P26W1T. This collection has a Methods note, but no Abstract.

lsitu commented 7 years ago

Thanks @hjsyoo.

hjsyoo commented 7 years ago

@lsitu Please note, though, that I added the Methods manually for this record, which is on prod. I should be able to mint a DOI for a record on staging, if you prefer.

lsitu commented 7 years ago

@hjsyoo Please wait until @jhriv deploys it to staging. Thanks.

lsitu commented 7 years ago

@hjsyoo John just deployed it to staging and it's ready for testing now. Thanks.

hjsyoo commented 7 years ago

@lsitu I'm testing the minting now. 6) One thing I've already noticed is that the Title is sent twice to EZID: https://ezid.cdlib.org/id/doi:10.5072/FK2QF8RM4J.

hjsyoo commented 7 years ago

@lsitu Here's another - 7) The formats in the EZID record (https://ezid.cdlib.org/id/doi:10.5072/FK2FX7B075) still don't show a clear, one-to-one correspondence with the actual file formats in the record, http://librarytest.ucsd.edu/dc/object/bb7920789g. I don't have a use case for formats right now, so if you think it's best to put it on a separate ticket, that would be fine with me.

lsitu commented 7 years ago

@hjsyoo For the title, I think this is just a metadata issue in the original RDF, which has a duplicate title: http://librarytest.ucsd.edu/dc/collection/bb5940732k/data. For the format, I think we are just sending the formats from all the service files. We won't send any master source files. Does the rule look correct?
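
The format rule described above (send formats from service files, skip master source files) could look roughly like the sketch below. The `use` and `format` keys, the `-service` suffix convention, and the sample values are assumptions made for illustration. Dropping repeated formats is also an addition of this sketch, not something the comment above states.

```python
from typing import Iterable, List, Mapping


def service_formats(files: Iterable[Mapping[str, str]]) -> List[str]:
    """Collect formats from service files only, in first-seen order.

    Each file is assumed to be a mapping with hypothetical 'use' and
    'format' keys; files whose use does not end in '-service' (for
    example master source files) are skipped.
    """
    formats: List[str] = []
    for f in files:
        if not f["use"].endswith("-service"):
            continue  # skip source/master and other non-service files
        if f["format"] not in formats:  # de-duplication is an assumption of this sketch
            formats.append(f["format"])
    return formats


if __name__ == "__main__":
    sample = [
        {"use": "image-source", "format": "image/tiff"},
        {"use": "image-service", "format": "image/jpeg"},
        {"use": "data-service", "format": "application/zip"},
        {"use": "image-service", "format": "image/jpeg"},
    ]
    print(service_formats(sample))
```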

hjsyoo commented 7 years ago

@lsitu Interesting about the duped title. @arwenhutt, I noticed the same thing in CCDB on prod. Looks like it should be deduped? Regarding format, @arwenhutt, I'm not seeing a clear match between the file formats on a landing page (e.g., http://librarytest.ucsd.edu/dc/object/bb7920789g) and the formats that get pushed to EZID (https://ezid.cdlib.org/id/doi:10.5072/FK2FX7B075). There are other examples I can share if needed. Is this something I should just ignore, or is it a concern?

hjsyoo commented 7 years ago

@lsitu, Collections are still getting assigned a Resource type value of "Collection". I'd prefer "Dataset" to be the default value. https://ezid.cdlib.org/id/doi:10.5072/FK26D5WC9R

hjsyoo commented 7 years ago

I'm not sure if this issue was worked on before the deploy, but as an FYI, the Description [Methods] didn't make it into the EZID record: https://ezid.cdlib.org/id/doi:10.5072/FK2B56M63S.

lsitu commented 7 years ago

@hjsyoo It seems like there is a gap somewhere. I will ask @jhriv to double-check the deployment status and let you know when we are ready for more tests. Thanks.

hjsyoo commented 7 years ago

@lsitu Ok, the Subjects and Title Language are getting pushed properly. I have to wrap up testing for the day. Feel free to open a new ticket or roll this one over, whichever works best for you. Thanks for all your help with this.

lsitu commented 7 years ago

@hjsyoo I think all five new issues above should be addressed by my commit this morning. However, something went wrong with the deployment earlier, and @jhriv just redeployed it. Sorry about that. Could you test it again when you get a chance? Thanks.

hjsyoo commented 7 years ago

@lsitu, The EZID push looks great! Just two issues remain. 1) EZID has multiple Description types. When we push a Methods note, can it get mapped to Description [Methods] in EZID? In https://ezid.cdlib.org/id/doi:10.5072/FK2736T405, the first Description is an Abstract (and is correctly mapped in EZID), but the second Description is a Methods note. 2) The formats mapped for objects (e.g., https://ezid.cdlib.org/id/doi:10.5072/FK21J9F939) are still not human readable, or at least they're hard for this human to understand. Again, I think @arwenhutt can best answer the question of whether this issue needs to be addressed.

lsitu commented 7 years ago

@hjsyoo Thanks for testing it out. Yes, we can do # 1. For # 2, I think we are listing all the formats from the service files at this time. If we want to change the mapping for this, could we open a new ticket so that it won't delay the deployment for damsrepo?

hjsyoo commented 7 years ago

@lsitu Arwen is out sick, and I don't see this as a rush. Can we open the new ticket later, when she's back? It's fine to close this ticket in the meanwhile - let's not delay deployment.

lsitu commented 7 years ago

@hjsyoo Sounds good. I've corrected the Description type for the methods note to "Methods" and we are moving forward with the deployment now. @jhriv Could you deploy damsrepo release/4.33 https://github.com/ucsdlib/damsrepo/commits/release/4.33 to staging again? Thanks.
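
The description-type correction mentioned above amounts to mapping each note type to a DataCite `descriptionType` value. Here is a minimal sketch, assuming notes arrive as `(type, text)` pairs with the `description` and `methods` type values discussed earlier in the thread. The function name and input shape are hypothetical, and the real change lives in the damsrepo stylesheet rather than in Python.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

# Note type (as used in this thread) -> DataCite descriptionType value.
DESCRIPTION_TYPES = {
    "description": "Abstract",
    "methods": "Methods",
}


def build_descriptions(notes: Iterable[Tuple[str, str]]) -> ET.Element:
    """Build a <descriptions> element with one typed <description> per note."""
    container = ET.Element("descriptions")
    for note_type, text in notes:
        description_type = DESCRIPTION_TYPES.get(note_type)
        if description_type is None:
            continue  # note types without a mapping are not sent
        element = ET.SubElement(
            container, "description", {"descriptionType": description_type}
        )
        element.text = text
    return container


if __name__ == "__main__":
    notes = [
        ("description", "Summary of the dataset."),
        ("methods", "How the data were collected."),
    ]
    print(ET.tostring(build_descriptions(notes), encoding="unicode"))
```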

lsitu commented 7 years ago

@hjsyoo John has deployed it to staging, and damsrepo release 4.33 with the description type change is ready on staging for review now.

hjsyoo commented 7 years ago

@lsitu It looks great! Should I close the ticket now, or wait until deploy to production?

lsitu commented 7 years ago

@hjsyoo Just feel free to close it at your convenience. Thanks.

hjsyoo commented 7 years ago

Thanks for all your help, @lsitu!

lsitu commented 7 years ago

Sure. I am glad that we worked it out, @hjsyoo!