hweng closed this issue 6 years ago
Updated the configuration for SHARE V2 connections. Updated the gem file and the push methods to use the share_notify V2 API. The mapping code for SHARE V2 fields is still being updated.
@hweng What's the status of this ticket?
@jessicahilt This one depends on another ticket, "share mapping spreadsheet updating: #269". Once https://github.com/ucsdlib/damspas/issues/269 is done, this work can proceed.
@hjsyoo @arwenhutt The DAMS4 to SHARE V2 mapping has been implemented in damspas. Here is the JSON output for example record bb7305352v: https://gist.github.com/hweng/81bbd50513895f09b9b93003e921a5a4
Here is the sample record that I pushed to the SHARE staging area: https://staging-share.osf.io/dataset/4605A-898-E37
If you search for the title “An annotated checklist of the bees (Hymenoptera: Anthophila) of San Diego County, California”, you will see the first result, “Published by UC San Diego Library, Digital Collections”: https://staging-share.osf.io/discover?q=An%20annotated%20checklist%20of%20the%20bees%20(Hymenoptera%3A%20Anthophila)%20of%20San%20Diego%20County%2C%20California&type[]=data%20set
Please let me know if the JSON output and the search result page look good to you.
Here are some points summarized from https://github.com/ucsdlib/damspas/issues/269. Let me know if they look correct to you:
Could all RDC landing pages (whether it's from a collection or object page) be mapped to Resource Type = Dataset, rather than Creative Work? Yes.
The following related_agents are added for DAMS records: {agent_type: "Contributor", type: "Organization", "name": "http://library.ucsd.edu/dc"}, {agent_type: "Publisher", type: "Organization", "name": "UC San Diego Library, Digital Collections"}
Date published is mapped from the date_json_tesim value with "type":"issued". If a DAMS record has no such value, it defaults to the current time.
For the mapping from DAMS relationship_json_tesi role to SHARE related_agents:
When a DAMS record's relationship_json_tesi role doesn't match any of the related_agents data types in the SHARE schema, it is mapped to the default value "contributor".
Here is the list of SHARE related_agents data types available: "related_agents": { "type": "array", "items": { "type": "object", "additionalProperties": false, "description": "Many-to-many relationship", "properties": { "@type": { "enum": [ "AGENTWORKRELATION", "AgentWorkRelation", "CONTRIBUTOR", "CREATOR", "Contributor", "Creator", "FUNDER", "Funder", "HOST", "Host", "PRINCIPALINVESTIGATOR", "PRINCIPALINVESTIGATORCONTACT", "PUBLISHER", "PrincipalInvestigator", "PrincipalInvestigatorContact", "Publisher", "agentworkrelation", "contributor", "creator", "funder", "host", "principalinvestigator", "principalinvestigatorcontact", "publisher" ] }
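For reference, the two fallback rules above (published date and agent role) could be sketched roughly as follows. This is only an illustration, not the actual damspas code: the helper names `share_agent_type` and `share_date_published` are hypothetical, and it assumes each date_json_tesim entry is a JSON string with "type" and "value" keys.

```ruby
require 'json'
require 'time'

# Relation types accepted by the SHARE V2 schema (lowercase variants of the enum above).
SHARE_AGENT_TYPES = %w[
  agentworkrelation contributor creator funder host
  principalinvestigator principalinvestigatorcontact publisher
].freeze

# Hypothetical helper: map a DAMS relationship role to a SHARE relation type,
# falling back to "contributor" when the role is not in the schema.
def share_agent_type(dams_role)
  normalized = dams_role.to_s.downcase.gsub(/[^a-z]/, '')
  SHARE_AGENT_TYPES.include?(normalized) ? normalized : 'contributor'
end

# Hypothetical helper: pick the "issued" date from the date_json_tesim values,
# falling back to the current time when no such date exists.
# Assumption: each entry is a JSON string with "type" and "value" keys.
def share_date_published(date_json_values, now: Time.now.utc)
  issued = Array(date_json_values)
           .map { |raw| JSON.parse(raw) }
           .find { |date| date['type'] == 'issued' }
  value = issued && issued['value']
  value.nil? || value.empty? ? now.iso8601 : value
end
```

With this sketch, a role such as "Principal Investigator" would map to "principalinvestigator", while a role that SHARE does not know (for example "Author") would fall back to "contributor".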
@hweng Looking at the sample record and search results display, the Tags metadata are much better than what we see coming from DataCite. I have a few requests for revisions:
As a comment for future reference, we'll want to see what happens when a record pushed from DAMS gets merged with the matched record from DataCite MDS, on their system. Since SHARE harvests from DataCite and all of the RDCP collections get DOIs, the RDCP collection records will be duplicated in their system. This duplication is common in SHARE, which does a pretty good job of recognizing dupes. We just want to make sure it goes well.
The JSON questions (3 & 4) are best answered by @arwenhutt!
@hjsyoo Regarding your question about "Source" and "Publisher": yes, "Publisher" is fine, but there is no "Source" type in the SHARE V2 schema. However, you can pick any of the types listed here: https://gist.github.com/hweng/5fd98a1777bbf806231672090c554a3d. Please let me know which one you would like to map to.
For details, please see https://staging-share.osf.io/api/v2/schema/
Can we assign the following fixed values? Source = "UC San Diego Library" (because the Library provides the metadata that SHARE harvests). Publisher = "UC San Diego Library Digital Collections" (without comma), because this is the repository's name.
@hweng Right - by Source, I'm referring to the value displayed here: SHARE calls this Source in the UI, but I'm not sure where it fits in the schema.
Here's a screenshot from SHARE on prod. These are records that we gave them during our work on tritonshare, but they aren't DAMS records. We asked for Source = "UC San Diego Library".
@hjsyoo
I agree that the Contributor with name: "http://library.ucsd.edu/dc" can be removed. I will update the mapping.
Publisher = "UC San Diego Library Digital Collections" (without comma), because this is the repository's name: yes, I will update it.
The "Source" value you mention, "UC San Diego", is not actually pushed from our DAMS; it is assigned on the COS side. I guess they derive this source value from authentication. The same applies to "External Links": it does not come from our pushed DAMS metadata. Do you want me to contact the COS developers to clarify how they assign those values?
"In search results, can Contributors be displayed?" If you look into our pushed DAMS record example https://gist.github.com/hweng/81bbd50513895f09b9b93003e921a5a4, all contributors were mapped exactly according to SHARE V2 schema, for example:
{:@id=>"_:8a5d0db88f2de861ef42f148edcdb513", :@type=>"Person", :name=>"Ascher, John"}, {:@id=>"_:716525838003ed2f4cc3928d416affca", :@type=>"contributor", :agent=>{:@id=>"_:8a5d0db88f2de861ef42f148edcdb513", :@type=>"Person"}, :creativework=>{:@id=>"_:93fc687c5bf3d43894b34a23ab38b23c", :@type=>"DataSet"}}
I'm not sure why the contributors show up only in their single-item UI and not on their search results page. Since this is a COS UI issue rather than a problem with our DAMS metadata mapping or push, do you want me to contact their developers about it?
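To make the structure of the pushed example easier to follow, here is a rough sketch of how one agent node and its relation node could be built. It is not the actual damspas or share_notify code: `blank_node_id` and `agent_nodes` are hypothetical helpers, string keys are used instead of the symbol keys shown above, and the identifiers are assumed to be blank nodes of the form "_:" followed by 32 hex characters.

```ruby
require 'securerandom'

# Hypothetical helper: generate a blank-node identifier ("_:" plus 32 hex characters).
def blank_node_id
  "_:#{SecureRandom.hex(16)}"
end

# Hypothetical helper: build the agent node and the relation node that links
# the agent to a work. The relation points at both nodes by @id and @type.
def agent_nodes(name:, relation_type:, work_id:, work_type: 'DataSet', agent_type: 'Person')
  agent_id = blank_node_id
  agent = { '@id' => agent_id, '@type' => agent_type, 'name' => name }
  relation = {
    '@id' => blank_node_id,
    '@type' => relation_type,
    'agent' => { '@id' => agent_id, '@type' => agent_type },
    'creativework' => { '@id' => work_id, '@type' => work_type }
  }
  [agent, relation]
end

# Usage: one contributor attached to an existing work node.
agent, relation = agent_nodes(
  name: 'Ascher, John',
  relation_type: 'contributor',
  work_id: '_:93fc687c5bf3d43894b34a23ab38b23c'
)
```

The point of the sketch is that each contributor needs two graph entries: the agent itself, and a relation whose `agent` and `creativework` references match the `@id` values of the nodes they connect.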
@hweng Thanks for the fixes to 1 and 2. For 3, yes, please ask them about reassigning our records to Source="UC San Diego Library". Looking at their published list of sources here: https://share.osf.io/sources, there can be campuses, libraries, and repositories. My concern is twofold: 1) we've already given them records (not DAMS related) with "UC San Diego Library" as source. If we don't keep this consistent, we'll have records under two different names. 2) Also, there are other potential sources of metadata at UCSD. We should probably specify that we only represent the Library?
For 4, since the metadata is correct, don't worry about the COS UI issue. Thanks!
@hjsyoo Items 1 and 2 have been updated. For the SHARE staging UI issues, I sent the message and screenshots to Lauren Barker via Slack. Let's wait and see how she responds.
@hweng I was told Rick Johnson is the best contact for addressing these remaining issues. I believe you can still reach him through our slack team, but if not, I can forward his email to you.
@hjsyoo I have Rick Johnson's email, and I will try to contact him about the remaining issues. Thank you.
@hjsyoo Regarding your questions:
https://staging-share.osf.io/dataset/4605A-898-E37: only the ARK URL can be displayed at this point.
I checked, and they are displayed on the detail page: https://staging-share.osf.io/dataset/4605A-898-E37
I reported a bug to COS: both the deleted value "UC San Diego Library, Digital Collections" and the new value "UC San Diego Library Digital Collections" are displayed in their UI.
Here is the answer from Lauren:
"When is_deleted is set to True the work is removed from search but not deleted from the database. Any existing information will persist when is_deleted is set to False on the work and it reappears in search. Unfortunately there isn't a way to delete relationships on a work via the API right now. We recognize that this is not ideal so if data needs to be cleaned up on production we would be happy to do that for you."
So for #4 and #5, the only option is to ask them to clean up the data manually, because there is no API support. I've sent a request to Lauren to clean it up on production and staging.
This ticket will be revisited after the following two tickets are finished: share_notify gem support for SHARE2: https://github.com/ucsdlib/damspas/issues/268, and share mapping spreadsheet updating: https://github.com/ucsdlib/damspas/issues/269