ucsdlib / damspas

UC San Diego DAMS Hydra Head

Implement new mapping and API for SHARE2 in DAMS4 #270

Closed hweng closed 6 years ago

hweng commented 7 years ago

This ticket will be picked up after the following two tickets are finished: share_notify gem support for SHARE2 (https://github.com/ucsdlib/damspas/issues/268) and the SHARE mapping spreadsheet update (https://github.com/ucsdlib/damspas/issues/269).

hweng commented 7 years ago

Updated the configuration for SHARE V2 connections. Updated the gem file and the push methods to use the share_notify V2 API. Currently updating the mapping code for SHARE V2 fields.

jessicahilt commented 6 years ago

@hweng What's the status of this ticket?

hweng commented 6 years ago

@jessicahilt This one depends on another ticket, "share mapping spreadsheet updating: #269". Once https://github.com/ucsdlib/damspas/issues/269 is done, this work can proceed.

hweng commented 6 years ago

@hjsyoo @arwenhutt The DAMS4 to SHARE V2 mapping has been implemented in damspas. Here is the JSON output for example record bb7305352v: https://gist.github.com/hweng/81bbd50513895f09b9b93003e921a5a4

Here is the sample record that I pushed to SHARE staging area: https://staging-share.osf.io/dataset/4605A-898-E37

If you search for the title “An annotated checklist of the bees (Hymenoptera: Anthophila) of San Diego County, California”, you will see the first result “Published by UC San Diego Library, Digital Collections”: https://staging-share.osf.io/discover?q=An%20annotated%20checklist%20of%20the%20bees%20(Hymenoptera%3A%20Anthophila)%20of%20San%20Diego%20County%2C%20California&type[]=data%20set

Please let me know if the JSON output and the search result page look good to you.

hweng commented 6 years ago

Here are some points summarized from https://github.com/ucsdlib/damspas/issues/269, let me know if they look correct to you:

  1. Could all RDC landing pages (whether it's from a collection or object page) be mapped to Resource Type = Dataset, rather than Creative Work? Yes.

  2. The following related_agents are added for DAMS records: {agent_type: "Contributor", type: "Organization", name: "http://library.ucsd.edu/dc"}, {agent_type: "Publisher", type: "Organization", name: "UC San Diego Library, Digital Collections"}

  3. Date published is mapped from the date_json_tesim value with "type":"issued". If a DAMS record has no such value, it defaults to the current time.

  4. For the mapping from DAMS relationship_json_tesi role to SHARE related_agents:

When a DAMS record's relationship_json_tesi role doesn’t match any of the related_agents types in the SHARE schema, it is mapped to the default value "contributor".

Here is the list of SHARE related_agents data type available: "related_agents": { "type": "array", "items": { "type": "object", "additionalProperties": false, "description": "Many-to-many relationship", "properties": { "@type": { "enum": [ "AGENTWORKRELATION", "AgentWorkRelation", "CONTRIBUTOR", "CREATOR", "Contributor", "Creator", "FUNDER", "Funder", "HOST", "Host", "PRINCIPALINVESTIGATOR", "PRINCIPALINVESTIGATORCONTACT", "PUBLISHER", "PrincipalInvestigator", "PrincipalInvestigatorContact", "Publisher", "agentworkrelation", "contributor", "creator", "funder", "host", "principalinvestigator", "principalinvestigatorcontact", "publisher" ] }
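Points 3 and 4 above can be sketched roughly as follows. This is only an illustration, not the actual damspas code: the method names `share_agent_type` and `date_published` are hypothetical, stripping spaces from role names is an assumption, and the `"value"` key inside each date_json_tesim entry is assumed rather than confirmed.

```ruby
require 'json'
require 'time'

# Relation types from the SHARE V2 schema excerpt above, in lowercase form.
SHARE_AGENT_RELATIONS = %w[
  agentworkrelation contributor creator funder host
  principalinvestigator principalinvestigatorcontact publisher
].freeze

# Map a DAMS relationship role to a SHARE related_agents type,
# falling back to "contributor" when the role has no match.
# Assumption: roles are compared after lowercasing and dropping non-letters.
def share_agent_type(dams_role)
  normalized = dams_role.to_s.downcase.gsub(/[^a-z]/, '')
  SHARE_AGENT_RELATIONS.include?(normalized) ? normalized : 'contributor'
end

# Pick the "issued" date from date_json_tesim entries (assumed to be JSON
# strings with "type" and "value" keys); default to the current time.
def date_published(date_json_tesim, now: Time.now.utc)
  issued = Array(date_json_tesim)
           .map { |raw| JSON.parse(raw) }
           .find { |date| date['type'] == 'issued' }
  issued && issued['value'] ? issued['value'] : now.iso8601
end
```

For example, `share_agent_type('Principal Investigator')` would resolve to a schema type, while a role with no counterpart in the enum would fall through to "contributor".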

hjsyoo commented 6 years ago

@hweng Looking at the sample record and search results display, the Tags metadata are much better than what we see coming from DataCite. I have a few requests for revisions:

As a comment for future reference, we'll want to see what happens when a record pushed from DAMS gets merged with the matched record from DataCite MDS, on their system. Since SHARE harvests from DataCite and all of the RDCP collections get DOIs, the RDCP collection records will be duplicated in their system. This duplication is common in SHARE, which does a pretty good job of recognizing dupes. We just want to make sure it goes well.

The JSON questions (3 & 4) are best answered by @arwenhutt!

hweng commented 6 years ago

@hjsyoo Regarding your question below about "Source" and "Publisher": yes, "Publisher" is fine, but there is no "Source" type in the SHARE V2 schema. However, you can pick any of the available types; please let me know which one you would like to map to: https://gist.github.com/hweng/5fd98a1777bbf806231672090c554a3d

For details, please see https://staging-share.osf.io/api/v2/schema/

Can we assign the following fixed values: Source = "UC San Diego Library" (because the Library provides the metadata that SHARE harvests) Publisher = "UC San Diego Library Digital Collections" (without comma), because this is the repository's name.

hjsyoo commented 6 years ago

@hweng Right - by Source, I'm referring to the value displayed here: [screenshot]. SHARE calls this Source in the UI, but I'm not sure where it fits in the schema.

Here's a screenshot from SHARE on prod. These are records that we gave them during our work on tritonshare, but they aren't DAMS records. We asked for Source = "UC San Diego Library". [screenshot]

hweng commented 6 years ago

@hjsyoo

  1. I agree the name:"http://library.ucsd.edu/dc" as a Contributor can be removed. I will update the mapping.

  2. Publisher = "UC San Diego Library Digital Collections" (without comma), because this is the repository's name. -- Yes, I will update it.

  3. The "Source" value "UC San Diego" is not actually pushed from our DAMS; it is assigned on the COS side. I guess they derive this source value from authentication. The same applies to "External Links": this is not from our pushed DAMS metadata. Do you want me to contact the COS developers to clarify how they assign those values?

  4. "In search results, can Contributors be displayed?" If you look into our pushed DAMS record example https://gist.github.com/hweng/81bbd50513895f09b9b93003e921a5a4, all contributors were mapped exactly according to SHARE V2 schema, for example:

{:@id=>"_:8a5d0db88f2de861ef42f148edcdb513", :@type=>"Person", :name=>"Ascher, John"}, {:@id=>"_:716525838003ed2f4cc3928d416affca", :@type=>"contributor", :agent=>{:@id=>"_:8a5d0db88f2de861ef42f148edcdb513", :@type=>"Person"}, :creativework=>{:@id=>"_:93fc687c5bf3d43894b34a23ab38b23c", :@type=>"DataSet"}}

I'm not sure why they show up only in the single-item UI and not in the search results page. Since this is a COS UI issue rather than a DAMS metadata mapping/pushing issue, do you want me to contact their developers about it?
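For reference, the agent/relation pairing in the example above could be built along these lines. This is a minimal sketch, not the damspas implementation: `blank_node_id` and `agent_nodes` are hypothetical helpers, and generating ids with `SecureRandom` is an assumption (the real mapping may derive them differently).

```ruby
require 'securerandom'

# Build a blank-node id in the "_:<hex>" form used by the agent reference
# in the example record. Random hex is an assumption for illustration.
def blank_node_id
  "_:#{SecureRandom.hex(16)}"
end

# Return the two graph nodes that link an agent to a work:
# the agent itself and the relation that references both ends by @id.
def agent_nodes(name:, agent_type:, relation:, work_id:, work_type: 'DataSet')
  agent_id = blank_node_id
  agent = { '@id' => agent_id, '@type' => agent_type, 'name' => name }
  relation_node = {
    '@id' => blank_node_id,
    '@type' => relation,
    'agent' => { '@id' => agent_id, '@type' => agent_type },
    'creativework' => { '@id' => work_id, '@type' => work_type }
  }
  [agent, relation_node]
end

# One work, one personal contributor, and the fixed publisher organization.
work_id = blank_node_id
graph = agent_nodes(name: 'Ascher, John', agent_type: 'Person',
                    relation: 'contributor', work_id: work_id)
graph += agent_nodes(name: 'UC San Diego Library Digital Collections',
                     agent_type: 'Organization', relation: 'publisher',
                     work_id: work_id)
```

The key point is that the relation node never repeats the agent's name; it only points at the agent and the work by their `@id` values.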

hjsyoo commented 6 years ago

@hweng Thanks for the fixes to 1 and 2. For 3, yes, please ask them about reassigning our records to Source="UC San Diego Library". Looking at their published list of sources here: https://share.osf.io/sources, there can be campuses, libraries, and repositories. My concern is twofold: 1) we've already given them records (not DAMS related) with "UC San Diego Library" as source. If we don't keep this consistent, we'll have records under two different names. 2) Also, there are other potential sources of metadata at UCSD. We should probably specify that we only represent the Library?

For 4, since the metadata is correct, don't worry about the COS UI issue. Thanks!

hweng commented 6 years ago

@hjsyoo Items 1 and 2 have been updated. For the SHARE staging UI issues, I sent the message and screenshots to Lauren Barker via Slack. Let's wait and see how she responds.

hjsyoo commented 6 years ago

@hweng I was told Rick Johnson is the best contact for addressing these remaining issues. I believe you can still reach him through our slack team, but if not, I can forward his email to you.

hweng commented 6 years ago

@hjsyoo I have Rick Johnson's email, and I will try to contact him about the remaining issues. Thank you.

hweng commented 6 years ago

@hjsyoo Regarding your questions:

1. The External Links were changed to the ARK URL instead of the fixed link http://library.ucsd.edu/dc:

https://staging-share.osf.io/dataset/4605A-898-E37

Only the ARK URL can be displayed at this point.

2. The answer from Lauren: "Only contributors of type "creator" are displayed on the search results and contributors of all types are displayed on the detail page."

I checked that they are displayed on the detail page: https://staging-share.osf.io/dataset/4605A-898-E37

3. Source name = "UC San Diego Library" has been fixed.

4. For Publisher = "UC San Diego Library Digital Collections":

I reported the bug to COS that both the deleted value "UC San Diego Library, Digital Collections" and the new value "UC San Diego Library Digital Collections" are displayed in their UI.
Here is the answer from Lauren: "When is_deleted is set to True the work is removed from search but not deleted from the database. Any existing information will persist when is_deleted is set to False on the work and it reappears in search. Unfortunately there isn't a way to delete relationships on a work via the API right now. We recognize that this is not ideal so if data needs to be cleaned up on production we would be happy to do that for you."

5. The same situation as above: "http://library.ucsd.edu/dc" was deleted via the SHARE API, but it still persists in their database. SHARE doesn't yet have API support for deleting from their database.

So for #4 and #5, the only option is to ask them to clean up the data manually, because there is no API support. I've sent a request to Lauren to clean it up on production and staging.
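To make the behaviour Lauren describes easier to picture, here is a toy model of it. It is not SHARE's code: `ToyWork` and its methods are hypothetical, and it only assumes what the quote states, namely that is_deleted hides a work from search and that pushes can add relationships but not remove them.

```ruby
# Toy model of a work whose relationships can be added but never removed.
class ToyWork
  attr_reader :relations
  attr_accessor :is_deleted

  def initialize
    @relations = []
    @is_deleted = false
  end

  # Pushing a record adds its publisher relation and restores the work;
  # relations from earlier pushes stay in place.
  def push(publisher:)
    @relations << publisher unless @relations.include?(publisher)
    @is_deleted = false
  end

  # A work flagged as deleted is hidden from search but keeps its data.
  def searchable?
    !@is_deleted
  end
end

work = ToyWork.new
work.push(publisher: 'UC San Diego Library, Digital Collections')
work.is_deleted = true # hidden from search, relations still stored
work.push(publisher: 'UC San Diego Library Digital Collections')
```

After the second push, `work.relations` holds both publisher names, which matches the duplicate display reported above and is why a manual cleanup on the COS side is needed.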