scientist-softserv / utk-hyku

Update collection metadata #149

Open ShanaLMoore opened 1 year ago

ShanaLMoore commented 1 year ago

Make updates per the spreadsheet changes. Re-verify that the collection metadata matches the spreadsheet, since it has changed.

Note from client (Meredith):

Noting here that two changes were made on the spreadsheet. License was removed and a new URI was selected for the property for form (http://purl.org/dc/terms/format) due to an oversight on our main MAP for work types. These are in addition to Tuesday's changes of removing rights and adding note. I will not make any other changes after today.

Original ticket #126

ShanaLMoore commented 1 year ago

Q: Will there be consequences for removing Hyrax::BasicMetadata from the collection model? Both default Hyrax and Hyku include it.

ShanaLMoore commented 1 year ago

timebox removing hyrax metadata - 1hr

otherwise just update property uri and add note property

DiemBTran commented 1 year ago

Needs further review:

tested on:


  1. There are 25 display labels on a new collection form, while the Digital Collections: Vendor Supplied MAP (DC MAP) only has 22 properties. The 3 extra properties I found on the new collection form are:
    1. location
    2. identifier
    3. language
    4. These 3 should be removed from the new collection form
  2. There were 2 fields on the DC MAP that I did not find on the new collection form word-for-word but I thought could be “close enough” matches for each other:
    1. the form has rights notes but the DC MAP has note
    2. the form has related URL but the DC MAP has collection link
    3. These are interchangeable, so either the new collection form or the DC MAP should be updated to reflect that sameness
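
A comparison like this can be made repeatable with a simple set difference between the form's fields and the MAP's properties. The sketch below is illustrative only: the two lists are abbreviated, hypothetical samples (not the full 25 form labels or 22 MAP properties), and `compare_fields` is a made-up helper name.

```ruby
# Hypothetical helper: report fields that appear on only one side.
def compare_fields(form_fields, map_properties)
  {
    only_on_form: form_fields - map_properties,
    only_on_map: map_properties - form_fields
  }
end

# Abbreviated sample lists for illustration; not the real form or MAP contents.
form_fields    = %w[title abstract location identifier language rights_notes related_url]
map_properties = %w[title abstract note collection_link]

diff = compare_fields(form_fields, map_properties)
puts "Only on form: #{diff[:only_on_form].join(', ')}"
puts "Only on MAP:  #{diff[:only_on_map].join(', ')}"
```

Near matches such as rights notes vs. note still show up on both sides of the diff, so they need a human decision about which name wins.
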

ShanaLMoore commented 1 year ago

Asked client for direction: https://assaydepot.slack.com/archives/C0396LSM06P/p1667338865418549

ShanaLMoore commented 1 year ago

Client changed collection_link to resource_link and wants to use notes instead of rights_notes to make it more generic.

ShanaLMoore commented 1 year ago

NOTE TO QA: cc @DiemBTran

resource_link will not be available as read only on the form, but the user should be able to set it via bulkrax: https://qa.utk-hyku-staging.notch8.cloud/dashboard/collections/678a5d13-64f9-4140-a68d-d2cdc011e29b/edit?locale=en

Updated sample file:

149-all-collection-metadata.csv

ShanaLMoore commented 1 year ago

QA:

Clicking on Collections is causing a 500 error in staging. Looking into it!

ShanaLMoore commented 1 year ago

Pulling this ticket back to in dev to resolve this issue.

Note: this issue appeared after importing a new metadata profile. However, Collection isn't controlled by Allinson Flex, so I'm not yet sure how the import could have affected it.

Image

ShanaLMoore commented 1 year ago

As suspected, this was caused by an invalid metadata profile. However, I'm super surprised it affected the collection metadata. To correct this, the date fields needed multi_value: true
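
A rough sketch of how a profile could be checked for this before import. It assumes the profile is YAML with a top-level `properties` map whose entries can carry a `multi_value` key (only the `multi_value: true` setting comes from this ticket). The property names and the `missing_multi_value` helper are hypothetical.

```ruby
require 'yaml'

# Hypothetical profile excerpt: key layout is assumed, property names are illustrative.
PROFILE_YAML = <<~YAML
  properties:
    date_created:
      multi_value: true
    date_issued:
      multi_value: false
    title:
      multi_value: true
YAML

# Return the names of the given properties that are not marked multi_value: true.
def missing_multi_value(profile, property_names)
  props = profile.fetch('properties', {})
  property_names.reject { |name| props.dig(name, 'multi_value') == true }
end

profile = YAML.safe_load(PROFILE_YAML)
puts missing_multi_value(profile, %w[date_created date_issued]).inspect
# prints ["date_issued"]
```
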

This now passes QA. A user is able to create collections and save metadata per the client's requirement, manually and via bulkrax.

However, clicking on contributor causes a 500 error. This will be separated into its own ticket.

Image

ShanaLMoore commented 1 year ago

contributor link bug: https://github.com/scientist-softserv/utk-hyku/issues/193

mlhale7 commented 1 year ago

@ShanaLMoore - I haven't been able to reproduce the contributor bug. I wanted to clarify - if we approve #149, issue #193 will still be kept open as it has been separated out as its own problem?

mlhale7 commented 1 year ago

@ShanaLMoore - thanks for this. Editing a collection in staging, one issue I noticed was that "Resource type" seems to be linked to a set of terms that we will not be using for Digital Collections (Article, Dataset, etc.). Expected values come from this vocabulary - https://id.loc.gov/vocabulary/resourceTypes.html (e.g. "Text", "Still image", etc.). I'm assuming this might be because the property we selected (http://purl.org/dc/terms/type) is used out of the box for IR purposes or something?

I hadn't realized that "Total Items" was a thing for every collection. We don't technically need an extent field then, but we can just keep it and not populate it.

mlhale7 commented 1 year ago

Title and abstract also aren't expected to be arrays or multiple values. It's possible students will accidentally add additional values that we don't need since it's possible, but we can also live with this.

ShanaLMoore commented 1 year ago

Hi @mlhale7 I am so sorry I missed all of your comments until now!

Title and abstract are default Hyrax metadata fields. When we redefine their data types, a bunch of fields in Hyrax break. We can implement a validation to make sure there is only 1 element in the field; the form will refuse the submission if a user enters more. It should already be applied to title, actually.
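
As a minimal sketch of that kind of check in plain Ruby (not the validation that is actually applied to title): the field list and the `single_value_errors` helper are hypothetical, and in the app this would more likely live in a model or form validation.

```ruby
# Hypothetical list of fields that should hold at most one value.
SINGLE_VALUE_FIELDS = %i[title abstract].freeze

# Return a hash of field => error message for fields with more than one non-blank value.
def single_value_errors(attributes, fields = SINGLE_VALUE_FIELDS)
  fields.each_with_object({}) do |field, errors|
    values = Array(attributes[field]).map(&:to_s).reject { |v| v.strip.empty? }
    errors[field] = "must have only one value (got #{values.size})" if values.size > 1
  end
end

errors = single_value_errors({ title: ['Sample Collection', 'Extra title'], abstract: ['A summary'] })
puts errors.inspect
```

The fields stay array-typed, so the Hyrax defaults are untouched. Only the number of values submitted is restricted.
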

I can also create a ticket to clean up the form later. For example, remove any "add" button if a field should only have one. I understand it's a bad user experience to act like they can add more when they really can't.

I need a moment to look into resource type. I believe by default, resource type is hooked into hyrax's vocab: https://github.com/scientist-softserv/utk-hyku/blob/main/config/authorities/resource_types.yml

If that's the case, could you provide an updated file similar to how you all did for license?

If it's meant to be a remote vocabulary, that work will be completed as part of the Questioning Authority epic #263. For now, since ingests are the priority, I'd just want to make sure that the remote URI saves correctly when importing with Bulkrax. A reindex after that epic is complete should replace it with the proper label.

I can also go ahead and remove the extent field, if that's preferred vs keeping it around.

ShanaLMoore commented 1 year ago

> @ShanaLMoore - I haven't been able to reproduce the contributor bug. I wanted to clarify - if we approve #149, issue #193 will still be kept open as it has been separated out as its own problem?

@mlhale7 Yes, that would be correct. Oftentimes when we find minor issues, we'll break them out into their own tickets so that we can keep the majority of the work/feature moving forward. Bug tickets remain open and are treated as separate issues.

mlhale7 commented 1 year ago

@ShanaLMoore - really the only critical problem here stopping approval would be the resource_type issue. The rest of the comments are "nice to haves." For resource_type for collections, we really just want to hard code in "Collection" as the dcterms:type (see https://utk-mods-to-rdf.readthedocs.io/en/latest/contents/4_mapping.html#typeofresource-with-collection-yes). For individual records we'd want to use other values in LoC's resourceTypes vocab. I've attached a YAML file with just the value expected for collections for resource_type: resource_type_collection.txt. The reason for this value is more for sharing elsewhere than for display on Hyku. When we share collection records in Primo, we like to note that they are for collections rather than individual records.

ShanaLMoore commented 1 year ago

@mlhale7 I believe that most of your concerns will be resolved once we finish implementing the remote questioning authorities and update the form/UI portion. Instead of using the local authorities, the form will resolve the URI and save/display the label.

However, while investigating this ticket I discovered that the resource_type uri is not getting saved at all for work types or collections. So I will work on resolving that asap since that impacts ingests.

sample file: heilman_full_with_collections_short.csv

Note that resource_type in parsed_metadata is [] when it should be the uri from the spreadsheet:

image
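
A small sketch of how this kind of silent drop could be spotted by comparing a raw row against its parsed result. The hashes below are hypothetical sample data (the URI is only an example value from LoC's resourceTypes vocabulary), and `dropped_fields` is a made-up helper, not part of Bulkrax.

```ruby
# Hypothetical check: list fields that have a value in the raw row
# but end up empty in the parsed metadata.
def dropped_fields(raw_row, parsed_metadata)
  dropped = raw_row.select do |field, raw_value|
    has_raw_value = !raw_value.to_s.strip.empty?
    parsed_values = Array(parsed_metadata[field]).reject { |v| v.to_s.strip.empty? }
    has_raw_value && parsed_values.empty?
  end
  dropped.keys
end

# Sample data for illustration only.
raw_row = {
  'title' => 'Sample record',
  'resource_type' => 'http://id.loc.gov/vocabulary/resourceTypes/img'
}
parsed_metadata = { 'title' => ['Sample record'], 'resource_type' => [] }

puts dropped_fields(raw_row, parsed_metadata).inspect
# prints ["resource_type"]
```
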

ShanaLMoore commented 1 year ago

@mlhale7 I created a placeholder ticket for us to revisit the UI concerns: https://github.com/scientist-softserv/utk-hyku/issues/275

ShanaLMoore commented 1 year ago

@mlhale7 please reference this MR for the changes I made.

To reiterate, I believe all of your concerns will be addressed when we fix the UI and implement remote questioning authority.

To me, the most important part right now is to make sure the URI values get saved to the metadata properties. Regarding this ticket, if the metadata properties are present as required, please consider passing this along, as we will address functionality at a later time (after our ingest priority). Doing so will also unblock you all from doing additional ingests.

But if there are any additional questions or concerns, please let me know.

ShanaLMoore commented 1 year ago

Tested on staging and verified the value gets saved for image and collection: https://qa.utk-hyku-staging.notch8.cloud/importers/34?locale=en

mlhale7 commented 1 year ago

@ShanaLMoore - thanks for this. I'm seeing values I'd expect for resource_type on staging now and approve of the change to be deployed.

mlhale7 commented 1 year ago

This should be good to go @ShanaLMoore