scientist-softserv / utk-hyku

URI string values are present in Solr record / facets where applicable #592

Open jillpe opened 7 months ago

jillpe commented 7 months ago

Test URI: https://id.loc.gov/authorities/names/n2017180154 (append .json to get the record back as JSON)

Ensure @id is equal to the URI:

Image

This is the value we're looking for:

Image
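
The check described above can be sketched in Ruby. This is only a minimal sketch, assuming the .json response parses to an array of JSON-LD nodes, each carrying an @id and possibly a SKOS prefLabel. The sample payload is hand-written for illustration (it is not a real id.loc.gov response), and `find_node` and `label_for` are hypothetical helper names, not methods from utk-hyku.

```ruby
require 'json'

PREF_LABEL = 'http://www.w3.org/2004/02/skos/core#prefLabel'

# Hand-written sample shaped like a JSON-LD node array (assumption, not real data).
SAMPLE_JSON = <<~JSON
  [
    { "@id": "http://id.loc.gov/authorities/names/some-other-node" },
    { "@id": "http://id.loc.gov/authorities/names/n2017180154",
      "http://www.w3.org/2004/02/skos/core#prefLabel": [ { "@value": "Example Label" } ] }
  ]
JSON

# Return the node whose @id matches the URI exactly, or nil.
def find_node(nodes, uri)
  nodes.find { |node| node['@id'] == uri }
end

# Return the first prefLabel value for the URI, falling back to the URI itself.
def label_for(nodes, uri)
  node = find_node(nodes, uri)
  return uri if node.nil?

  first = (node[PREF_LABEL] || []).first
  first ? first['@value'] : uri
end

nodes = JSON.parse(SAMPLE_JSON)
puts label_for(nodes, 'http://id.loc.gov/authorities/names/n2017180154') # prints "Example Label"
```

Falling back to the URI keeps an unresolved value visible rather than silently dropping it.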

Testing Instructions

jillpe commented 6 months ago

SoftServ QA: ✅

Before:

Image

After:

Image

josh-morgan117 commented 6 months ago

Looks like we've got an issue with the URI values for language. It's the right URI, but when it's being dereferenced, it's not choosing the correct pref label. It's pulling the French one instead of the English.
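
One way to avoid picking the wrong label is to select by language tag instead of taking the first entry. The following is a rough sketch, assuming the labels arrive as JSON-LD value objects with @value and @language keys. The sample labels are hand-written, `preferred_label` is a hypothetical helper, and this is not necessarily how the fix that went to staging works.

```ruby
# Pick the label in the requested language; fall back to an untagged label,
# then to whatever comes first.
def preferred_label(labels, language: 'en')
  match = labels.find { |label| label['@language'] == language }
  match ||= labels.find { |label| !label.key?('@language') }
  match ||= labels.first
  match && match['@value']
end

# Hand-written sample: the French label comes first, as in the bug report.
labels = [
  { '@value' => 'anglais', '@language' => 'fr' },
  { '@value' => 'English', '@language' => 'en' }
]

puts preferred_label(labels) # prints "English"
```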

kirkkwang commented 6 months ago

@kidon0011 ah dang, can you share that URI?

kirkkwang commented 6 months ago

@kidon0011 Mark and I worked on a solution for this so it should be going to staging soon

josh-morgan117 commented 6 months ago

@kirkkwang Thanks, Kirk. Sorry about the lack of response, these notifications don't come to my email.

markpbaggett commented 5 months ago

@kirkkwang -- should this work on any id.loc.gov value or only names? Here is an example of one with many URIs from id.loc.gov:

https://dc.utk-hyku-production.notch8.cloud/concern/audios/56c4b6df-7599-4085-a774-8e6a5409f170?locale=en

Also, ignore those other URIs. We will clean those up after this.

kirkkwang commented 5 months ago

@markpbaggett This should work... but perhaps we need to make it a little more bulletproof. This is what I'm noticing:

irb(main):007:0> indexer.uri_to_value_for('Http://id.loc.gov/authorities/subjects/sh85146723')
=> "Http://id.loc.gov/authorities/subjects/sh85146723"
irb(main):008:0> indexer.uri_to_value_for('http://id.loc.gov/authorities/subjects/sh85146723')
=> "Wildfires"

The capital H is messing it up. We should be able to solve this by applying a #downcase to the value in this method https://github.com/scientist-softserv/utk-hyku/blob/main/app/indexers/uri_to_string_behavior.rb#L18
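
A rough sketch of that normalization follows. It is not the actual code in uri_to_string_behavior.rb: `normalize_uri`, `uri_to_value`, and the `LABELS` table are hypothetical stand-ins for illustration. It also lowercases only the scheme and host rather than calling #downcase on the whole value, on the assumption that some URI paths may be case-sensitive.

```ruby
# Matches the scheme and host at the start of a URI.
SCHEME_AND_HOST = %r{\A([A-Za-z][A-Za-z0-9+.-]*)://([^/?#]*)}

# Lowercase only the scheme and host, leaving the path untouched.
def normalize_uri(value)
  value.to_s.strip.sub(SCHEME_AND_HOST) do
    "#{Regexp.last_match(1).downcase}://#{Regexp.last_match(2).downcase}"
  end
end

# Stand-in for the real lookup (assumption): a tiny in-memory label table.
LABELS = {
  'http://id.loc.gov/authorities/subjects/sh85146723' => 'Wildfires'
}.freeze

# Resolve a URI to its label, returning the original value when nothing matches.
def uri_to_value(value, labels = LABELS)
  labels.fetch(normalize_uri(value), value)
end

puts uri_to_value('Http://id.loc.gov/authorities/subjects/sh85146723') # prints "Wildfires"
```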

markpbaggett commented 5 months ago

@kirkkwang, interesting. Do you know why those are even getting to be an H? If you look at the attached import sheets, they all come over with a little h.

jillpe commented 4 months ago

This should be ready to test again on staging

QA: ✅

Work

Language URI: http://id.loc.gov/vocabulary/iso639-2/eng

Image

josh-morgan117 commented 4 months ago

Image

@kirkkwang Should the resource type URI resolve as well?

kirkkwang commented 4 months ago

@josh-morgan117 the way it works is that any term with range: http://www.w3.org/2001/XMLSchema#anyURI should have its URI dereferenced to a string, but the resource type here does not have that range

https://github.com/utkdigitalinitiatives/m3_profiles/blob/main/maps/utk.yml#L3345-L3359
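
To illustrate the rule, here is a small sketch that lists which properties would qualify for dereferencing. It assumes the profile has a top-level properties map where each property carries a range key. The YAML below is a hand-written sample (including the ranges shown), not an excerpt from utk.yml, and `uri_properties` is a hypothetical helper.

```ruby
require 'yaml'

ANY_URI = 'http://www.w3.org/2001/XMLSchema#anyURI'

# Hand-written sample profile (assumption about the shape, not real utk.yml content).
PROFILE_YAML = <<~YAML
  properties:
    subject:
      range: "http://www.w3.org/2001/XMLSchema#anyURI"
    resource_type_local:
      range: "http://www.w3.org/2001/XMLSchema#string"
    title:
      range: "http://www.w3.org/2001/XMLSchema#string"
YAML

# Names of the properties whose range marks them as URI-valued.
def uri_properties(profile)
  profile.fetch('properties', {})
         .select { |_name, config| config['range'] == ANY_URI }
         .keys
end

profile = YAML.safe_load(PROFILE_YAML)
p uri_properties(profile) # prints ["subject"]
```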

josh-morgan117 commented 4 months ago

@kirkkwang I think the example I'm looking at is using resource_type (not _local), which does have range: http://www.w3.org/2001/XMLSchema#anyURI :

https://github.com/utkdigitalinitiatives/m3_profiles/blob/50e5127a8d9fe46e9bcd49bd84e93d105bced9d7/maps/utk.yml#L3308-L3344

mlhale7 commented 4 months ago

@kirkkwang - I wanted to clarify the scope of this ticket. Should all URI values be able to be transformed to strings (or only certain vocabs that are established with Questioning authority?). I'm noticing that we're sharing the URI for rights statements on staging and not strings.

Also, on staging all the metadata associated with collections still has URIs (starting with a capital "H"). Is this something to be addressed in the future or something we should clean up on our end?

kirkkwang commented 4 months ago

@josh-morgan117 ah I see that. The resource_type should dereference the URI, locally it does so something seems different on staging

image

@mlhale7 If I recall, what I did for this ticket was make all the id.loc.gov URI's be dereferenced. Am I to understand that all URI's no matter the domain should be dereferenced?

mlhale7 commented 4 months ago

@kirkkwang - I was honestly trying to make sure I understood the scope of the ticket to feel comfortable signing off. If it's just id.loc.gov that's great. We do need to figure out what we're doing with URIs that come from id.loc.gov in collection metadata (if that's just UTK cleaning it up we'll make it happen), but I wanted to confirm that also. I realize the collection and item metadata is managed differently.

kirkkwang commented 4 months ago

@mlhale7 ah sorry i didn't address the collection part, so there was a PR here that should fix that issue with the capital H. I believe cleaning it up should fix it now because prior to that commit, all subjects were being capitalized.

The scope of this ticket from what I understood was to account for id.loc.gov URIs.

kirkkwang commented 4 months ago

@mlhale7 @josh-morgan117 actually i think i found what's going on, i'll work on a PR soon!

kirkkwang commented 4 months ago

@mlhale7 ultimately should the rights statements also be dereferenced? if it's a yes then we can just add it to this ticket i feel.

mlhale7 commented 4 months ago

@kirkkwang - I'll get feedback from UTK and get back to you. We don't want to draw attention away from other critical work and the rights URI is much more useable than the other URIs.

A quick question, I think the string value would make more sense to read for users than a link, but the linked content in the URI is important. If we go with a string value is there any way to hyperlink to the URI from the text? If not (or if that's a bit of work), we can keep it as a URI.

mlhale7 commented 4 months ago

@kirkkwang - it sounds like our ideal solution would be to have a badge that links out (as is done in DPLA, e.g. here near the top) for rights statements. Given this, should we table this ask for now and continue with the scope of this ticket being LoC?

kirkkwang commented 4 months ago

@mlhale7 thanks for the example. If that's the case, then we'll probably need to handle that in another ticket. In the meantime, for this ticket, I turned them into dereferenced links

image

mlhale7 commented 4 months ago

@kirkkwang - Thanks for this. I think that's a great improvement.

josh-morgan117 commented 4 months ago

@kirkkwang I wasn't seeing the resource type dereference earlier this morning but I see it now. All of this looks good to me. I'll suggest @markpbaggett take a final look before moving this card.

josh-morgan117 commented 2 months ago

Hey @kirkkwang We're still seeing LOC URIs not being dereferenced, such as in search facets, collections, and language (on this one https://digitalcollections.lib.utk.edu/concern/audios/fd22951d-e484-4f40-999d-ec9c5d2b416f).

kirkkwang commented 2 months ago

@josh-morgan117 i'll check if this is an indexing issue, i'll try and save the work and see if it updates

kirkkwang commented 2 months ago

@josh-morgan117 that seemed to do the trick, i think we'd want to schedule in a reindex of all the works at some point to fix this across the board

josh-morgan117 commented 2 months ago

@kirkkwang We're not currently importing due to an issue @orangewolf is working on. I wonder if it would make sense to do that now?

kirkkwang commented 2 months ago

@josh-morgan117 Rob advised against a site wide reindex until he gets back, but I would happily do any spot check reindexing if you come across an object that needs it

josh-morgan117 commented 2 months ago

@kirkkwang I'm still seeing some LOC ones with capital Hs in http (the ones I've encountered so far are set to private visibility). Will the reindex address that?

kirkkwang commented 2 months ago

@josh-morgan117 it's been a while but I wanna say yes, do you have one we can try?

josh-morgan117 commented 2 months ago

This one: editing and saving doesn't fix it. I think I would need to manually change the H in the metadata. It looks like it appears on all of the items with the resource type still showing as a URI, except when you edit and save, that resource type resolves to a string but the capital H for the subjects is still there.

kirkkwang commented 2 months ago

@josh-morgan117 this one is a bit of an annoying one, it seems. I changed the Http to http on the subject as well, and it resolves now. It's annoying because it means the object itself was saved with a capital H on import. I'm not certain, but it seems the same change I made for Resource Type would need to be made for Subject and any other controlled vocabulary field that uses a URI.