Open jillpe opened 7 months ago
SoftServ QA: ✅
Before:
After:
Looks like we've got an issue with the language URI values. It's the right URI, but when it's being dereferenced it's not choosing the correct pref label. It's pulling the French one instead of the English.
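To illustrate the label problem described above, here is a minimal sketch of choosing the English pref label from a set of language-tagged values. It assumes the labels arrive as JSON-LD style hashes with `@language` and `@value` keys; `preferred_label` is a hypothetical helper for illustration, not a method from the app, and the sample labels are made up.

```ruby
# Hypothetical helper: pick one label from JSON-LD style language-tagged values.
# Prefers the requested language, then an untagged value, then the first value.
def preferred_label(values, language: 'en')
  values ||= []
  match = values.find { |v| v['@language'] == language } ||
          values.find { |v| !v.key?('@language') } ||
          values.first
  match && match['@value']
end

# Assumed sample data: the French label is listed first, as in the bug report.
labels = [
  { '@language' => 'fr', '@value' => 'anglais' },
  { '@language' => 'en', '@value' => 'English' },
  { '@language' => 'de', '@value' => 'Englisch' }
]

puts preferred_label(labels)
```

Taking the first label in the list would return the French value here, which matches the symptom reported above.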
@kidon0011 ah dang, can you share that URI?
@kidon0011 Mark and I worked on a solution for this so it should be going to staging soon
@kirkkwang Thanks, Kirk. Sorry about the lack of response, these notifications don't come to my email.
@kirkkwang -- should this work on any id.loc.gov value or only names? Here is an example of one with many URIs from id.loc.gov:
Also, ignore those other URIs. We will clean those up after this.
@markpbaggett This should work... but perhaps we need to make it a little more bulletproof. This is what I'm noticing:
irb(main):007:0> indexer.uri_to_value_for('Http://id.loc.gov/authorities/subjects/sh85146723')
=> "Http://id.loc.gov/authorities/subjects/sh85146723"
irb(main):008:0> indexer.uri_to_value_for('http://id.loc.gov/authorities/subjects/sh85146723')
=> "Wildfires"
The capital H is messing it up. We should be able to solve this by applying a #downcase to the value in this method
https://github.com/scientist-softserv/utk-hyku/blob/main/app/indexers/uri_to_string_behavior.rb#L18
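A rough sketch of the kind of normalization being discussed. Note this is a narrower variant than a plain `#downcase`: it lowercases only the `http`/`https` scheme so that any case-sensitive path segments are left alone. `normalize_uri_scheme` is a hypothetical name used for illustration and is not the code in the linked file.

```ruby
# Hypothetical helper: lowercase only the scheme of an http(s) URI.
# Non-string values and strings that are not http(s) URIs pass through as-is.
def normalize_uri_scheme(value)
  return value unless value.is_a?(String)

  value.strip.sub(%r{\A(https?)://}i) { "#{Regexp.last_match(1).downcase}://" }
end

puts normalize_uri_scheme('Http://id.loc.gov/authorities/subjects/sh85146723')
```

With the value normalized this way before the lookup, both spellings from the irb session above would be looked up under the same lowercase-scheme URI.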
@kirkkwang, interesting. Do you know why those are even getting to be an H? If you look at the attached import sheets, they all come over with a little h.
This should be ready to test again on staging
QA: ✅
Language URI: http://id.loc.gov/vocabulary/iso639-2/eng
@kirkkwang Should the resource type URI resolve as well?
@josh-morgan117 the way it works is that any term with range: http://www.w3.org/2001/XMLSchema#anyURI should change the URI, but the resource type here does not
https://github.com/utkdigitalinitiatives/m3_profiles/blob/main/maps/utk.yml#L3345-L3359
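As a sketch of the rule described above, the snippet below selects the properties whose `range` is `anyURI` from a profile. The YAML layout (a top-level `properties` map with a `range` per property) and the sample entries are assumptions made for illustration; they are not copied from the linked profile, and `uri_properties` is a hypothetical helper.

```ruby
require 'yaml'

ANY_URI = 'http://www.w3.org/2001/XMLSchema#anyURI'

# Assumed, simplified profile layout for illustration only.
profile = YAML.safe_load(<<~PROFILE)
  properties:
    resource_type:
      range: http://www.w3.org/2001/XMLSchema#anyURI
    resource_type_local:
      range: http://www.w3.org/2001/XMLSchema#string
    subject:
      range: http://www.w3.org/2001/XMLSchema#anyURI
PROFILE

# Hypothetical helper: names of properties whose values should be dereferenced.
def uri_properties(profile)
  profile.fetch('properties', {})
         .select { |_name, config| config['range'] == ANY_URI }
         .keys
end

puts uri_properties(profile).inspect
```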
@kirkkwang I think the example I'm looking at is using resource_type (not _local), which does have range: http://www.w3.org/2001/XMLSchema#anyURI :
@kirkkwang - I wanted to clarify the scope of this ticket. Should all URI values be able to be transformed to strings, or only certain vocabs that are established with Questioning Authority? I'm noticing that we're showing the URI for rights statements on staging and not strings.
Also, on staging all the metadata associated with collections still has URIs (starting with a capital "H"). Is this something to be addressed in the future or something we should clean up on our end?
@josh-morgan117 ah I see that. The resource_type should dereference the URI. Locally it does, so something seems different on staging
@mlhale7 If I recall, what I did for this ticket was make all the id.loc.gov URIs be dereferenced. Am I to understand that all URIs, no matter the domain, should be dereferenced?
@kirkkwang - I was honestly trying to make sure I understood the scope of the ticket to feel comfortable signing off. If it's just id.loc.gov that's great. We do need to figure out what we're doing with URIs that come from id.loc.gov in collection metadata (if that's just UTK cleaning it up we'll make it happen), but I wanted to confirm that also. I realize the collection and item metadata is managed differently.
@mlhale7 ah sorry i didn't address the collection part, so there was a PR here that should fix that issue with the capital H. I believe cleaning it up should fix it now because prior to that commit, all subjects were being capitalized.
The scope of this ticket from what I understood was to account for id.loc.gov URIs.
@mlhale7 @josh-morgan117 actually i think i found what's going on, i'll work on a PR soon!
@mlhale7 ultimately should the rights statements also be dereferenced? if it's a yes then we can just add it to this ticket i feel.
@kirkkwang - I'll get feedback from UTK and get back to you. We don't want to draw attention away from other critical work and the rights URI is much more useable than the other URIs.
A quick question, I think the string value would make more sense to read for users than a link, but the linked content in the URI is important. If we go with a string value is there any way to hyperlink to the URI from the text? If not (or if that's a bit of work), we can keep it as a URI.
@kirkkwang - it sounds like our ideal solution would be to have a badge that links out (as is done in DPLA, e.g. here near the top) for rights statements. Given this, should we table this ask for now and continue with the scope of this ticket being LoC?
@mlhale7 thanks for the example, if that's the case then we probably will need to handle that in another ticket. In the meantime, for this ticket I turned them into dereferenced links
@kirkkwang - Thanks for this. I think that's a great improvement.
@kirkkwang I wasn't seeing the resource type dereference earlier this morning but I see it now. All of this looks good to me. I'll suggest @markpbaggett take a final look before moving this card.
Hey @kirkkwang We're still seeing LOC URIs not being dereferenced, such as in search facets, collections, and language (on this one https://digitalcollections.lib.utk.edu/concern/audios/fd22951d-e484-4f40-999d-ec9c5d2b416f).
@josh-morgan117 i'll check if this is an indexing issue, i'll try and save the work and see if it updates
@josh-morgan117 that seemed to do the trick, i think we'd want to schedule in a reindex of all the works at some point to fix this across the board
@kirkkwang We're not currently importing due to an issue @orangewolf is working on. I wonder if it would make sense to do that now?
@josh-morgan117 Rob advised against a site wide reindex until he gets back, but I would happily do any spot check reindexing if you come across an object that needs it
@kirkkwang I'm still seeing some LOC ones with capital Hs in http (the ones I've encountered so far are set to private visibility). Will the reindex address that?
@josh-morgan117 it's been a while but I wanna say yes, do you have one we can try?
This one: editing and saving doesn't fix it. I think I would need to manually change the H in the metadata. It looks like it appears on all of the items with the resource type still showing as a URI, except when you edit and save, that resource type resolves to a string but the capital H for the subjects is still there.
@josh-morgan117 this one is a bit of an annoying one it seems. I changed the Http to an http on the subject as well, and it resolves now. The reason why it's annoying is because this means the object itself was saved with a capital H on import. I'm not certain, but it seems the same change I did for Resource Type would need to be done for Subject or any other controlled vocabulary field that would use a URI
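One way this could look at import time, as a sketch only: apply the same scheme fix to every URI-backed field before the record is saved. The field list, the attributes hash, and `normalize_uri_fields` are all hypothetical and are not taken from the importer.

```ruby
# Hypothetical list of controlled vocabulary fields that hold URIs.
URI_FIELDS = %w[subject resource_type language].freeze

# Hypothetical helper: lowercase the http(s) scheme on every value of the
# listed fields. Those fields always come back as arrays; others are untouched.
def normalize_uri_fields(attributes, fields: URI_FIELDS)
  attributes.each_with_object({}) do |(key, value), result|
    result[key] =
      if fields.include?(key)
        Array(value).map do |v|
          v.to_s.sub(%r{\A(https?)://}i) { "#{Regexp.last_match(1).downcase}://" }
        end
      else
        value
      end
  end
end

# Assumed sample record, using the subject URI from the irb session above.
record = {
  'title' => ['Http in a title stays as typed'],
  'subject' => ['Http://id.loc.gov/authorities/subjects/sh85146723']
}

puts normalize_uri_fields(record).inspect
```

Fixing the value at save time means the stored metadata itself no longer carries the capital H, so a later reindex would not need to compensate for it.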
Test URI: https://id.loc.gov/authorities/names/n2017180154 (get JSON back by appending .json)
ensure @id is equal to the URI
This is the value we're looking for
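The check described above can be sketched roughly as follows: parse the JSON, find the node whose `@id` equals the URI, and read its label. The label keys and the sample payload are assumptions about the response shape (the sample is hand-written, not a real response), and `label_for` is a hypothetical helper. When nothing matches, it falls back to returning the URI unchanged.

```ruby
require 'json'

# Assumed label keys; which key actually carries the label is an assumption.
LABEL_KEYS = [
  'http://www.w3.org/2004/02/skos/core#prefLabel',
  'http://www.loc.gov/mads/rdf/v1#authoritativeLabel'
].freeze

# Hypothetical helper: return the label of the node whose @id matches the URI,
# or the URI itself when no matching node or label is found.
def label_for(uri, json_body)
  nodes = JSON.parse(json_body)
  node = nodes.find { |n| n['@id'] == uri }
  return uri if node.nil?

  LABEL_KEYS.each do |key|
    value = Array(node[key]).first
    return value['@value'] if value.is_a?(Hash) && value['@value']
  end
  uri
end

# Hand-written sample payload for illustration.
sample = <<~PAYLOAD
  [
    { "@id": "_:b1",
      "http://www.loc.gov/mads/rdf/v1#authoritativeLabel": [{ "@value": "not the node we want" }] },
    { "@id": "http://id.loc.gov/authorities/subjects/sh85146723",
      "http://www.loc.gov/mads/rdf/v1#authoritativeLabel": [{ "@value": "Wildfires" }] }
  ]
PAYLOAD

puts label_for('http://id.loc.gov/authorities/subjects/sh85146723', sample)
```

Matching on `@id` matters because the payload can contain several nodes, and only one of them describes the URI that was requested.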
Testing Instructions