pollecuttn closed this 4 months ago.
I suspect this is due to a change in case normalisation for label-derived identifiers. If the term was present before that change, it would bear the same case as the label. I'll try to find that change. There were odd discrepancies, e.g. multi-word terms that were sometimes in title case and sometimes in sentence case.
I suspect the solution is to make the link case-insensitive by making the query search in the lowercase_keyword subfield, rather than keyword.
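A minimal sketch of the difference between the two queries, built as plain Elasticsearch `term` query bodies. The full field paths (`genres.label.keyword`, `genres.label.lowercase_keyword`) and the helper name are assumptions for illustration; the sketch also assumes `lowercase_keyword` is a keyword subfield with a lowercase normalizer.

```python
def genre_label_query(label: str, case_insensitive: bool = True) -> dict:
    """Build a term query for a genre label (hypothetical field paths)."""
    if case_insensitive:
        # Assumed subfield with a lowercase normalizer. Elasticsearch should also
        # normalise the term-query input for such a field, so lowercasing here
        # is belt and braces rather than strictly required.
        return {"term": {"genres.label.lowercase_keyword": label.lower()}}
    # Exact-case match: 'graphic medicine' will not match 'Graphic medicine'.
    return {"term": {"genres.label.keyword": label}}


print(genre_label_query("graphic medicine"))
print(genre_label_query("Graphic medicine"))
```

Both calls produce the same query body, so a link built from either spelling would find the same works.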
The correct, end-goal fix for this is to make it all work with identifiers, but that is a massive change.
Actually, I think the only case normalisation we do is for comparison, rather than storage. I suspect that when this term was first encountered, it was 'graphic medicine'; then someone fixed it to be 'Graphic medicine', but because we have it stored as 'graphic medicine', it's not being overwritten with the correct capitalisation.
This is where the magic happens.
For performance reasons, we only insert each Concept once. If we find one that is already there, we discard it, rather than rely on ES's upsert behaviour.
Hurrah for quality commentary. Thank you, me-from-the-past.
I can fix this quickly by deleting the 'graphic medicine' concept(s), then notifying the Concepts pipeline of one of the corresponding Works. That should fix it.
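A minimal sketch of the insert-once behaviour and why it makes a wrong label sticky until the stored concept is removed. An in-memory dict stands in for the Elasticsearch index; `Concept`, `ConceptStore` and the id `abc123` are hypothetical, and the sketch assumes both spellings resolve to the same concept id because labels are compared case-insensitively.

```python
from dataclasses import dataclass


@dataclass
class Concept:
    concept_id: str  # hypothetical id; both spellings are assumed to map to it
    label: str


class ConceptStore:
    """In-memory stand-in for the concepts index (an assumption, not the real store)."""

    def __init__(self) -> None:
        self._docs: dict[str, Concept] = {}

    def insert_new(self, concepts: list[Concept]) -> list[Concept]:
        """Insert-once: skip any concept whose id is already stored."""
        inserted = []
        for concept in concepts:
            if concept.concept_id in self._docs:
                continue  # already present: discard it rather than upsert
            self._docs[concept.concept_id] = concept
            inserted.append(concept)
        return inserted

    def delete(self, concept_id: str) -> None:
        self._docs.pop(concept_id, None)

    def label(self, concept_id: str) -> str:
        return self._docs[concept_id].label


store = ConceptStore()
store.insert_new([Concept("abc123", "graphic medicine")])  # first encounter wins
store.insert_new([Concept("abc123", "Graphic medicine")])  # corrected label is discarded
print(store.label("abc123"))  # graphic medicine

store.delete("abc123")                                     # remove the stale concept
store.insert_new([Concept("abc123", "Graphic medicine")])  # reprocess a Work
print(store.label("abc123"))  # Graphic medicine
```

The same property cuts both ways: once the stored label is right, later variant spellings cannot disturb it.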
I do have a fix I can add to the concepts pipeline to make sure the most up-to-date labels are represented in Concepts, but I fear it might actually make this problem worse.
For example, k949jyjy has both 'Graphic Novels' and 'graphic novels' (both of which are av6wszv9). Allowing the Concept's label to be updated by this document would result in it nondeterministically being one or the other.
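To illustrate the concern, here is a sketch of the alternative: a last-write-wins upsert of the label, assuming the pipeline may process mentions in any order. Only k949jyjy and av6wszv9 come from this thread; `upsert_labels` is hypothetical.

```python
from collections.abc import Iterable
from itertools import permutations


def upsert_labels(mentions: Iterable[tuple[str, str]]) -> dict[str, str]:
    """Last-write-wins: every (concept_id, label) mention overwrites the stored label."""
    labels: dict[str, str] = {}
    for concept_id, label in mentions:
        labels[concept_id] = label
    return labels


# Two spellings of the same concept (av6wszv9) from one record (k949jyjy).
mentions = [("av6wszv9", "Graphic Novels"), ("av6wszv9", "graphic novels")]

# Try every processing order: the surviving label depends on which came last.
outcomes = {upsert_labels(order)["av6wszv9"] for order in permutations(mentions)}
print(sorted(outcomes))  # ['Graphic Novels', 'graphic novels']
```

If processing order is not fixed, either label can end up stored, whereas insert-once keeps whichever was stored first.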
k949jyjy hasn't been catalogued yet. I expect the subject headings will be tidied up at that point, but that doesn't help with this problem.
I think I'll fix this by messing with the record.
With the current code, the label is stable: if it is right, it will continue to be right. If it is wrong, we can correct it, and it will then continue to be right.
If I change the way this all works, then the labels can change willy-nilly and in unpredictable ways.
I fancy this might be the/a culprit. It has a Subject of 'graphic medicine | congresses'.
But then why are we seeing only a "works using" section, and not also a "works about" section?
I think that may be because "works about" only uses the whole subject, and not the subject's component concepts.
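A sketch of that hypothesis with a deliberately simplified, hypothetical data model: each subject carries its own id plus the ids of its component concepts, and the "about" match compares only the whole subject's id. The ids `gm`, `gm-congresses`, `w1` and `w2` are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Subject:
    subject_id: str  # id of the whole (possibly compound) subject
    label: str
    concept_ids: list[str] = field(default_factory=list)  # component concept ids


@dataclass
class Work:
    work_id: str
    subjects: list[Subject] = field(default_factory=list)
    genre_ids: list[str] = field(default_factory=list)


def works_about(works: list[Work], concept_id: str) -> list[str]:
    # Hypothesis: "about" matches the whole subject's id only, not its components.
    return [w.work_id for w in works if any(s.subject_id == concept_id for s in w.subjects)]


def works_using(works: list[Work], concept_id: str) -> list[str]:
    # "Using" (type/technique) matches the genre's concept id.
    return [w.work_id for w in works if concept_id in w.genre_ids]


works = [
    Work("w1", subjects=[Subject("gm-congresses", "graphic medicine | congresses", ["gm", "congresses"])]),
    Work("w2", genre_ids=["gm"]),
]
print(works_about(works, "gm"))  # []
print(works_using(works, "gm"))  # ['w2']
```

Under this model the concept can be created from the compound subject's component, yet only the "works using" section finds anything.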
Sorted by Database Surgery
Reported in Slack:
I think this might only be affecting the 'graphic medicine' concept, as other concepts work as expected: 'lithographs' from https://wellcomecollection.org/concepts/fmydsuw2; 'paintings' from https://wellcomecollection.org/concepts/q9b2ep5v; 'parodies' from https://wellcomecollection.org/concepts/g3sa5kv4
Changing 'graphic' to 'Graphic' in the link from the concept page works; you get the relevant 90 works: https://wellcomecollection.org/search/works?genres.label=%22Graphic+medicine%2.
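The concept-page link appears to pass the label verbatim as a quoted `genres.label` filter, so its case carries straight through to the search. A small sketch of how such a link could be built, assuming the parameter name and quoting seen in the URL above; `works_search_url` is hypothetical.

```python
from urllib.parse import urlencode


def works_search_url(genre_label: str) -> str:
    # Wrap the label in double quotes, then URL-encode it. urlencode uses
    # quote_plus, so spaces become '+' and '"' becomes '%22'.
    query = urlencode({"genres.label": f'"{genre_label}"'})
    return f"https://wellcomecollection.org/search/works?{query}"


print(works_search_url("graphic medicine"))
print(works_search_url("Graphic medicine"))
```

The two links differ only in the case of one letter, which is exactly the difference between finding none and finding the 90 works when the underlying query is case-sensitive.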
I've looked at Sierra, and all 90 of those records use 'Graphic', not 'graphic', in their 655s.
'graphic medicine' isn't in the type/technique dropdown when you search with no search term. Not sure if this is a red herring or not.