wellcomecollection / catalogue-api

:crystal_ball: The API for searching the Wellcome Collection catalogue.
https://developers.wellcomecollection.org
MIT License

'All works' button on the graphic medicine concept page finds no results #780

Closed pollecuttn closed 4 months ago

pollecuttn commented 5 months ago

Reported in Slack:

If I'm in a concept: https://wellcomecollection.org/concepts/jzyrfxa5 then click to see all 90 works I get: https://wellcomecollection.org/search/works?genres.label=%22graphic+medicine%22 (which says there are no works)

I think this might only be affecting the 'graphic medicine' concept, as other concepts work as expected: lithographs from https://wellcomecollection.org/concepts/fmydsuw2, paintings from https://wellcomecollection.org/concepts/q9b2ep5v, and parodies from https://wellcomecollection.org/concepts/g3sa5kv4.

Changing 'graphic' to 'Graphic' in the link from the concept page works - you get the relevant 90 works: https://wellcomecollection.org/search/works?genres.label=%22Graphic+medicine%2.

I've looked at Sierra and all those 90 records use 'Graphic' not 'graphic' in their 655s.

'graphic medicine' isn't in the type/technique dropdown when you search with no search term. Not sure if this is a red herring or not.


paul-butcher commented 4 months ago

I suspect this is due to a change in case normalisation for label-derived identifiers. If the term was present before that change, it would bear the same case as the label. I'll try to find that change. There were odd discrepancies, e.g. multi-word terms were sometimes title case and sometimes sentence case.

paul-butcher commented 4 months ago

I suspect the solution is to make the link case-insensitive by making the query search in the lowercase_keyword subfield, rather than keyword.
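A minimal sketch of what that query change might look like, written in Python for illustration only. The full field paths (`genres.label.keyword` and `genres.label.lowercase_keyword`) and the helper name `genre_label_query` are assumptions; only the subfield names `keyword` and `lowercase_keyword` come from this thread.

```python
def genre_label_query(label: str, case_insensitive: bool = True) -> dict:
    """Build an Elasticsearch term query body for a genre label.

    Hypothetical helper: the field paths below are assumed, not taken
    from the real index mapping.
    """
    if case_insensitive:
        # Query the lowercased subfield and lowercase the input too,
        # so 'graphic medicine' and 'Graphic medicine' behave the same.
        field = "genres.label.lowercase_keyword"
        value = label.lower()
    else:
        # Exact match: the case of the input must match the stored label.
        field = "genres.label.keyword"
        value = label
    return {"query": {"term": {field: value}}}


lower = genre_label_query("graphic medicine")
upper = genre_label_query("Graphic medicine")
```

With the case-insensitive variant, both spellings produce the same query body, so the link from the concept page would no longer depend on how the label happens to be capitalised.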

paul-butcher commented 4 months ago

The correct, end-goal fix for this is to make it all work with identifiers. But that is a massive change.

paul-butcher commented 4 months ago

Actually, I think the only case normalisation we do is for comparison, rather than storage. I suspect that when this term was first encountered, it was 'graphic medicine', then someone fixed it to be 'Graphic medicine', but because we have it stored as 'graphic medicine' it's not being overwritten with the correct capitalisation.

paul-butcher commented 4 months ago

This is where the magic happens.

For performance reasons, we only insert each Concept once. If we find one that is already there, we discard it, rather than rely on ES's upsert behaviour.
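As a rough illustration of that insert-once behaviour (not the pipeline's actual code), here is a small Python sketch. The `ConceptStore` class is hypothetical, and it keys concepts on a lowercased label purely for illustration; the real pipeline matches on identifiers.

```python
class ConceptStore:
    """Hypothetical first-write-wins store for concept labels."""

    def __init__(self) -> None:
        # Maps a normalised key to the label as it was first stored.
        self._labels: dict[str, str] = {}

    @staticmethod
    def _key(label: str) -> str:
        # Case is normalised for comparison only, never for storage.
        return label.strip().lower()

    def insert(self, label: str) -> bool:
        """Store the label unless an equivalent one exists.

        Returns True if stored, False if discarded.
        """
        key = self._key(label)
        if key in self._labels:
            return False  # already present: discard, do not overwrite
        self._labels[key] = label
        return True

    def label_for(self, label: str) -> str:
        return self._labels[self._key(label)]


store = ConceptStore()
first = store.insert("graphic medicine")   # first encounter is stored
second = store.insert("Graphic medicine")  # later correction is discarded
stored = store.label_for("Graphic medicine")
```

Here `stored` is still the original lowercase label, which matches the symptom in this issue: the corrected capitalisation never replaces the one seen first.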

paul-butcher commented 4 months ago

Hurrah for quality commentary. Thank you me-from-the-past

paul-butcher commented 4 months ago

This can be fixed quickly by me deleting the graphic medicine concept(s), then notifying the Concepts pipeline of one of the corresponding Works. That should fix it.

I do have a fix I can add to the concepts pipeline to make sure the most up-to-date labels are represented in Concepts, but I fear it might actually make this problem worse.

For example, k949jyjy has both Graphic Novels and graphic novels (both of which are av6wszv9). Allowing the Concept's label to be updated by this document would result in it nondeterministically being one or the other.

pollecuttn commented 4 months ago

k949jyjy hasn't been catalogued yet. I expect the subject headings will be tidied up at that point but that doesn't help with this problem.

paul-butcher commented 4 months ago

I think I'll fix this by messing with the record.

With the current code, the label is stable. If it is right, then it will continue to be right. If it is wrong, we can correct it and it will then continue to be right.

If I change the way this all works, then the labels can change willy-nilly and in unpredictable ways.

paul-butcher commented 4 months ago

I fancy this might be the/a culprit. It has a Subject of graphic medicine | congresses

But then why are we seeing only a "works using" section, and not also a "works about" section?

paul-butcher commented 4 months ago

I think that may be because about only uses the whole subject, and not subject concepts

paul-butcher commented 4 months ago

Sorted by Database Surgery