sul-dlss / dlme

Digital Library of the Middle East web application, based on Spotlight
https://dlmenetwork.org/

Bug in language facet display #539

Closed jacobthill closed 4 years ago

jacobthill commented 5 years ago

Records with no language value are passed in as an empty string and displayed as a count in the 'Language' facet. Go to stage => 'Language' and sort numerically. It is currently the 6th value from the top. It doesn't seem to appear in prod, perhaps because there are no records with empty strings as language values in prod.

cbeer commented 5 years ago

Obviously this doesn't appear in stage at the moment, but I strongly suspect it's caused by bad data being indexed (using an empty string, or only whitespace or something..) and the indexer should guard against it.
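A minimal sketch of the kind of guard the indexer could apply, assuming language values arrive as a string or an array of strings. The helper name `clean_language_values` is hypothetical and not taken from the DLME indexer.

```ruby
# Hypothetical helper, not part of the DLME codebase.
# Strips surrounding whitespace and discards anything that ends up empty,
# so nil, "" and "   " never reach the index as facet values.
def clean_language_values(values)
  Array(values)
    .map { |value| value.to_s.strip }
    .reject(&:empty?)
    .uniq
end

clean_language_values(["Arabic", "", "  ", nil, " Persian "])
# => ["Arabic", "Persian"]
```

Note that `String#strip` only removes ASCII whitespace and null characters, so other "flavors" of blank (for example a non-breaking space) would need an extra check.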

We could do something in the app itself, and I'd defer to @ggeisler on what that should be (and.. it might be tricky to differentiate the different flavors of empty string vs no value provided vs etc etc).

ggeisler commented 5 years ago

We're saying that we might receive records that have empty or whitespace values for Language, but we are indexing those values rather than considering them null for Language? (I think this is what @cbeer is saying in his first paragraph above.)

If we have to index those cases, my first thought is to group them into an "Unspecified" value. I guess we could cover the obvious cases and put them into that bucket, but I am not sure how easy it would be to detect all possible cases that should go into the "Unspecified" bucket (without having a whitelist of valid languages we test against, where anything that doesn't match is "Unspecified").

But from the UI point of view, if we have to show facet values that are not an actual language, it seems preferable to lump them all into one single value with a label like "Unspecified" so the user knows those records have not been deliberately cataloged into a valid language, while also not displaying a blank value with a count in the facet selection box.
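The whitelist approach described above could be sketched roughly as follows. The method name, the default list of languages, and the label are all placeholders for illustration; the real list of valid languages would have to come from the project's data.

```ruby
# Hypothetical sketch: anything not on the list of known languages,
# including nil, "" and whitespace-only strings, is shown as "Unspecified".
def language_facet_value(raw,
                         known: ["Arabic", "Persian", "Ottoman Turkish"],
                         label: "Unspecified")
  value = raw.to_s.strip
  known.include?(value) ? value : label
end

language_facet_value(" Arabic ") # => "Arabic"
language_facet_value("")         # => "Unspecified"
language_facet_value(nil)        # => "Unspecified"
```

The trade-off is that a legitimate language missing from the list would also land in "Unspecified", so the list would need to be maintained alongside the data.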

caaster commented 5 years ago

@jacobthill -- was there any intention behind having a blank language? We need to know this please, before we can proceed.

jacobthill commented 5 years ago

No, I'm not sure why they are blank. I could look into the records and configs once the data is loaded back into stage.

jacobthill commented 5 years ago

FYI, every language value is sent through a series of translation maps, and if the value isn't found in any of the maps, an error should be raised. I would assume a blank value would raise that error, as it wouldn't be found in any translation map. I would need to look into it to be certain.
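To illustrate the lookup behavior described here, a rough sketch with plain hashes standing in for the translation maps. The method name and the map contents are made up; the actual maps and the error the indexer raises are not shown in this thread.

```ruby
# Hypothetical stand-in for the translation map chain.
# Returns the first translation found; raises if no map has the value.
def translate_language(value, maps)
  maps.each do |map|
    return map[value] if map.key?(value)
  end
  raise KeyError, "no translation map contains #{value.inspect}"
end

maps = [{ "ara" => "Arabic" }, { "per" => "Persian" }]
translate_language("ara", maps) # => "Arabic"
# translate_language("", maps) would raise KeyError
```

Under this sketch a blank value only gets through if one of the maps has an entry keyed on it, or if the blank is produced somewhere other than this lookup.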

caaster commented 5 years ago

@jacobthill thanks -- then we are waiting on you to look into this before this ticket can be worked on.

jacobthill commented 5 years ago

This is blocked by https://github.com/sul-dlss/dlme/issues/630

cbeer commented 5 years ago

Confirmed at standup on 11/5 this isn't currently blocked.

jacobthill commented 5 years ago

This is likely a mapping error but I won't be able to confirm that until I finish the mapping work. I will assign this ticket to myself until I can confirm one way or the other.

jacobthill commented 4 years ago

This was a mapping issue and is now resolved.