Closed jacobthill closed 4 years ago
Obviously this doesn't appear in stage at the moment, but I strongly suspect it's caused by bad data being indexed (an empty string, a whitespace-only value, or something similar), and the indexer should guard against it.
We could do something in the app itself, and I'd defer to @ggeisler on what that should be (though it might be tricky to differentiate the different flavors of empty string vs. no value provided, etc.).
We're saying that we might receive records that have empty or whitespace values for Language, but we are indexing those values rather than considering them null for Language? (I think this is what @cbeer is saying in his first paragraph above.)
If we have to index those cases, my first thought is to group them into an "Unspecified" value. I guess we could cover the obvious cases and put them into that bucket, but I'm not sure how easy it would be to detect all possible cases that should go into the "Unspecified" bucket (without having a whitelist of valid languages we test against, where anything that doesn't match is "Unspecified").
But from the UI point of view, if we have to show facet values that are not an actual language, it seems preferable to lump them all into one single value with a label like "Unspecified" so the user knows those records have not been deliberately cataloged into a valid language, while also not displaying a blank value with a count in the facet selection box.
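To make the "Unspecified" bucket idea concrete, here is a minimal sketch in Python. It is not the DLME indexer code; the function names, the `language` record key, and the `UNSPECIFIED` label are all hypothetical and only illustrate covering the obvious cases (missing value, empty string, whitespace-only string).

```python
# Hypothetical facet label, taken from the suggestion in this thread.
UNSPECIFIED = "Unspecified"


def normalize_language(value):
    """Return the stripped language value, or UNSPECIFIED when nothing usable was provided."""
    if value is None:
        return UNSPECIFIED
    stripped = value.strip()
    return stripped if stripped else UNSPECIFIED


def facet_counts(records):
    """Count records per normalized language label (records are plain dicts here)."""
    counts = {}
    for record in records:
        label = normalize_language(record.get("language"))
        counts[label] = counts.get(label, 0) + 1
    return counts


if __name__ == "__main__":
    sample = [{"language": "Arabic"}, {"language": "  "}, {"language": ""}, {}]
    print(facet_counts(sample))
```

This only catches blank and missing values. Values that are non-empty but still not a real language would need the whitelist approach mentioned above.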
@jacobthill -- was there any intention behind having a blank language? We need to know this please, before we can proceed.
No, I'm not sure why they are blank. I could look into the records and configs once the data is loaded back in stage.
FYI every language value is sent through a series of translation maps, and if the value isn't found in any of the maps, an error should be raised. I would assume a blank value would raise that error, as it wouldn't be found in any translation map. I would need to look into it to be certain.
@jacobthill thanks -- then we are waiting on you looking into this before this ticket can be worked on.
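As a rough illustration of the behavior described here, the sketch below runs a value through a list of translation maps and raises when none of them contains it. This is an assumption about the general shape of the lookup, not the actual DLME configuration; `translate_language`, `UnmappedLanguageError`, and the sample maps are hypothetical.

```python
class UnmappedLanguageError(ValueError):
    """Raised when a value is not found in any translation map (hypothetical name)."""


def translate_language(value, translation_maps):
    """Return the first mapping found for value, checking the maps in order."""
    for mapping in translation_maps:
        if value in mapping:
            return mapping[value]
    raise UnmappedLanguageError(f"no translation map entry for {value!r}")


if __name__ == "__main__":
    # Sample maps for illustration only.
    maps = [{"ara": "Arabic"}, {"eng": "English"}]
    print(translate_language("eng", maps))
    try:
        translate_language("", maps)
    except UnmappedLanguageError as err:
        print(f"blank value rejected: {err}")
```

Under this shape, a blank value only gets through if one of the maps has an entry for the empty string, or if the lookup is skipped or its error is swallowed somewhere upstream. Those would be the places to check.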
This is blocked by https://github.com/sul-dlss/dlme/issues/630
Confirmed at standup on 11/5 this isn't currently blocked.
This is likely a mapping error but I won't be able to confirm that until I finish the mapping work. I will assign this ticket to myself until I can confirm one way or the other.
This was a mapping issue and is now resolved.
Records with no language value are passed in as an empty string and displayed as a count in the 'Language' facet. Go to stage => 'Language' and sort numerically. It is currently the 6th value from the top. It doesn't seem to appear in prod, perhaps because there are no records with empty strings as language values in prod.