Comparison view treats multi-word value as multiple tokens

arildm commented 2 weeks ago

1. With the Svenska partiprogram och valmanifest (vivill) corpus selected, save two searches for comparison.
2. Compare the searches using the parti attribute.
3. Click any of the multi-word party names, e.g. Folkpartiet liberalerna.
Expected: Some results
Actual: No results

Apparently, the API request has cqp2=[_.text_party_name = "Folkpartiet"] [_.text_party_name = "liberalerna"], i.e. the single party name is split into two tokens.
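A minimal sketch of how a query like this could arise, assuming the clicked value is split on spaces and one token is emitted per part. The helper `buildCqp` and its parameters are hypothetical and not taken from korp-frontend, and regex escaping of the value is ignored here:

```typescript
// Hypothetical helper for illustration only; not the actual korp-frontend code.
// Builds a CQP expression for one attribute value, optionally splitting on spaces.
function buildCqp(attribute: string, value: string, splitTokens: boolean): string {
    const parts = splitTokens ? value.split(" ") : [value];
    return parts.map((part) => `[_.${attribute} = "${part}"]`).join(" ");
}

// Splitting on spaces yields two tokens, as in the failing request:
// [_.text_party_name = "Folkpartiet"] [_.text_party_name = "liberalerna"]
const splitQuery = buildCqp("text_party_name", "Folkpartiet liberalerna", true);

// Keeping the value whole yields a single token:
// [_.text_party_name = "Folkpartiet liberalerna"]
const wholeQuery = buildCqp("text_party_name", "Folkpartiet liberalerna", false);
```

The second form is presumably what the comparison view should send for a multi-word value of a text attribute.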

arildm commented 2 weeks ago

The backend /loglike response doesn't distinguish a multi-word value from multiple tokens. Compare these calls:

"han"+verb vs. "hon"+verb by sense: Space in string separates tokens

{ "loglike": {
  "hon..1:-1.000 vara..1:-1.000": 2375.04,
  "han..1:-1.000 vara..1:-1.000": -1774.16,
  "hon..1:-1.000 skola..4:-1.000": 1062.87,
  // ...

"frihet" vs. "jämlikhet" by party: Space in string does not separate tokens

{ "loglike": {
  "Feministiskt initiativ": 78.12,
  "V\u00e4nsterpartiet": 74.7,
  "Moderaterna": -73.75,
  // ...

Perhaps we can interpret the string value as one or more tokens depending on the input queries (set1_cqp and set2_cqp)? But changing the response format would probably be a more robust approach.
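A rough sketch of the first idea, assuming the number of tokens in the input query can be estimated by counting `[` outside quoted strings. Both function names are hypothetical, and the token counting is deliberately naive (it ignores repetition operators and other CQP syntax):

```typescript
// Hypothetical sketch; not the actual korp-frontend or backend code.
// Naively counts token expressions in a CQP query by counting "[" outside strings.
function countQueryTokens(cqp: string): number {
    let count = 0;
    let inString = false;
    for (let i = 0; i < cqp.length; i++) {
        const ch = cqp[i];
        if (inString && ch === "\\") {
            i++; // skip the escaped character
        } else if (ch === '"') {
            inString = !inString;
        } else if (ch === "[" && !inString) {
            count++;
        }
    }
    return count;
}

// Splits a loglike key into tokens only if the part count matches the query's
// token count; otherwise the key is kept as one multi-word value.
function splitLoglikeKey(key: string, tokenCount: number): string[] {
    const parts = key.split(" ");
    return parts.length === tokenCount ? parts : [key];
}

const twoTokenKey = splitLoglikeKey(
    "hon..1:-1.000 vara..1:-1.000",
    countQueryTokens('[word = "hon"] [pos = "VB"]'),
);
const partyKey = splitLoglikeKey(
    "Feministiskt initiativ",
    countQueryTokens('[word = "frihet"]'),
);
```

This heuristic still guesses wrong when a two-token query is compared by an attribute whose value happens to contain exactly one space, which is one reason a changed response format looks more robust.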

arildm commented 1 week ago

This is where the string in the response is split on whitespace: https://github.com/spraakbanken/korp-frontend/blob/38534b82a7902cc5e56a67844485505ffe0f767e/app/scripts/services/backend.ts#L125

spraakbanken / korp-frontend

Comparison view treats multi-word value as multiple tokens #385