splattater / testTickets


Search filter by language not working #4

Open splattater opened 6 years ago

splattater commented 6 years ago

Sadly, when I search for decks and slides with the words 'Creative Commons' and choose the English language, I still see some Greek entries. If I choose French or Lithuanian, it says 'An error occured while fetching results' (the word "occurred" needs two 'r's). If I choose Dutch I get nothing, but that is OK: it tells me 'no results found for the specified input parameters' (perhaps that could be put into easy-read English!). All the other languages seem to return accurate results.

[SWIK-2394] created by ead

splattater commented 6 years ago

This is on slidewiki.org

by ajames

splattater commented 6 years ago

But I have replicated it on testing with the latest language codes: https://testing.slidewiki.org/search/keywords=creative%20commons&language=en&sort=score

One of the results (https://testing.slidewiki.org/deck/9846-1/3d-printing/slide/42828-1?language=en) is clearly Greek and on a Greek deck, but is showing up in the English language results.

Kostis Pristouris FYI

by ajames

splattater commented 6 years ago

The French/Lithuanian issue is news to me; it seems to be a new bug, as those languages were added in the last release.

The filter works otherwise, but the language handling of slides is not correct in our database. I realised this while working on the translated decks feature. The problem is related to this old bug, which we found had been closed before it was fixed: https://slidewiki.atlassian.net/browse/GH-1137. All slides are created with the correct language (based on the deck), but each update sets the slide's language code to "EN" (for English).

Notice that the Greek results in the report are all slide results, not deck results; decks are properly assigned their language and filtered out.

by kprist

splattater commented 6 years ago

The issue is the same on testing, because the language data there has the same problem.

by kprist

splattater commented 6 years ago

Kostis Pristouris, what do we have to do to fix the issue with the standard 'EN' language value?
Should we reopen GH-1137 and do it anyway while also enhancing the bin/slidewiki-data truncatelang script, or should we also change code in deckservice?

by tboonx

splattater commented 6 years ago

GH-1137 is fine; we fixed that already. The truncatelang command does not "fix" these issues, as they are not strictly invalid according to the model we use, i.e. this is generally allowed. It is allowed mainly for legacy reasons; new decks should not have this issue.

We could:
1) script another command that checks this, i.e. compares the deck language to the slide language and updates all slides with the deck language. We should first review the status of the data to see if there is any situation we haven't considered yet.
2) alternatively, include tools so that users can update the language of their decks and slides. This is related in some sense to https://slidewiki.atlassian.net/browse/GH-2107, as we currently have no way for users to correct a mistake in the language specification without triggering a translation. This second point needs more discussion.
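Option 1 could be sketched roughly as follows. This is a minimal Python illustration under assumed, simplified data shapes (`decks` with `id`/`language`, `slides` with `id`/`deck_id`/`language`); the real deckservice collections store revisions and are structured differently, and `slides_to_update` is a hypothetical name. It only computes the changes rather than writing them, so its output could double as the data review mentioned above.

```python
from typing import Dict, List

# Hypothetical, simplified shapes (not the real deckservice schema):
#   deck  = {"id": int, "language": str}
#   slide = {"id": int, "deck_id": int, "language": str}


def slides_to_update(decks: List[dict], slides: List[dict]) -> Dict[int, str]:
    """Return {slide_id: deck_language} for every slide whose language
    differs from its parent deck's language. Nothing is written, so the
    result can be reviewed before any update is applied."""
    deck_lang = {d["id"]: d["language"] for d in decks}
    updates: Dict[int, str] = {}
    for slide in slides:
        target = deck_lang.get(slide["deck_id"])
        # Skip slides whose deck is unknown or has no language set.
        if target is not None and slide.get("language") != target:
            updates[slide["id"]] = target
    return updates


if __name__ == "__main__":
    decks = [{"id": 1, "language": "el"}]
    slides = [
        {"id": 10, "deck_id": 1, "language": "EN"},  # reset by an update
        {"id": 11, "deck_id": 1, "language": "el"},  # already correct
    ]
    print(slides_to_update(decks, slides))
```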

by kprist

splattater commented 6 years ago

So the POST /slide/new route in deckservice does not use the given language attribute and uses the deck's language value instead?

A new script could do that, but yes, we have to review the data first. I have already started reviewing the data on testing and will also do it on sw.org. In general, though, we will have to force something, as we cannot review each deck and its slides to detect the correct language for them.

by tboonx

splattater commented 6 years ago

I am moving this ticket into JIRA as it is now getting technical!

Was "SWAQ-938"

by ajames

splattater commented 6 years ago

The POST /slide/new route is currently not used. To add a new slide we use the /decktree/node/create route. That route currently does not take the language as a parameter, so, yes, the new slide picks up the language from the parent deck. However, I've added some new parameters there, including a language parameter, because I needed to support deck copying via the slidewiki-cli tool and pass the language as it is in the original deck. That option is also not currently used by the platform.
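The fallback behaviour described here can be summarised in a few lines. This is a hypothetical Python sketch, not the deckservice code: `new_slide_language` and its parameters are illustrative names only.

```python
from typing import Optional


def new_slide_language(parent_deck: dict, language: Optional[str] = None) -> str:
    """Hypothetical sketch of the /decktree/node/create language choice.

    If an explicit language is passed (e.g. when copying a deck with
    slidewiki-cli), use it; otherwise inherit the parent deck's language.
    """
    return language if language else parent_deck["language"]
```

With no explicit language, a slide added to a Greek deck would get `"el"`; passing `language="de"` would override that.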

by kprist

splattater commented 6 years ago

OK, I had a look at the data from sw.org and my observations are the following:
1) There are decks and slides with language=_
2) There are only two decks in which the language attribute was changed between revisions
3) There are 3878 decks in which the slide languages differ (for all or some of the slides) (I haven't checked the deck/subdeck relation)
4) Slides can have language=null or language=undefined
5) A deck's slides have up to three distinct language values; most of the time they are "en_GB" and "EN"

Kostis Pristouris, it seems we need a script that also handles _, null and undefined as language values. For slides we could infer the correct values from the deck, but for decks we have to set a standard value like en.
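A normalisation step along these lines could handle the placeholder values listed above. This is a hypothetical Python sketch: it assumes the data represents them as `None`, `"_"` and the string `"undefined"` (how MongoDB's `undefined` actually surfaces depends on the driver), and it uses `"en"` as the standard deck value proposed here. The function names are illustrative.

```python
from typing import Optional

# Placeholder values observed on sw.org; representing MongoDB's
# `undefined` as the string "undefined" is an assumption.
PLACEHOLDERS = {None, "_", "undefined"}
DEFAULT_DECK_LANGUAGE = "en"  # the standard deck value proposed above


def normalize(value: Optional[str]) -> Optional[str]:
    """Map placeholder values to None; leave real language codes unchanged."""
    return None if value in PLACEHOLDERS else value


def fixed_deck_language(deck_lang: Optional[str]) -> str:
    """Decks with no usable language fall back to the standard value."""
    lang = normalize(deck_lang)
    return lang if lang is not None else DEFAULT_DECK_LANGUAGE


def fixed_slide_language(slide_lang: Optional[str], deck_lang: Optional[str]) -> str:
    """Slides with no usable language inherit their (fixed) deck language."""
    lang = normalize(slide_lang)
    return lang if lang is not None else fixed_deck_language(deck_lang)
```

For example, a slide with `language="_"` in a German deck would become `"de"`, while a slide and deck that both lack a language would both end up as `"en"`.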

by tboonx

splattater commented 6 years ago

These are all already handled by the `truncatelang` command in https://github.com/slidewiki/slidewiki-data-utils. If you really want to see what's going on, please take a dump from sw.org, restore it locally, apply the command, and then check the language differences between slides and their parent decks, which is all this issue is about. Also check the code of the truncatelang command for some MongoDB queries that might help.
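For a quick check on a restored dump, per-deck records in the `{"_id", "deckLanguages", "slideLanguages"}` shape used later in this thread could be produced in memory as below. This is a hypothetical Python sketch over flattened rows (`deck_id`, `language`) rather than the actual MongoDB queries in truncatelang; it flags a deck when its language is missing or when any slide uses a language that none of the deck's revisions use.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def language_summary(deck_revisions: List[dict], slides: List[dict]) -> List[dict]:
    """Group distinct languages per deck (first-seen order) and return only
    the decks whose language data looks inconsistent.

    Hypothetical row shapes: {"deck_id": int, "language": str | None}.
    """
    deck_langs: Dict[int, List[Optional[str]]] = defaultdict(list)
    for rev in deck_revisions:
        langs = deck_langs[rev["deck_id"]]
        if rev.get("language") not in langs:
            langs.append(rev.get("language"))

    slide_langs: Dict[int, List[Optional[str]]] = defaultdict(list)
    for slide in slides:
        langs = slide_langs[slide["deck_id"]]
        if slide.get("language") not in langs:
            langs.append(slide.get("language"))

    report = []
    for deck_id, langs in deck_langs.items():
        slide_list = slide_langs.get(deck_id, [])
        # Flag missing deck language, or any slide language the deck never uses.
        if None in langs or any(lang not in langs for lang in slide_list):
            report.append(
                {"_id": deck_id, "deckLanguages": langs, "slideLanguages": slide_list}
            )
    return report
```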

by kprist

splattater commented 6 years ago

Right, truncatelang fixes many issues. I ran it against the latest dump from stable and played around with the data a bit. Here are examples of language inconsistencies in the data, together with what I would do in the new script (listed here so there is a record when this ticket is picked up):
1) Slides with different value than the deck but at least one slide has the same as the deck

{"_id":50,"deckLanguages":["de"],"slideLanguages":["pt","de"]}

Here "de" should be everywhere.
2) No value in deck

{"_id":2590,"deckLanguages":[null],"slideLanguages":["en"]}

Here the slide language should be used in the deck.
3) No value in decks and slides

{"_id":2448,"deckLanguages":[null],"slideLanguages":[]}

Set "en" everywhere.
4) Deck value and slide values don't match

{"_id":3405,"deckLanguages":["de"],"slideLanguages":["cs"]}

This is tricky. Because it's an old deck, I would use the slide value in the deck, whereas for newer decks such as

{"_id":115639,"deckLanguages":["it"],"slideLanguages":["en"]}

I would use the deck value in the slides.
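The four rules above could be combined into a single decision function. This is a minimal Python sketch, assuming placeholder values have already been normalised to `None` and that the caller decides what counts as an old deck (`is_legacy`); the thread does not define that cutoff, and `resolve_language` is a hypothetical name. It returns one language to write to the deck and all its slides, or `None` where the rules leave the choice open and a human should look.

```python
from typing import List, Optional

DEFAULT_LANGUAGE = "en"


def resolve_language(
    deck_lang: Optional[str], slide_langs: List[Optional[str]], is_legacy: bool
) -> Optional[str]:
    """Pick one language for a deck and all of its slides, or None if undecided."""
    # Distinct, non-missing slide languages in first-seen order.
    valid = [lang for lang in dict.fromkeys(slide_langs) if lang is not None]

    if deck_lang is None:
        if not valid:
            return DEFAULT_LANGUAGE  # case 3: no value in deck or slides
        # case 2: take the slide language, but only if it is unambiguous
        return valid[0] if len(valid) == 1 else None

    if deck_lang in valid:
        return deck_lang  # case 1: at least one slide already matches the deck
    if not valid:
        return deck_lang  # slides have no value: inherit from the deck

    # case 4: deck and slide values do not overlap
    if not is_legacy:
        return deck_lang  # newer deck: deck value wins
    return valid[0] if len(valid) == 1 else None  # old deck: slide value wins
```

Applied to the examples above (with decks 3405 and 115639 treated as old and new respectively), this yields "de", "en", "en", "cs" and "it".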

by tboonx