help hints right on the search page

Several people with whom I discussed KonText said the search is not intuitive. Bushra told me she was excited that an Urdu corpus exists and tried to search it, but she did not know how. So when people get to the search page, they do not know what to do next and get frustrated. I suggest putting some information near the search box: [image: kontext_ui] We can put a 'try' button near the example.

Also, a user cannot get to the help page with one click. You have to go to Help -> User manual, and then you see the ÚČNK manual in Czech. [image: manual]

I propose adding a Help button to the query page as well and linking it to my manual in English (http://ufal.mff.cuni.cz/lindat-kontext) or to some other page that we come up with.

We decided not to put help right on the first search page as proposed above, but to do something else, e.g.:

- put a link to a help page (separate for each corpus, maybe implemented in templates?)
- make a popup window
- add more question marks explaining the functions of individual buttons

Make a tutorial on the KonText UI with illustrations (screenshots, or maybe a video like the one Vincent made for the annotation guidelines: https://ufal.mff.cuni.cz/czech-legal-text-treebank).

@vidiecan @kira @shark Currently, we have an explanation for the basic query type (one just has to click the little blue question mark). Can you point me to where its text is defined? I could then formulate similar short explanations for the other query types. Also, let me know if a clickable link can be included in the explanation (that would be useful for the CQL query type which cannot be reasonably explained in a small box).

@Ansa211 I'll insert the explanations if you give me the text (both English and Czech). The task here is not only to put some text somewhere but also to modify some TypeScript files.

@tomachalek I think this feature should also be ported to ÚČNK's instance.

BTW, wouldn't it make more sense to list the "Word Form" and "Character (Word part)" query types before the "Phrase" query type?

Basic: search for the input expression as a word form case-insensitively; if it is also a canonical dictionary form (lemma), all its word forms are searched for as well. The description of the "basic" search type should actually reflect the possibilities provided by the given corpus (i.e., the actual CQL query that the search is translated to): if the corpus does not contain lemmas, that part of the description should be left out.
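To illustrate the point about the description following the corpus, here is a minimal sketch of how a "basic" query and its hint could both depend on whether the corpus has lemmas. The function names, the attribute names `word` and `lemma`, and the use of an inline `(?i)` flag for case-insensitive matching are assumptions for illustration; the actual translation used by KonText is not shown in this thread.

```python
import re


def escape_cql_literal(text: str) -> str:
    """Escape regex metacharacters and double quotes so the text matches literally."""
    return re.sub(r'([\\.^$*+?()\[\]{}|"])', r'\\\1', text)


def basic_query_to_cql(expr: str, has_lemma: bool) -> str:
    """Hypothetical translation of a 'basic' query into CQL, one position per token."""
    positions = []
    for token in expr.split():
        literal = escape_cql_literal(token)
        alternatives = [f'word="(?i){literal}"']
        if has_lemma:
            # Only mention the lemma attribute if the corpus actually has one.
            alternatives.append(f'lemma="(?i){literal}"')
        positions.append("[" + " | ".join(alternatives) + "]")
    return " ".join(positions)


def basic_hint(has_lemma: bool) -> str:
    """Build the hint text so that it matches what the query really does."""
    hint = "Searches for the input expression as a word form, case-insensitively"
    if has_lemma:
        hint += "; if it is also a lemma, all its word forms are searched for as well"
    return hint + "."


print(basic_query_to_cql("Praha", has_lemma=True))
print(basic_hint(has_lemma=False))
```

Deriving the query and the hint from the same `has_lemma` flag keeps the two from drifting apart when a corpus without lemmas is added.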

Lemma: interprets the input expression as a regular expression matching a canonical dictionary form (lemma) from beginning to end; all its forms are searched for. Optionally, the part-of-speech (PoS) of the given lemma may also be specified.

Phrase: interprets the query as a sequence of regular expressions matching word forms; the default case-insensitive search can be overridden by the "Match case" option.

Word form: interprets the query as a regular expression matching an individual word form; the default case-insensitive search can be overridden by the "Match case" option, and the search may be limited to a selected part-of-speech (PoS).

Character: interprets the query as a regular expression matching a part of a word form. This query type should be renamed to Word part (as has already been done on the ÚČNK instance). It is not clear to me why we do not provide the same additional options as with the "Word form" query type.
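The descriptions of the remaining query types can be read as rough CQL templates. The sketch below is only my reading of those descriptions, not the code KonText runs: `to_cql`, the attribute names `word`, `lemma` and `tag`, the query type keys, and the inline `(?i)` flag are all assumptions.

```python
from typing import Optional


def to_cql(qtype: str, query: str, match_case: bool = False,
           pos_regex: Optional[str] = None) -> str:
    """Hypothetical mapping from a query type to a CQL expression."""
    flag = "" if match_case else "(?i)"
    # Optional part-of-speech restriction, expressed as a regex over a tag attribute.
    pos = f' & tag="{pos_regex}"' if pos_regex else ""
    if qtype == "lemma":
        # Regex must match the whole lemma; case-sensitive as described.
        return f'[lemma="{query}"{pos}]'
    if qtype == "phrase":
        # One position per whitespace-separated regex.
        return " ".join(f'[word="{flag}{tok}"]' for tok in query.split())
    if qtype == "word":
        return f'[word="{flag}{query}"{pos}]'
    if qtype == "char":
        # Regex may match anywhere inside the word form.
        return f'[word=".*{query}.*"]'
    raise ValueError(f"unknown query type: {qtype}")


print(to_cql("lemma", "jít", pos_regex="V.*"))
print(to_cql("phrase", "new york"))
```

Written this way, giving "Word part" the same "Match case" and PoS options as "Word form" would only mean reusing `flag` and `pos` in the `char` branch.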

CQL:

this query type allows the user to access all annotation present in the corpus;
the requirements on each position (word) in the query are listed in a pair of square brackets, and
structural attributes (sentence, document, etc.) are marked by a pair of angle brackets, e.g.
[attribute1_of_word1="regex1" & attribute2_of_word1="regex2"] [attribute_of_word2="regex"] within <s/>
will search for a sequence of two positions with the specified attribute values that appear in a single sentence
for more help, see https://wiki.korpus.cz/doku.php/en:pojmy:dotazovaci_jazyk?s[]=cql and https://ufal.mff.cuni.cz/lindat-kontext
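For anyone who wants to try the pattern with real values, here is a small sketch that assembles a query of exactly this shape. The helper names and the attribute names `lemma` and `tag` are hypothetical; which attributes and structures exist depends on the corpus.

```python
from typing import Dict, List, Optional


def position(constraints: Dict[str, str]) -> str:
    """One corpus position: attribute="regex" pairs joined by & inside square brackets."""
    inner = " & ".join(f'{attr}="{regex}"' for attr, regex in constraints.items())
    return f"[{inner}]"


def build_cql(positions: List[Dict[str, str]], within: Optional[str] = None) -> str:
    """Join positions into a sequence and optionally restrict it to one structure."""
    query = " ".join(position(p) for p in positions)
    if within:
        # Structures such as sentences are written in angle brackets.
        query += f" within <{within}/>"
    return query


# An adjective with a given lemma followed by any noun, inside one sentence
# (the tag values are made up for this example).
print(build_cql([{"lemma": "dobrý", "tag": "A.*"}, {"tag": "N.*"}], within="s"))
```

An empty dictionary yields `[]`, which in CQL stands for any position.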

or maybe shorter:

this query type allows the user to access all annotation present in the corpus, e.g. 
[attribute1_of_word1="regex1" & attribute2_of_word1="regex2"] [attribute_of_word2="regex"] within <s/>
will search for a sequence of two positions with the specified attribute values and appearing in a single sentence
for more help, see https://wiki.korpus.cz/doku.php/en:pojmy:dotazovaci_jazyk?s[]=cql and https://ufal.mff.cuni.cz/lindat-kontext

Note that the two links at the end of the description should be clickable.
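One possible way to get clickable links without storing HTML in the hint text is to convert bare URLs when the hint is rendered. This is a sketch only; `linkify` is a hypothetical helper, and I am assuming the hint box can display HTML. Everything that is not a URL is escaped, so the `<s/>` in the example query is displayed literally instead of being interpreted as a tag.

```python
import html
import re

# Bare http(s) URLs; square brackets are allowed so that "?s[]=cql" survives.
URL_RE = re.compile(r'https?://[^\s<>"]+')


def linkify(hint: str) -> str:
    """Escape the hint text and wrap every bare URL in an <a> element."""
    out = []
    last = 0
    for match in URL_RE.finditer(hint):
        # Sentence punctuation right after a URL is not part of the link.
        url = match.group(0).rstrip(".,;")
        out.append(html.escape(hint[last:match.start()]))
        safe = html.escape(url, quote=True)
        out.append(f'<a href="{safe}">{safe}</a>')
        last = match.start() + len(url)
    out.append(html.escape(hint[last:]))
    return "".join(out)


print(linkify("for more help, see https://ufal.mff.cuni.cz/lindat-kontext."))
```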

For Czech, we could use the descriptions from the manual; here, I suggest slightly more elaborate explanations, but without examples (it would be necessary to create appropriate examples for each corpus, and we do not want to do that, do we?):

Základní: Vyhledá vložený výraz jako slovní tvar bez ohledu na velikost písmen; jde-li zároveň o základní slovníkový tvar (lemma), vyhledají se také všechny jeho tvary. Bez regulárních výrazů.

Lemma: Vyhledá všechny pozice, jejichž lemma (slovníkový tvar) odpovídá vloženému řetězci (regulárnímu výrazu; rozlišuje malá a velká písmena); možnost upřesnit slovní druh.

Fráze: Vyhledá přesně zadanou frázi; zadaný řetězec interpretuje jako mezerami oddělenou posloupnost regulárních výrazů, při výchozím nastavení vyhledává bez ohledu na velká a malá písmena.

Slovní tvar: Vyhledá pozice, které svým tvarem odpovídají zadanému řetězci (regulárnímu výrazu); dodatečnými volbami je možné zapnout rozlišování velkých a malých písmen a omezit vyhledávku na jeden slovní druh.

Část slova: Vyhledá po sobě následující znaky v rámci jednoho slova; umožňuje použití regulárních výrazů.

CQL:

Komplexní dotazovací jazyk umožňující práci s veškerou anotací, která je pro daný korpus dostupná.
Např. následující dotaz vyhledá dvě po sobě jdoucí pozice (slova) se zadanými hodnotami atributů a vyskytující se v jediné větě:
[atribut1="regulární_výraz1" & atribut2="regulární_výraz2"] [atribut="regulární_výraz"] within <s/>
Detailnější úvod do jazyka CQL naleznete zde https://wiki.korpus.cz/doku.php/pojmy:dotazovaci_jazyk .

where the second-to-last word, "zde" ("here"), should be a link to the given URL (which does not have to be shown).
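If the hint texts can carry a little markup, a link with a label and a hidden URL could be written in the Markdown style `[zde](URL)` and rendered when the hint is displayed. This is a minimal sketch under that assumption; `render_named_links` is a hypothetical name, and nothing here reflects how KonText actually stores its messages.

```python
import html
import re

# Markdown-style links: [label](http://... or https://...)
LINK_RE = re.compile(r'\[([^\]]+)\]\((https?://[^\s)]+)\)')


def render_named_links(hint: str) -> str:
    """Escape the hint and turn [label](url) into <a href="url">label</a>."""
    out = []
    last = 0
    for match in LINK_RE.finditer(hint):
        out.append(html.escape(hint[last:match.start()]))
        label = html.escape(match.group(1))
        href = html.escape(match.group(2), quote=True)
        out.append(f'<a href="{href}">{label}</a>')
        last = match.end()
    out.append(html.escape(hint[last:]))
    return "".join(out)


print(render_named_links(
    "Detailnější úvod do jazyka CQL naleznete "
    "[zde](https://wiki.korpus.cz/doku.php/pojmy:dotazovaci_jazyk)."
))
```

The square brackets of a CQL position such as `[atribut="hodnota"]` are not followed by a parenthesised URL, so they are left alone and only escaped.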

@Ansa211 OK, I'll discuss this with @michkren and post a response here.

Základní: Typ dotazu základní vyhledá vložený výraz jako slovní tvar bez ohledu na velikost písmen; jde-li zároveň o základní slovníkový tvar (lemma), vyhledají se také všechny jeho tvary. Nepodporuje použití regulárních výrazů.

Lemma: Typ dotazu lemma vyhledá všechny tvary přiřazené k danému lemmatu (slovníkovému tvaru). Rozlišují se malá a velká písmena a je možné použít regulární výrazy.

Fráze: Typ dotazu fráze vyhledá zadanou posloupnost slovních tvarů; je možné použít regulární výrazy.

Slovní tvar: Typ dotazu slovní tvar vyhledá přesně zadaný slovní tvar; je možné použít regulární výrazy.

Část slova: Typ dotazu část slova vyhledá po sobě následující znaky v rámci jednoho slova. Rozlišují se malá a velká písmena a je možné použít regulární výrazy.

CQL: CQL je komplexní dotazovací jazyk umožňující práci s veškerou anotací, která je pro daný korpus dostupná. Např. následující dotaz vyhledá dvě po sobě jdoucí pozice (slova) se zadanými hodnotami atributů a vyskytující se v jediné větě: [atribut1="hodnota1" & atribut2="hodnota2"] [atribut3="hodnota3"] within <s/> ~~Pro hodnoty atributů lze použít regulární výrazy. Detailnější výklad naleznete v dokumentaci a v příkladech.~~

NOTE: I still do not see why the type Část slova/Word part does not have the same options as Slovní tvar/Word form. Also, in some corpora, lemmas are not necessarily lowercase, so being able to search them case-insensitively might be helpful...

We've agreed on adding all the query type hints posted here (we will use a small :question: icon near the query type selector). As regards the query type order, we prefer keeping the current one. Actually, we would like to reduce the number of choices there sometime in the future.

KonText v 0.11.x now contains the hints. You can see a live preview on our testing site: https://kontext-test.korpus.cz.

@Ansa211 can we close this one?

@vidiecan There is a PR (https://github.com/ufal/lindat-kontext/pull/153) that contains the final version. We made some clarifications to the wording, and I think they are not in the current version.

Now that I see the version we agreed upon, I will have a few more comments on the PR, so please do not merge immediately.

ufal / lindat-kontext

help hints right on the search page #5