spraakbanken / korp-frontend

Frontend for Korp, a tool using the IMS Open Corpus Workbench (CWB).
https://spraakbanken.gu.se/en/tools/korp
MIT License

Statistics subquery incorrect when using repetition and boundaries #354

Closed (arildm closed 5 months ago)

arildm commented 7 months ago

I previously "fixed" #289 by adding a []* to the subquery, but that seems to have introduced new problems.

This query for the NPEGL mode https://spraakbanken.gu.se/korplabb/?mode=npegl#?cqp=%3Cnp%3E%20%5B%5D%7B0,10%7D%20%5Bword%20%3D%20%22b%C3%A6%C3%B0i%22%5D%20%5B%5D%7B0,10%7D%20%3C%2Fnp%3E&corpus=npegl-ice&search_tab=1&within=text&show_stats&result_tab=2&search=cqp shows a statistics row with 6 hits for "bæði", but clicking it yields 14 hits, namely all that begin with "bæði".

~An example from an "ordinary" corpus is here https://spraakbanken.gu.se/korplabb/#?cqp=%5Bpos%20%3D%20%22MID%7CMAD%7CPAD%22%5D%20%5B%5D%7B0,1%7D%20%3C%2Fsentence%3E&corpus=attasidor&search_tab=1&show_stats&result_tab=2&search=cqp where the value ")" is reported 8 times, but the link shows 17 hits, including some ") ."~ (This example is not needed now that NPEGL is public.)

arildm commented 5 months ago

Discussed with Martin. It looks like the []{0,} should only be added if none of the reduced attributes is positional. If any of them is positional, the subquery will already contain the right number of tokens, and wrapping it in <match> ... </match> is sufficient.
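A rough sketch of that rule, for illustration only. The names `ReducedAttribute` and `buildStatsSubquery` are hypothetical and not taken from the korp-frontend code. It also assumes that the wildcard goes after the token constraints and that the `<match>` anchors are added in both cases.

```typescript
// Hypothetical description of an attribute the statistics are reduced by.
interface ReducedAttribute {
  name: string;
  // true for token-level attributes (e.g. word, pos), false for structural ones
  positional: boolean;
}

// tokenConstraints: one CQP token expression per token of the clicked
// statistics row, e.g. ['[word = "bæði"]'].
function buildStatsSubquery(
  tokenConstraints: string[],
  reduced: ReducedAttribute[]
): string {
  const anyPositional = reduced.some((attr) => attr.positional);
  const tokens = [...tokenConstraints];

  // Only pad with a wildcard when no positional attribute is involved;
  // otherwise the constraints already cover every token of the match.
  // Appending it at the end is an assumption of this sketch.
  if (!anyPositional) {
    tokens.push("[]{0,}");
  }

  // Anchor the subquery to the original match boundaries.
  return `<match> ${tokens.join(" ")} </match>`;
}

// Reduced by word (positional): no wildcard is added.
console.log(
  buildStatsSubquery(['[word = "bæði"]'], [{ name: "word", positional: true }])
);
```

With a positional attribute like `word`, the subquery can then only match hits that consist of exactly the listed tokens, rather than every hit that merely begins with them.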