spraakbanken / korp-frontend

Frontend for Korp, a tool using the IMS Open Corpus Workbench (CWB).
https://spraakbanken.gu.se/en/tools/korp
MIT License

Simplify time interval CQP covering whole days #379

Closed (janiemi closed this issue 2 months ago)

janiemi commented 4 months ago

The CQP expression generated for a time interval always contains tests for text_timefrom and text_timeto, even when they are 000000 and 235959, that is, when the user specified only dates. For example, for the year interval 2002…2003 (internal representation [$date_interval = '20020101,20031231,000000,235959']), Korp produces the following CQP expression:

[((int(_.text_datefrom) = 20020101 & int(_.text_timefrom) >= 000000)
  | (int(_.text_datefrom) > 20020101 & int(_.text_datefrom) <= 20031231))
 & (int(_.text_dateto) < 20031231
    | (int(_.text_dateto) = 20031231 & int(_.text_timeto) <= 235959))]

I think this could be simplified to:

[int(_.text_datefrom) >= 20020101 & int(_.text_datefrom) <= 20031231 & int(_.text_dateto) <= 20031231]
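The equivalence of the two expressions for whole-day intervals can be checked by brute force. The following is a minimal sketch in Python, not Korp code: `full_condition` and `simplified_condition` are hypothetical transcriptions of the two CQP tests above, and the sample dates and times are arbitrary values around the interval boundaries. It assumes that the start date does not exceed the end date and that time values always lie within 000000…235959.

```python
from itertools import product

def full_condition(datefrom, timefrom, dateto, timeto, d1, d2, t1=0, t2=235959):
    """Transcription of the currently generated CQP test (hypothetical helper)."""
    starts_in_range = (datefrom == d1 and timefrom >= t1) or (d1 < datefrom <= d2)
    ends_in_range = dateto < d2 or (dateto == d2 and timeto <= t2)
    return starts_in_range and ends_in_range

def simplified_condition(datefrom, dateto, d1, d2):
    """Transcription of the proposed simplified CQP test (hypothetical helper)."""
    return d1 <= datefrom <= d2 and dateto <= d2

D1, D2 = 20020101, 20031231
# Arbitrary sample values just outside, on, and inside the interval boundaries.
dates = [20011231, 20020101, 20020102, 20030615, 20031230, 20031231, 20040101]
times = [0, 1, 120000, 235958, 235959]

# Collect every combination on which the two conditions disagree.
mismatches = [
    (df, tf, dt, tt)
    for df, tf, dt, tt in product(dates, times, dates, times)
    if full_condition(df, tf, dt, tt, D1, D2) != simplified_condition(df, dt, D1, D2)
]
print(len(mismatches))
```

With the time bounds at 000000 and 235959, the time tests are always true for valid time values, so the two conditions should agree on every sampled combination.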

For example, this Korp frontend search is now translated to this backend query, but it could be simplified to this one (cache=false added to both to allow speed comparison).

This would be slightly faster, although the speed-up was smaller than I had expected: 5–30% in the cases I tried.
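One way the generator could special-case whole-day intervals is sketched below in Python. This is an illustration only, not the actual korp-frontend implementation: the function names `date_interval_cqp` and `cqp_from_interval` are hypothetical, and the input is assumed to be the comma-separated internal representation shown above (start date, end date, start time, end time).

```python
def date_interval_cqp(date_from, date_to, time_from="000000", time_to="235959"):
    """Build a CQP token test for a date interval (hypothetical helper).

    The time tests are dropped when the interval covers whole days.
    """
    whole_days = time_from == "000000" and time_to == "235959"
    if whole_days:
        # Simplified form: dates only.
        return (
            f"[int(_.text_datefrom) >= {date_from}"
            f" & int(_.text_datefrom) <= {date_to}"
            f" & int(_.text_dateto) <= {date_to}]"
        )
    # Full form: the time tests matter on the boundary days.
    return (
        f"[((int(_.text_datefrom) = {date_from} & int(_.text_timefrom) >= {time_from})"
        f" | (int(_.text_datefrom) > {date_from} & int(_.text_datefrom) <= {date_to}))"
        f" & (int(_.text_dateto) < {date_to}"
        f" | (int(_.text_dateto) = {date_to} & int(_.text_timeto) <= {time_to}))]"
    )

def cqp_from_interval(value):
    """Parse 'datefrom,dateto,timefrom,timeto' and build the CQP test."""
    date_from, date_to, time_from, time_to = value.split(",")
    return date_interval_cqp(date_from, date_to, time_from, time_to)

print(cqp_from_interval("20020101,20031231,000000,235959"))
print(cqp_from_interval("20020101,20031231,080000,170000"))
```

The first call yields the simplified date-only expression; the second keeps the time tests because the interval does not cover whole days.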

arildm commented 2 months ago

(comment moved to #393)