snowballstem / snowball

Snowball compiler and stemming algorithms
https://snowballstem.org/
BSD 3-Clause "New" or "Revised" License

Test for language features #157

Open ojwb opened 3 years ago

ojwb commented 3 years ago

It would be good to have tests of Snowball language features (especially those not used by any current algorithm implementation) that run for each target language.

As David Corbett noted in #156, several backends weren't implementing integer division. I've fixed them, but we lack a regression test, and lack automated testing that new backends get this right.

This is the test code I added at the start of stem in english.sbl locally to check these fixes worked and that other backends weren't affected:

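    // positive / positive
    $p1 = 7
    $p1 /= 4
    $p1 == 1

    $(7 / 4 * 4 == 4)

    // negative / negative
    $p1 = -7
    $p1 /= -4
    $p1 == 1

    $(-7 / -4 * 4 == 4)

    // negative / positive: result truncates towards zero
    $p1 = -7
    $p1 /= 4
    $p1 == -1

    $((-7) / 4 * 4 == -4)

    // positive / negative: result truncates towards zero
    $p1 = 7
    $p1 /= -4
    $p1 == -1

    $(7 / -4 * 4 == -4)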
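
For comparison, here is a minimal C sketch of the semantics these tests expect. C99 and later define integer division as truncating towards zero, so all four sign combinations can be checked directly. The `floor_div` helper is hypothetical and only there for contrast: it shows the result a target language with a flooring division operator would give if a backend used that operator unchanged. Neither function comes from the Snowball sources.

```c
#include <assert.h>

/* Hypothetical helper, for contrast only: division that rounds towards
   negative infinity, as some languages' integer division operator does. */
static int floor_div(int a, int b) {
    int q = a / b;                      /* C99: truncates towards zero */
    if (a % b != 0 && (a < 0) != (b < 0))
        q--;                            /* inexact and signs differ: round down */
    return q;
}

/* Returns 1 if C's `/` gives the values the Snowball tests above expect.
   The operands are volatile so the check is made at run time rather than
   being folded away by the C compiler. */
static int truncating_division_ok(void) {
    volatile int seven = 7, four = 4;
    return seven / four == 1
        && -seven / -four == 1
        && -seven / four == -1
        && seven / -four == -1;
}
```

With these definitions, `assert(truncating_division_ok())` should pass on any conforming C99 compiler, while `floor_div(-7, 4)` gives -2 rather than -1, which is the kind of mismatch a per-backend test would catch.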

dscorbett commented 3 years ago

The manual says the pieces of an arithmetic expression have the same semantics as in C, so here are some tests for minint and maxint based on the C standard.

    $(minint <= -32767)
    $(maxint >= 32767)
    $(minint + maxint == 0) or $(minint + maxint == -1)
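
The same three checks can be written against `INT_MIN` and `INT_MAX` from `<limits.h>`, which is where the bounds come from. This is just a sketch of the C side of the analogy; the function name `int_range_ok` is made up for illustration.

```c
#include <assert.h>   /* for the usage shown below */
#include <limits.h>

/* Returns 1 if the platform's int satisfies the minimum range the C
   standard guarantees, mirroring the minint/maxint tests above. */
static int int_range_ok(void) {
    return INT_MIN <= -32767
        && INT_MAX >= 32767
        /* 0 for a symmetric range, -1 for two's complement */
        && (INT_MIN + INT_MAX == 0 || INT_MIN + INT_MAX == -1);
}
```

A caller would simply do `assert(int_range_ok());`. The sum `INT_MIN + INT_MAX` cannot overflow, since the operands have opposite signs.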

ojwb commented 3 years ago

Thanks.

I hadn't considered that "C semantics" imposes these requirements on minint and maxint, but I think it's helpful to have a defined minimum integer range. In practical terms, stemming algorithms would probably be fine even with a signed 8-bit integer, but sticking with the "C semantics" rule seems good, and supporting a 16-bit integer is unlikely to be a problem for any language we're likely to target.

ojwb commented 3 years ago

I've just pushed a change that implements compile-time evaluation of numeric subexpressions and tests. This is mostly a step towards a longer-term plan to track possible ranges for the values of integer and boolean variables, cursor position, length of the current string, slice positions, etc., through the program, as there are optimisations we can do based on these. It's relevant here because the division test code above will need revising so that we test the division semantics of the generated code in the target language rather than those of the compiler.
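
To illustrate why an all-literal test such as `$(7 / 4 * 4 == 4)` stops exercising the backend once constant folding is in place, here is a minimal C sketch of folding over an expression tree. The `struct expr` type, the `K_*` kinds and `fold` are hypothetical names for illustration and are not the Snowball compiler's actual data structures or code; overflow handling is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical expression node (illustration only). */
enum kind { K_NUM, K_VAR, K_ADD, K_SUB, K_MUL, K_DIV, K_EQ };

struct expr {
    enum kind kind;
    int value;                      /* used when kind == K_NUM */
    const struct expr *lhs, *rhs;   /* used by binary operators */
};

/* Try to evaluate e at compile time. Returns 1 and stores the result in
   *out if every leaf is a literal. Returns 0 if e depends on a variable
   or would divide by zero, in which case the backend has to emit code. */
static int fold(const struct expr *e, int *out) {
    int l, r;
    if (e->kind == K_NUM) { *out = e->value; return 1; }
    if (e->kind == K_VAR) return 0;
    if (!fold(e->lhs, &l) || !fold(e->rhs, &r)) return 0;
    switch (e->kind) {
    case K_ADD: *out = l + r; return 1;
    case K_SUB: *out = l - r; return 1;
    case K_MUL: *out = l * r; return 1;
    case K_DIV: if (r == 0) return 0;
                *out = l / r; return 1;   /* the compiler's own C division */
    case K_EQ:  *out = (l == r); return 1;
    default:    return 0;
    }
}
```

In this sketch a tree built only from literals folds to a constant using the compiler's own C division, so no division reaches the generated code. A tree containing a `K_VAR` leaf is left for the backend, which suggests that a revised test needs operands the compiler cannot resolve at compile time.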