Subscript and superscript Unicode

trealla-prolog / trealla

A compact, efficient Prolog interpreter written in plain-old C.

MIT License

Subscript and superscript Unicode #547

Closed Jean-Luc-Picard-2021 closed 2 weeks ago

Jean-Luc-Picard-2021 commented 3 weeks ago

Just toying around:

$ ./tpl -v
Trealla Prolog (c) Infradig 2020-2023, v2.52.25

$ ./tpl
?- X = π(x).
   X = π(x).

?- X = π₂(x).
Error: syntax error, near '₂', operator expected, user:1

What would be an argument against including OTHER_NUMBER in Prolog identifiers?

Jean-Luc-Picard-2021 commented 3 weeks ago

Ok, interesting!

Scryer Prolog allows OTHER_NUMBER in Prolog identifiers:

$ target/release/scryer-prolog -v
v0.9.4-55-gd6ac0355

$ target/release/scryer-prolog
?- X = π(x).
   X = π(x).

?- X = π₂(x).
   X = π₂(x).

infradig commented 2 weeks ago

SWI-Prolog doesn't allow it either, though that is not really a reason.

The actual reason is that Trealla uses the C function iswalnum (mainly) as in:

        while (iswalnum(ch)
#ifdef __APPLE__
            || iswideogram(ch)
#endif
            || (ch == '_')) {

and apparently the C library's iswalnum doesn't return true for OTHER_NUMBER characters.
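To make the effect of that loop concrete, here is a minimal, self-contained sketch of an identifier scanner built around the same condition. The names `ident_len` and `is_script_digit` are hypothetical and not taken from the Trealla sources. `is_script_digit` hard-codes only the superscript and subscript digits (U+00B2, U+00B3, U+00B9, U+2070, U+2074 to U+2079, U+2080 to U+2089), which would be one possible way to accept π₂ without accepting all of OTHER_NUMBER. What `iswalnum` returns for non-ASCII input depends on the C library and the active locale, so the sketch makes no claim about that.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: true only for superscript/subscript digits,
   a small subset of the Unicode category OTHER_NUMBER (No). */
int is_script_digit(wint_t ch)
{
    return ch == 0x00B2 || ch == 0x00B3 || ch == 0x00B9  /* superscript 2, 3, 1 */
        || ch == 0x2070                                   /* superscript 0 */
        || (ch >= 0x2074 && ch <= 0x2079)                 /* superscript 4..9 */
        || (ch >= 0x2080 && ch <= 0x2089);                /* subscript 0..9 */
}

/* Hypothetical scanner: counts the leading code points of s that
   would be consumed as identifier characters. With
   allow_script_digits == 0 the condition is the iswalnum/underscore
   test quoted above (minus the Apple-specific branch). */
size_t ident_len(const wchar_t *s, int allow_script_digits)
{
    size_t n = 0;
    while (s[n] != L'\0'
           && (iswalnum((wint_t)s[n])
               || s[n] == L'_'
               || (allow_script_digits && is_script_digit((wint_t)s[n]))))
        n++;
    return n;
}
```

With `allow_script_digits` set, a subscript digit extends the identifier instead of ending it, which is the behaviour Scryer Prolog shows for `π₂(x)`. U+2499 stays excluded because it is not in the hard-coded ranges.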

infradig commented 2 weeks ago

The more I look into it, the less reason I see to include it. In maths and CS it is common to give identifiers designations like a' (e.g. a-prime), which you can't do in Prolog either. I think it is a mistake for Scryer to allow it.

Jean-Luc-Picard-2021 commented 2 weeks ago

Interestingly, SWI-Prolog has a code_type/2 predicate that also works in mode (+, -):

?- char_code('𝜆', X), code_type(X, Y).
X = 120582,
Y = csym ;
Etc..

How would I do that in Trealla Prolog? As far as I could figure out, there is only a predicate for mode (+, +):

?- char_code('𝜆', X), '$code_type'(X, lower).
   X = 120582.

But there is no agreement on how things are classified: SWI-Prolog thinks it's "csym", whatever that means, and Trealla Prolog classifies it as "lower", which is closer to the Unicode categories. I think relying completely on Unicode categories would have the advantage of offering the prospect of consistency among Prolog systems. But it could be that there is no easy mapping to some common C libraries.
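For illustration, a mode (+, -) style lookup can be sketched in C by trying every standard class name in turn. This is not how SWI-Prolog or Trealla implement it. The helper name `code_classes` is hypothetical, and only the ISO C functions `wctype` and `iswctype` are used. The twelve ISO C classes are much coarser than Unicode general categories, and their results for non-ASCII code points depend on the C library and locale, which is the mapping problem mentioned above.

```c
#include <stddef.h>
#include <wctype.h>

/* The twelve character class names defined by ISO C for wctype(). */
static const char *const class_names[] = {
    "alnum", "alpha", "blank", "cntrl", "digit", "graph",
    "lower", "print", "punct", "space", "upper", "xdigit"
};

/* Hypothetical helper: stores up to max names of the classes that
   contain ch in out[] and returns how many were stored. */
size_t code_classes(wint_t ch, const char *out[], size_t max)
{
    size_t found = 0;
    size_t total = sizeof class_names / sizeof class_names[0];

    for (size_t i = 0; i < total && found < max; i++) {
        wctype_t desc = wctype(class_names[i]);  /* 0 if the name is unknown */
        if (desc != 0 && iswctype(ch, desc))
            out[found++] = class_names[i];
    }
    return found;
}
```

A Prolog system could return the stored names one by one on backtracking, similar to what `code_type(X, Y)` does with an unbound second argument.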

Jean-Luc-Picard-2021 commented 2 weeks ago

while (iswalnum(ch)

I think this is not required. The problem is that OTHER_NUMBER also has some members with fractional numeric values, or with numeric values larger than a single digit.

For example there is an OTHER_NUMBER:

⒙ Number Eighteen Full Stop https://www.compart.com/en/unicode/U+2499

So I do not classify it as a digit in my system, and this doesn't work:

?- number_codes(X, "₂₁").
Error: Not a number.
    user at 1

But I allow it in identifiers. The code from Novacore that does that is here:

sys_type_class(11, is_ident).

A drawback of allowing all OTHER_NUMBER characters in identifiers is that we can now fake a period. This query works in my system:

?- X = ⒙(Y).
X = ⒙(Y).
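The policy described in this comment, OTHER_NUMBER being acceptable inside identifiers but not counting as a digit, can be sketched in C as follows. All names are hypothetical, the table is a tiny illustrative excerpt rather than the Unicode database, and it is only an assumption that the `11` in `sys_type_class(11, is_ident)` denotes OTHER_NUMBER; the codes 9 and 11 follow the numbering of Java's `Character.DECIMAL_DIGIT_NUMBER` and `Character.OTHER_NUMBER`. Letters are deliberately left out to keep the sketch short.

```c
#include <stddef.h>

/* Assumed category codes (9 and 11 as in java.lang.Character);
   CAT_NOT_LISTED just means "not in the excerpt below". */
enum { CAT_NOT_LISTED = 0, CAT_DECIMAL_DIGIT = 9, CAT_OTHER_NUMBER = 11 };

struct cat_range { unsigned long lo, hi; int cat; };

/* Tiny illustrative excerpt, not the full Unicode database. */
static const struct cat_range cat_table[] = {
    { 0x0030, 0x0039, CAT_DECIMAL_DIGIT },  /* ASCII 0..9 */
    { 0x2080, 0x2089, CAT_OTHER_NUMBER },   /* subscript digits */
    { 0x2488, 0x249B, CAT_OTHER_NUMBER },   /* "full stop" numbers, incl. U+2499 */
};

/* Looks up the category code of a code point in the excerpt. */
int category_of(unsigned long cp)
{
    for (size_t i = 0; i < sizeof cat_table / sizeof cat_table[0]; i++)
        if (cp >= cat_table[i].lo && cp <= cat_table[i].hi)
            return cat_table[i].cat;
    return CAT_NOT_LISTED;
}

/* Only decimal digits may form numbers, so "₂₁" is not a number. */
int is_digit_class(int cat)
{
    return cat == CAT_DECIMAL_DIGIT;
}

/* Both decimal digits and OTHER_NUMBER may continue an identifier. */
int is_ident_class(int cat)
{
    return cat == CAT_DECIMAL_DIGIT || cat == CAT_OTHER_NUMBER;
}
```

Under this policy `is_ident_class(category_of(0x2499))` is true, which is exactly the fake-period drawback: a purely category-based rule cannot tell ₂ from ⒙, so excluding the latter needs an extra criterion.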

infradig commented 2 weeks ago

I don't think superscripts and subscripts should be part of identifiers. If you want to use them and attach them to identifiers, make them postfix ops.

Jean-Luc-Picard-2021 commented 2 weeks ago

I think most of the Unicode classification effort is rooted in the fact that different languages around the globe use different scripts with different writing directions, left to right or right to left. Some scripts also have rules under which the writing direction changes, so that, for example, the number 123 is still displayed as 123 and not as 321.

This would explain why even exotic Unicode code points have certain attributes stored in the Unicode database. I am currently trying to find out what algorithm Scryer Prolog is using.

For example it doesn't recognize this beast:

?- X = ⒙(Y).
X = ⒙(Y).

I have the feeling it has a criterion for what is digit-like, but unfortunately it also has no code_type/2 predicate. So I really have no clue what's going on.

But I will probably do a revision of my algorithm, to get closer to Scryer Prolog, which seems to make quite some sense.