must_be/2 to support more than types (?)

mthom / scryer-prolog

A modern Prolog implementation written mostly in Rust.

BSD 3-Clause "New" or "Revised" License


must_be/2 to support more than types (?) #1309

Open UWN opened 2 years ago

UWN commented 2 years ago

Currently, must_be/2 supports some of the types of 7.12.2 b and some informal ones such as chars. Further candidates would be those in 8.1.2.1 and, in general, the domain errors of 7.12.2 c. This would help to make errors more uniform, in particular where the reporting currently differs between list and character for chars and the like.

The following have occurred so far:

in_character (type error)

not_less_than_zero (type_error(integer, I) and domain_error)

triska commented 2 years ago

Another example that currently occurs in library(crypto) and library(charsio) is:

byte_char

I use it to denote a character whose code is in 0..255. It is like char, except that it raises a domain error if the code of the character is greater than 255. This is useful when using strings to compactly represent octet sequences in memory. The internal predicate '$first_non_octet'/2 can be used to efficiently locate the first "non-octet" in strings. Maybe this could be a potential candidate for inclusion in library(error)? For example, as:

must_be(single_octet_chars, Cs)
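
To illustrate the check such a type would perform, here is a minimal Rust sketch that locates the first character whose code exceeds 255. The function name is borrowed from the internal predicate only for readability; the signature and return type are assumptions made for this example, and it is not Scryer's actual implementation.

```rust
/// Returns the position (counted in characters, starting at 0) and the value
/// of the first character whose code is greater than 255, or `None` if every
/// character fits in a single octet.
/// Hypothetical helper for illustration only.
fn first_non_octet(s: &str) -> Option<(usize, char)> {
    s.chars()
        .enumerate()
        .find(|&(_, c)| (c as u32) > 0xFF)
}
```

A check in the spirit of must_be(single_octet_chars, Cs) could raise a domain error for the returned character when the result is `Some(..)`, and succeed when it is `None`.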

infogulch commented 2 years ago

How are lists of single-octet characters represented in memory? If chars are stored as UTF-8, then any char with a code in 128..255 would be represented with two bytes. Is there a special-cased octet-list representation (u8 vec) akin to the char-list representation (UTF-8 string, I assume)?

triska commented 2 years ago

@infogulch: The internal representation is UTF-8, so indeed the characters with codes in 128-255 are represented by 2 bytes each!
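
The two-byte claim follows from UTF-8 itself and can be checked with the Rust standard library. The helper name `utf8_len_of_code` is made up for this sketch and is not part of Scryer.

```rust
/// Number of bytes UTF-8 needs for the character with the given code,
/// or `None` if the code is not a valid Unicode scalar value.
/// Hypothetical helper for illustration only.
fn utf8_len_of_code(code: u32) -> Option<usize> {
    char::from_u32(code).map(char::len_utf8)
}
```

Codes 0..127 yield `Some(1)` and codes 128..255 yield `Some(2)`, so a string used as an octet sequence takes between one and two bytes per octet.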

infogulch commented 2 years ago

Being 'slightly inefficient' (1.5 bytes per 'octet char' on average?) isn't much of an issue for general byte manipulation, especially compared to other representations (24+ bytes per element, oof). But for cryptography in particular, I'm concerned that using a nonlinear representation could expose the plaintext and intermediates to side channel attacks, maybe leaking one bit per octet (the high bit). Has this potential issue been considered already?
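
A small Rust sketch of the arithmetic behind this question, assuming octets are stored as characters with codes 0..255 in UTF-8. Both function names are hypothetical. It shows only that the stored size depends on the data; whether that is exploitable in practice is not established by this sketch.

```rust
/// Total UTF-8 bytes needed to store the given octets as characters
/// with codes 0..=255. Hypothetical helper for illustration only.
fn encoded_len(octets: &[u8]) -> usize {
    octets.iter().map(|&b| char::from(b).len_utf8()).sum()
}

/// Number of octets with the high bit set, recovered from lengths alone:
/// each such octet costs exactly one extra byte in UTF-8.
fn high_bit_count_from_len(octet_count: usize, total_bytes: usize) -> usize {
    total_bytes - octet_count
}
```

For the 256 distinct octet values taken once each, `encoded_len` gives 384 bytes, which is the 1.5 bytes per octet mentioned above under a uniform distribution.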

triska commented 2 years ago

When encrypting binary data by using the encoding(octet) option of library(crypto), the characters are first transformed to actual bytes (u8), all in the range 0..255:

https://github.com/mthom/scryer-prolog/blob/c45cdd6ea0c4f94269839d157bf2221c640f9b12/src/machine/system_calls.rs#L5995
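
The kind of transformation described here can be sketched in Rust as follows. This is not the code at the linked line; the function name `chars_to_octets` and the choice of returning the offending character as the error are assumptions made for illustration.

```rust
/// Converts a string whose characters all have codes in 0..=255 into the
/// corresponding bytes. Returns the first offending character otherwise.
/// Hypothetical helper for illustration only.
fn chars_to_octets(s: &str) -> Result<Vec<u8>, char> {
    s.chars()
        .map(|c| {
            let code = c as u32;
            if code <= 0xFF {
                Ok(code as u8)
            } else {
                Err(c)
            }
        })
        .collect()
}
```

After such a conversion, every octet occupies exactly one byte regardless of its value, so the variable-width UTF-8 representation no longer applies to the data handed to the cipher.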

UWN commented 2 years ago

It seems this issue has drifted onto a side track. Any other types?

triska commented 2 years ago

not_less_than_zero is made available as part of #1593!