SQL files can contain any byte sequence. However, this library only exposes a way to parse a Rust string (that is, a sequence of Unicode code points encoded as valid UTF-8). This makes it impossible to parse some SQL files (such as the Wikipedia dumps I am currently working with), as they contain byte sequences that are not valid UTF-8.
The API should expose a function that takes an &[u8] instead of an &str.
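To make the problem concrete, here is a minimal sketch using only the standard library. The helper name `as_parser_input` and the sample dump line are made up for illustration; they are not part of this library.

```rust
/// Tries to view raw file bytes as `&str`, which is what a `&str`-only
/// parser entry point needs. (Hypothetical helper, for illustration only.)
fn as_parser_input(bytes: &[u8]) -> Result<&str, std::str::Utf8Error> {
    std::str::from_utf8(bytes)
}

fn main() {
    // Made-up dump line: 0xE9 is "é" in Latin-1, but it is not valid UTF-8 here.
    let dump: &[u8] = b"INSERT INTO page VALUES ('caf\xE9');";

    match as_parser_input(dump) {
        Ok(sql) => println!("valid UTF-8: {sql}"),
        // The file cannot even be handed to the parser as a `&str`.
        Err(e) => println!("cannot build a &str, valid up to byte {}", e.valid_up_to()),
    }
}
```

Since the conversion fails before parsing starts, a caller currently has no way to feed such a file to the parser without modifying its contents first.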
For handling byte sequences that are not valid UTF-8 in string literals, I see two possibilities:
- Using the already existing Blob(Vec<u8>) (the information that the literal was a string and not a blob would be lost).
- Using a fault-tolerant UTF-8 decoder like rust-encoding (the invalid byte sequences would be lost).
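The trade-off between the two possibilities can be sketched as follows. This is only an illustration: the `Literal` enum and its `Text` variant are stand-ins I made up (only `Blob(Vec<u8>)` is taken from the library), and the standard library's `String::from_utf8_lossy` is used in place of rust-encoding to show the lossy behaviour.

```rust
/// Illustrative literal type. `Text` is an assumed name, not the library's
/// actual variant; `Blob(Vec<u8>)` mirrors the existing one.
#[derive(Debug, PartialEq)]
enum Literal {
    Text(String),
    Blob(Vec<u8>),
}

/// Possibility 1: keep the raw bytes. Nothing is lost, but the value no
/// longer records that it came from a string literal.
fn literal_as_blob(raw: &[u8]) -> Literal {
    Literal::Blob(raw.to_vec())
}

/// Possibility 2: decode leniently. Each invalid sequence is replaced by
/// U+FFFD, so the original bytes cannot be recovered.
fn literal_as_lossy_text(raw: &[u8]) -> Literal {
    Literal::Text(String::from_utf8_lossy(raw).into_owned())
}

fn main() {
    // Contents of a made-up string literal with one invalid byte (0xE9).
    let raw: &[u8] = b"caf\xE9";
    println!("{:?}", literal_as_blob(raw));
    println!("{:?}", literal_as_lossy_text(raw));
}
```

With the blob approach the bytes survive untouched, while the lossy approach keeps the string type but turns the invalid byte into a replacement character.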