rust-analyzer / rowan

Make rowan generic over the type of token text #71

Closed arucil closed 4 years ago

arucil commented 4 years ago

Currently, token texts are stored in SmolStrs, which limits the source code to Unicode. IMHO, making the token text generic would widen the scope of rowan, for example to representing a syntax tree for a binary protocol.
(Actually, I'm planning to write a parser for a programming language whose source code is non-Unicode. It cannot be converted to Unicode because it contains some unique emojis that don't exist in Unicode. Having a generic lossless syntax tree would make my work much easier. That's where my idea comes from.)

matklad commented 4 years ago

These are interesting ideas, but they are not in line with rowan's direction :)

Being non-generic is an explicit design goal of rowan. For larger projects, separate compilation matters more than increased flexibility.

Keeping the scope as narrow as possible is an explicit goal; it's hard enough to design a syntax tree library for a single use case.

Forking rowan and experimenting is explicitly encouraged though!

CAD97 commented 4 years ago

I could see a version of Rowan working on [u8] rather than str, but it wouldn't be generic, it'd be just a [u8] implementation (and probably a wrapper that enforces/assumes UTF-8 correctness).
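
A minimal sketch of that shape, for illustration only: `ByteText` and `Utf8Text` are hypothetical names and not part of rowan's API. The idea is a single concrete byte-backed text type, with a thin wrapper that validates UTF-8 once at construction.

```rust
use std::str::Utf8Error;

/// Hypothetical token text stored as raw bytes (not rowan's actual type).
#[derive(Debug, Clone, PartialEq, Eq)]
struct ByteText(Box<[u8]>);

impl ByteText {
    /// Copies the bytes into an owned buffer; no encoding is assumed.
    fn new(bytes: &[u8]) -> Self {
        ByteText(bytes.into())
    }

    fn as_bytes(&self) -> &[u8] {
        &self.0
    }
}

/// Hypothetical wrapper that only accepts valid UTF-8.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Utf8Text(ByteText);

impl Utf8Text {
    /// Validates the input once, then stores it as plain bytes.
    fn new(bytes: &[u8]) -> Result<Self, Utf8Error> {
        std::str::from_utf8(bytes)?;
        Ok(Utf8Text(ByteText::new(bytes)))
    }

    /// Re-checks validity instead of using `unsafe`, to keep the sketch simple.
    fn as_str(&self) -> &str {
        std::str::from_utf8(self.0.as_bytes()).expect("validated at construction")
    }
}
```

Neither type is generic, so everything built on top of them could still be compiled once, which fits the non-generic goal stated above.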

> binary protocol

I strongly believe that a CST is the wrong choice for parsing binary data (or even an arbitrary byte stream without bit-level addressing). The primary appeal of a CST is in keeping the exact structure of the input string, even when it is syntactically invalid, primarily for the purpose of interactive/IDE use.

The benefit of a CST is mostly moot for bit/byte protocols. They typically don't have a "syntactically invalid" state beyond too much or too little data.

> which cannot be converted to Unicode since it contains some unique [characters] that don't exist in Unicode

That's exactly what the private use areas are for, actually.
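
As a rough sketch of how that could work, assuming the non-Unicode characters can be numbered with a small integer index (the `glyph` index and both function names below are hypothetical), each one can be mapped to a code point in the Basic Multilingual Plane private use area, U+E000 through U+F8FF, and mapped back when writing the source out again:

```rust
/// Bounds of the BMP private use area (6400 code points).
const PUA_START: u32 = 0xE000;
const PUA_END: u32 = 0xF8FF;

/// Maps a hypothetical legacy glyph index to a private-use `char`.
/// Returns `None` if the index does not fit in the BMP private use area.
fn glyph_to_pua(glyph: u16) -> Option<char> {
    let cp = PUA_START + u32::from(glyph);
    if cp <= PUA_END {
        char::from_u32(cp)
    } else {
        None
    }
}

/// Inverse mapping: recovers the glyph index from a private-use `char`.
/// Returns `None` for characters outside the private use area.
fn pua_to_glyph(c: char) -> Option<u16> {
    let cp = c as u32;
    if (PUA_START..=PUA_END).contains(&cp) {
        Some((cp - PUA_START) as u16)
    } else {
        None
    }
}
```

With a mapping like this the source becomes an ordinary `str`, so it can be stored in the existing string-based token text without changes to the tree library. The two supplementary private use planes offer more room if 6400 code points are not enough.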

arucil commented 4 years ago

@CAD97 PUA looks like a promising approach. I'll give it a try, thank you :D