maciejhirsz / logos

Create ridiculously fast Lexers
https://logos.maciej.codes
Apache License 2.0
2.7k stars 105 forks source link

Auditing use of unsafe rust #398

Open pchickey opened 4 days ago

pchickey commented 4 days ago

Hi - I am new to the logos ecosystem.

I don't want to cast any doubt on the correctness of logos or that anyone should choose to use it - it looks like its a remarkable project that is clearly a great choice for many users.

I am upstreaming a crate which uses logos to generate a lexer ( https://github.com/bytecodealliance/wasmtime/pull/8872 ) into wasmtime. As part of accepting logos to be used as a transitive dependency in the wasmtime project, I need to certify that it meets cargo-vet's safe-to-deploy criteria: https://mozilla.github.io/cargo-vet/built-in-criteria.html#safe-to-deploy .

The lexer will be handling untrusted input, and after spending some time examining the way logos codegen works, I don't feel that I can certify that the use of unsafe rust is sound: while I don't have any evidence that it is unsound, the code generator, and the code it generates, is too complex for me to reasonably declare that any use of the logos derive macro is fully sound, as would be implied by marking it as safe-to-deploy.

My rough understanding, from reading the code generator and the author's blog, is that performance is a huge goal of logos, and it has achieved very high performance. In my use case, I don't really care about performance: the inputs to the lexer are small and infrequent, and if the lexer was one or two orders of magnitude slower, that would be fine. However, I do care about correctness, to the point of being very conservative in what dependencies I can accept.

One path forward might be to have an alternative code generator for the logos macros that uses entirely safe rust. Have the logos authors ever considered this approach? Otherwise, I will have to rewrite my lexer by hand, and in doing so I will lose composability with other logos lexers in our ecosystem.

jeertmans commented 4 days ago

Hello @pchickey! Thanks for your comment on using unsafe Rust.

I am not the author of the code itself, so I cannot really guarantee that the use of unsafe code is done safely, nor that it is actually important at every place it is used. But I guess that @maciejhirsz knows that better than me, and could maybe give his opinion.

I think an important first step would be to investigate the use of the unsafe keyword across all crates, and maybe list them here. I know some Rust tools have been developed to this end. Second, we could analyze each use-case separately, and see if we can simply go back to a safe alternative without loss of performances.

The solution to have a opt-in for safe (or unsafe) Rust could be obtained via a feature, #[cfg(feature = safe-only)] for example, but I wonder if that is that easy to implement, or if the compile-time might increase a lot.

This question needs further investigation, and help from anyone is more than welcome!