How much should rustc understand the template string?

fintelia commented 4 years ago

There is a range of options here, some of which have already been ruled out based on feasibility / functionality. Some of these might be possible to add after an MVP, but it is worth considering whether doing so would be a breaking change, or whether small actions now could ensure it isn't:

~~No understanding, not even of escape characters + register replacements~~
Register replacements + escape characters, but no concern of whether replacements are reasonable. {} is allowed inside quoted strings and will directly concatenate register names with adjacent characters.
Commented regions are stripped out, but no other semantic understanding of asm.
A "C-like" preprocessor is run over the code
Rustc and/or clippy do some tokenization to sanity check the string. Only issues that would definitely result in an assembler error are reported.
Deprecated syntax and/or blacklisted assembler directives trigger warnings/errors
Only whitelisted syntax/directives, but no code transformation or semantic understanding of what the asm code or assembler directives actually do.
Simple "pseudo assembler directives" which act as aliases for more complicated ones, use Rust-formatted octal literals rather than C-formatted ones, etc.
More substantial syntax level transformations on the format string, without understanding individual instructions
~~Assembly instructions for each architecture are validated against a whitelist, maybe also validating operands~~
~~Inline asm as syntax: the template string is really a DSL compiled by rustc directly to llvm IR / machine code.~~

fintelia commented 4 years ago

My personal opinion is that this is likely the only time Rust will ever be able to ban assembly syntax or enforce rules to make it more interpretable, so we should take advantage of it (provided these things aren't too unreasonable to implement on the compiler side). Unbanning things later would always be an option, but I doubt many people would be too upset if things like "leading zero means octal literal" or "# maybe starts a comment except when it doesn't" were no longer around.
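To make the octal case concrete, here is a rough sketch of the kind of check a lint could run over the template string. It is only an illustration: the function name `leading_zero_literals` is made up, the tokenization is deliberately naive (it knows nothing about comments or quoted strings), and nothing here reflects how rustc or clippy actually work.

```rust
/// Hypothetical helper: return every all-digit token that starts with `0`
/// and is longer than one character, i.e. literals an assembler using the
/// C convention would read as octal.
fn leading_zero_literals(template: &str) -> Vec<&str> {
    template
        // Naive tokenization: split on anything that cannot be part of a
        // number or identifier. Comments and strings are NOT handled.
        .split(|c: char| !c.is_ascii_alphanumeric() && c != '_' && c != '.')
        .filter(|tok| {
            tok.len() > 1
                && tok.starts_with('0')
                && tok.bytes().all(|b| b.is_ascii_digit())
        })
        .collect()
}

fn main() {
    let asm = "mov eax, 010\nmov ebx, 0x10\nmov ecx, 10";
    // Only `010` is flagged; `0x10` is hex and `10` has no leading zero.
    println!("{:?}", leading_zero_literals(asm));
}
```

A check like this could only ever warn, since it does not understand the surrounding syntax well enough to know whether a flagged token really is a numeric literal.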

comex commented 4 years ago

I favor minimal preprocessing. I think it's easier for users to understand a rule that "outside the format string is Rust syntax, inside the format string is native-assembler syntax", rather than creating some hybrid of the two syntaxes. Admittedly, register replacements force us to have some Rust syntax inside the format string. But I'd rather keep that as essentially a variant of format! string interpolation, where the output string just happens to be in assembler syntax.
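As a loose analogy for that "variant of format! interpolation" model, ordinary `format!` already behaves this way: placeholders are substituted, `{{` and `}}` are the only escapes, and everything else passes through untouched. The sketch below uses plain `format!` with a hand-picked register name; in real inline asm the compiler would choose the register, and the helper name `expand_push` is purely illustrative.

```rust
/// Hypothetical helper: build an ARM-style `push` with a register list.
/// The literal braces of the register list must be written as `{{` and `}}`,
/// while `{}` is replaced by the register name.
fn expand_push(reg: &str) -> String {
    format!("push {{{}, lr}}", reg)
}

fn main() {
    // The output string is plain assembler syntax; nothing in it has been
    // inspected or validated.
    println!("{}", expand_push("r4")); // prints "push {r4, lr}"
}
```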

joshtriplett commented 4 years ago

Rust should not attempt to interpret the string at all; the assembler has far more depth than we want to teach Rust on every architecture.

Lokathor commented 4 years ago

Clarification question: Do you mean other than the in reg and out reg values being formatted into place?

fintelia commented 4 years ago

> Rust should not attempt to interpret the string at all; the assembler has far more depth than we want to teach Rust on every architecture.

I don't think this follows. Just because the assembler has a ton of depth doesn't mean that rustc can't try to interpret the string at all. The current RFC says that the syntax used is GNU assembler syntax, which means that lines starting with a period are assembler directives and should have the same meaning regardless of the architecture. Thus, it shouldn't, for instance, be an issue for the compiler to figure out which directives are used in an inline assembly statement.
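A minimal sketch of what "figuring out which directives are used" could look like, assuming one statement per line and no comments or strings to worry about. The function name `directives` is made up for illustration. One wrinkle even this toy version has to handle is that local labels such as `.Lloop:` also start with a period, so tokens ending in `:` are skipped.

```rust
/// Hypothetical helper: list the directives used in a template, taking the
/// first whitespace-separated token of each line and keeping it if it starts
/// with a period and is not a label.
fn directives(template: &str) -> Vec<&str> {
    template
        .lines()
        // First token on the line, if any (blank lines yield nothing).
        .filter_map(|line| line.split_whitespace().next())
        // Directives start with '.', but so do local labels like `.Lloop:`.
        .filter(|tok| tok.starts_with('.') && !tok.ends_with(':'))
        .collect()
}

fn main() {
    let asm = ".section .text\n.Lloop:\n    mov {0}, {1}\n    .align 4\n";
    println!("{:?}", directives(asm)); // prints [".section", ".align"]
}
```

With a list like this in hand, checking it against an allowed or denied set of directives would be a simple lookup. Handling multiple statements per line, comments, and quoted strings is where the real work would be.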

rust-lang / project-inline-asm

How much should rustc understand the template string? #6