Currently, `unput` exists only for this hack:
https://github.com/jyn514/saltwater/blob/546ed7de472c2be7b57c3e44fba628afb856b9ae/src/lex/mod.rs#L662

If we got rid of that, we could get rid of `unput()` altogether, which I've been trying to do for ages. This will make #474 significantly easier.

Why does this exist?
The use case for `seen_line_token` is that `#warning x` is a preprocessor directive, but `1 + #warning x` is not. This is part of the reason I tied the preprocessor to the lexer. Say you have

```c
"a"
"b"
```

Those two tokens should be concatenated into a single string, in which case everything is fine. However, if you have

```c
"a"
# warning
```
then the preprocessor needs to know that no token has been seen yet on the second line. But `seen_line_token` gets set after `parse_string` returns! To find out whether another string literal follows, `parse_string` has already skipped the whitespace after `"a"`, including the newline, so it's treated as if you wrote `"a" #warning`, which is not correct. The current hack is to put back a newline character if we saw a newline in `consume_whitespace`, which requires `unput`.
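To make the ordering problem concrete, here is a toy model in Rust. It is not saltwater's lexer: `Tok`, `lex`, and the `eager_concat` switch are hypothetical, and the `unput` workaround is deliberately left out. With `eager_concat` on, the string branch skips all whitespace (newlines included) looking for another literal, and the per-line flag is only updated after that look-ahead finishes, which mirrors the behaviour described above.

```rust
// Toy model of the problem; NOT saltwater's lexer. All names are hypothetical.
#[derive(Debug, PartialEq)]
enum Tok {
    Str(String),
    Directive, // a `#` that starts a preprocessor directive
    Hash,      // a `#` that is only a punctuator, as in `1 + #warning x`
    Other(char),
}

/// Lexes `src`. If `eager_concat` is true, the string branch mimics the
/// problematic behaviour: after a literal it skips *all* whitespace,
/// newlines included, looking for another literal to concatenate.
fn lex(src: &str, eager_concat: bool) -> Vec<Tok> {
    let chars: Vec<char> = src.chars().collect();
    let mut i = 0;
    let mut seen_line_token = false;
    let mut out = Vec::new();
    while i < chars.len() {
        let c = chars[i];
        if c == '\n' {
            seen_line_token = false; // new line: no token seen on it yet
            i += 1;
            continue;
        }
        if c.is_whitespace() {
            i += 1;
            continue;
        }
        let tok = if c == '"' {
            let mut s = String::new();
            loop {
                i += 1; // skip the opening quote
                while i < chars.len() && chars[i] != '"' {
                    s.push(chars[i]);
                    i += 1;
                }
                i += 1; // skip the closing quote
                if !eager_concat {
                    break; // stop right after the literal
                }
                // Look past whitespace, INCLUDING newlines, for another literal.
                while i < chars.len() && chars[i].is_whitespace() {
                    i += 1;
                }
                if i >= chars.len() || chars[i] != '"' {
                    break;
                }
            }
            Tok::Str(s)
        } else if c == '#' {
            i += 1;
            if seen_line_token { Tok::Hash } else { Tok::Directive }
        } else {
            i += 1;
            Tok::Other(c)
        };
        // The flag is set only after the whole token (and any eager
        // look-ahead) has been lexed, by which point a newline may be gone.
        seen_line_token = true;
        out.push(tok);
    }
    out
}
```

With `eager_concat` set, lexing `"a"` followed by `# warning` on the next line yields `Tok::Hash` for the `#` instead of `Tok::Directive`; with it unset, the `#` is correctly classified as a directive, but `"a"` and `"b"` on separate lines come out as two separate `Str` tokens that something else must join.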
What is the fix?
We can't use `consume_whitespace_no_newline`, because we need to know about following strings, and there's no way to conditionally consume newlines.

Instead, we can change the algorithm: concatenate the strings in the preprocessor (or parser) instead of the lexer. The key difference is that the preprocessor can store a pending token, whereas the lexer can only store a pending character.
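A minimal sketch of the proposed approach, as an iterator adapter over tokens. It assumes a hypothetical `Token` type (saltwater's real token type differs) and a stream with no newline tokens in it, as the parser would see after directives have been handled; the issue leaves open whether the pass lives in the preprocessor or the parser. The point is the single `pending` slot: when the token after a string literal turns out not to be another literal, it is stashed as a whole token, so nothing has to be pushed back into the lexer.

```rust
// Hypothetical token type for illustration; saltwater's tokens differ.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Str(String),
    Ident(String),
    Punct(char),
}

/// Merges runs of adjacent string literals into one literal.
/// Holds at most one token of lookahead in `pending`.
struct ConcatStrings<I: Iterator<Item = Token>> {
    inner: I,
    pending: Option<Token>,
}

impl<I: Iterator<Item = Token>> ConcatStrings<I> {
    fn new(inner: I) -> Self {
        ConcatStrings { inner, pending: None }
    }
}

impl<I: Iterator<Item = Token>> Iterator for ConcatStrings<I> {
    type Item = Token;

    fn next(&mut self) -> Option<Token> {
        // Reuse the token stashed by the previous call, if any.
        let first = self.pending.take().or_else(|| self.inner.next())?;
        let mut acc = match first {
            Token::Str(s) => s,
            other => return Some(other), // non-strings pass through unchanged
        };
        // Keep appending while the following token is also a string literal.
        loop {
            match self.inner.next() {
                Some(Token::Str(s)) => acc.push_str(&s),
                Some(other) => {
                    // Not a string: keep the whole token for the next call
                    // instead of un-reading characters.
                    self.pending = Some(other);
                    break;
                }
                None => break,
            }
        }
        Some(Token::Str(acc))
    }
}
```

Because concatenation happens after lexing, the lexer could stop at each closing quote and never read past the newline, so `seen_line_token` would already be reset when the `#` on the next line is lexed.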
This should also solve https://github.com/jyn514/saltwater/issues/361.