All modified and coverable lines are covered by tests :white_check_mark:
Comparison is base (`a80fa26`) 97.73% compared to head (`62bca59`) 97.76%.
Second comment: echoing my review of #211, do you want to have a chat about a formal lexer/parser for `snakefmt`?
Thanks for fixing these v. important bugs @mbhall88. A bit unfortunate for us that Python 3.12 is changing f-string tokenisation.
I'm just going to try to summarise in words what I understand from this PR, but also what I don't understand from snakefmt (:cry:), and what I think we might need to do specifically here:
* You're adding 'space addition' through the function `add_token_space` by registering 'FSTRING_START' in the `spacing_triggers` dict; why can't we run f-strings through `black` again?
We can run them through `black`, but sadly they now get mangled by snakefmt's parser before they reach `black`.
* If we **can** run f-strings through `black`, can't we try to run `process_token` from encountering 'FSTRING_START' until encountering 'FSTRING_END', as per the new f-string [token sequence](https://peps.python.org/pep-0701/#new-tokens)? I'm thinking of something like lines 394 and 396, where if we're `in_brackets` or `in_lambda` we keep parsing instead of triggering snakefmt parameter resolution; here we'd similarly ask if we're `in_fstring`. BUT it seems that you're having to parse 'FSTRING_MIDDLE's individually (lines 335-341) and add braces back because they get lost?
That is an interesting idea. I'm not sure I completely understand how we would do this though, as I think this section of the code was your creation and I'm not as familiar with it. Feel free to have a go at adding an `in_fstring` variant.
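Something along these lines, perhaps - a minimal sketch only (everything except the `in_fstring`/`process_token` names is made up here, this is not snakefmt's real parser, and it needs Python 3.12 for the new f-string tokens):

```python
import tokenize


class ParserSketch:
    """Illustrative only - not snakefmt's actual Snakefile parser."""

    def __init__(self) -> None:
        self.in_fstring = 0  # nesting depth, analogous to in_brackets

    def process_token(self, tok: tokenize.TokenInfo) -> None:
        if tok.type == tokenize.FSTRING_START:
            self.in_fstring += 1
        elif tok.type == tokenize.FSTRING_END:
            self.in_fstring -= 1
        if self.in_fstring:
            # Inside an f-string: keep consuming tokens rather than
            # triggering snakefmt's parameter resolution.
            return
        # ... normal token handling would continue here ...
```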
Yes, I had to introduce them, as `tokenize` automatically removes them, so we only get a single brace when doubles are used. They get returned inside their surrounding strings too, not even as a single token, which made it slightly more tricky.
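A minimal repro of what I mean (needs Python 3.12; the commented output is how I understand the behaviour rather than a verbatim transcript):

```python
import io
import tokenize

code = 'x = f"{{literal}}"\n'  # doubled braces: no replacement field
for tok in tokenize.generate_tokens(io.StringIO(code).readline):
    print(tokenize.tok_name[tok.type], repr(tok.string))

# The FSTRING_MIDDLE token(s) come back with single braces even though
# the source has doubled ones - so snakefmt has to re-add them manually.
```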
So in fact, IIUC, the tokenizer collapses doubled '{{' and '}}' inside f-strings to single braces, so we have to re-add them manually? And thus we can't just try to 'parse through' an f-string from 'FSTRING_START' to 'FSTRING_END', because that way we lose the doubled '{' and '}' in f-strings?
Essentially yes, this is the problem I ran into. Another option I had thought of was: once we hit `FSTRING_START` we greedily take tokens until `FSTRING_END`, and we use the positions of those tokens in the original string (I assume we get that?) to pull out the whole f-string and make a `STRING` token out of it. However, f-strings have added functionality in Python 3.12 now that isn't compatible with 3.11 - though I'm not sure if this would be a problem when passing the string to black or not...
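For the record, a rough sketch of that alternative (`collapse_fstrings` is a hypothetical name; it assumes Python 3.12 and single-line f-strings, and uses a depth counter since 3.12 allows f-strings nested inside f-strings):

```python
import io
import tokenize


def collapse_fstrings(code: str):
    """Merge each FSTRING_START..FSTRING_END run back into one STRING
    token, using the tokens' positions to slice the original source."""
    lines = code.splitlines(keepends=True)
    toks = iter(tokenize.generate_tokens(io.StringIO(code).readline))
    for tok in toks:
        if tok.type != tokenize.FSTRING_START:
            yield tok
            continue
        start, end, depth = tok.start, tok.end, 1
        for inner in toks:  # greedily consume until the matching end
            if inner.type == tokenize.FSTRING_START:
                depth += 1
            elif inner.type == tokenize.FSTRING_END:
                depth -= 1
                if depth == 0:
                    end = inner.end
                    break
        line = lines[start[0] - 1]  # single-line f-string assumed
        raw = line[start[1]:end[1]]
        yield tokenize.TokenInfo(tokenize.STRING, raw, start, end, line)
```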
> Second comment: echoing my review of #211, do you want to have a chat about a formal lexer/parser for `snakefmt`?

See my message on Slack.
Replying to your reply (https://github.com/snakemake/snakefmt/pull/213#issuecomment-1882079723): I think the solution you've implemented is currently the best - it's actually v. succinct.
And I'm not a fan, in general, of extracting chunks of the original text using pointers stored in the tokens - we've avoided doing this so far, IIRC.
And my idea of trying to parse through the whole f-string and passing it to `black` will not work because of the loss of the double curly brackets; incidentally, why do they now do that?? It's a pain for us.
So this is good to merge, thanks @mbhall88!
Python 3.12 now formalises f-strings by parsing them as their own entities, rather than as plain strings. This creates some problems for us: where a whole f-string used to produce a single `tokenize.STRING` token, Python 3.12 now breaks f-strings up into bits and adds the new tokens `FSTRING_START`, `FSTRING_MIDDLE`, and `FSTRING_END`.
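For example, tokenising a simple f-string (the commented output is approximate):

```python
import io
import tokenize

code = 'msg = f"hello {name}"\n'
for tok in tokenize.generate_tokens(io.StringIO(code).readline):
    print(tokenize.tok_name[tok.type], repr(tok.string))

# Python <3.12 yields one token for the whole f-string:
#   STRING 'f"hello {name}"'
# Python >=3.12 splits it up, roughly:
#   FSTRING_START 'f"'
#   FSTRING_MIDDLE 'hello '
#   OP '{'
#   NAME 'name'
#   OP '}'
#   FSTRING_END '"'
```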
The fixes in this PR relate to issues people have already come across. I suspect there will be more. My solution is somewhat hacky in places, but it was the best I could do given the tokens the `tokenize` library is giving us. #207 is particularly problematic as the tokenizer is the one that removes the extra braces, hence why I am manually adding them back. Fingers crossed this doesn't raise further issues.

I have also added Python 3.12 to the CI so we are testing across `<3.12` and `>=3.12`.
Closes #207 and closes #210