- The `tokenize` module doesn't have a tokenizer state. Should we create one now that we need to keep a stack of tokenizer modes? Can we get around that?

We should add it (or some form of it that allows us to keep track of the mode stack). Note that this state needs to be local to the tokenizing functions, because several tokenizations can be going on at the same time and they should not interfere with each other.
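As a very rough sketch of what "local to the functions" could mean here (all names are illustrative, not the actual `tokenize` internals): the mode stack is created inside the generator, so every call gets its own copy.

```python
from enum import Enum, auto

class Mode(Enum):
    REGULAR = auto()
    FSTRING = auto()

def generate_tokens_sketch(readline):
    # Created fresh on every call, so two generators tokenizing different
    # inputs at the same time each get their own independent mode stack.
    mode_stack = [Mode.REGULAR]

    def enter_fstring():
        mode_stack.append(Mode.FSTRING)

    def leave_fstring():
        assert mode_stack[-1] is Mode.FSTRING
        mode_stack.pop()

    # The real scanning logic would call enter_fstring()/leave_fstring()
    # whenever it emits FSTRING_START / FSTRING_END tokens.
    yield from ()  # placeholder: this sketch emits no tokens
```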
- Will all of the new tokens be part of the tokenize.py specification? My feeling is that we should implement all of FSTRING_START, FSTRING_MIDDLE and FSTRING_END for sure.
Yes, the new tokens will be part of the specification (that's also what people on the Discourse thread want) 👍
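As a rough illustration of what such a token stream might look like for a simple f-string once the new tokens are exposed through `tokenize` (the exact output shown is an assumption about the eventual implementation):

```python
import io
import tokenize

source = 'f"hello {name}"\n'
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    print(tokenize.tok_name[tok.type], repr(tok.string))

# On an interpreter where tokenize emits the new tokens, this should
# print something along the lines of:
#   FSTRING_START 'f"'
#   FSTRING_MIDDLE 'hello '
#   OP '{'
#   NAME 'name'
#   OP '}'
#   FSTRING_END '"'
#   NEWLINE '\n'
#   ENDMARKER ''
```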
- What's the backwards compatibility policy on this? Should we maybe include a parameter that would turn f-strings into regular `STRING` tokens so that code that relies on that continues to work?

The same as in the AST: this reflects internal details of Python, and therefore the "breakage" is justified. These things are a bit fuzzy, so we could also decide to add a flag, although I don't think it is necessary and it would complicate everything.
Thoughts?
I agree with the first two points @pablogsal, but for the third it would've been nice if we could have claimed full backwards compatibility. The AST example makes sense, but if we can avoid the breakage (or provide an alternative) that would be really nice. I thought about leveraging the `exact_type` system, but it still has problems (e.g. incomplete strings with a `STRING` type whose `exact_type` is `MIDDLE`/`START`/`END` etc.), so it's probably not so nice / clear either; a small illustration of how `exact_type` currently behaves is included after the snippet below. I'll look into a few usage sites and see whether this is really required, or whether they can just adjust their code to handle the new tokens with something like the following (copy-pasting it into their own project):
```python
def my_project_tokenizer(input):
    # Stack of token lists, one list per f-string that is currently open.
    f_string_tokens = []
    # real_tokenize, Token and STRING are placeholders for the project's
    # own tokenizer and token representation.
    for token in real_tokenize(input):
        if token.type == FSTRING_START:
            f_string_tokens.append([])
        elif token.type == FSTRING_END:
            # Collapse everything collected for this f-string back into text
            value = untokenize(f_string_tokens.pop())
            new_token = Token(value, STRING)
            if f_string_tokens:  # nested f-string, add it inside the enclosing one
                f_string_tokens[-1].append(new_token)
            else:  # top-level f-string, return the final token
                yield new_token
        elif f_string_tokens:  # inside an f-string: buffer instead of yielding
            f_string_tokens[-1].append(token)
        else:
            yield token
```
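For reference, this is roughly how the existing `exact_type` mechanism mentioned above behaves today: operator tokens all share the generic `OP` type, while `exact_type` reports the precise operator. The rejected idea would have been to do the same for the f-string pieces, i.e. keep `type == STRING` and only distinguish them through `exact_type`.

```python
import io
import tokenize

# Operators come back as OP, with the precise kind only in exact_type.
for tok in tokenize.generate_tokens(io.StringIO("x = (1 + 2)\n").readline):
    if tok.type == tokenize.OP:
        print(tok.string, "->", tokenize.tok_name[tok.exact_type])

# Prints:
#   = -> EQUAL
#   ( -> LPAR
#   + -> PLUS
#   ) -> RPAR
```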
Heads up! I have written the new specification here:
As @pablogsal mentioned in his email, we need to specify how the `tokenize` module is going to work in light of the changes in the C tokenizer. We probably need to change it to reflect the new behaviour. Some questions we need to answer:

- The `tokenize` module doesn't have a tokenizer state. Should we create one now that we need to keep a stack of tokenizer modes? Can we get around that?
- Will all of the new tokens be part of the tokenize.py specification? My feeling is that we should implement all of FSTRING_START, FSTRING_MIDDLE and FSTRING_END for sure.
- What's the backwards compatibility policy on this? Should we maybe include a parameter that would turn f-strings into regular `STRING` tokens so that code that relies on that continues to work?

Looking forward to your thoughts. Will probably start working on the code as soon as we've got some answers.
CC @pablogsal @isidentical