Closed pablogsal closed 3 years ago
In the initial implemention, I simply ignored multiline debug expressions and fetched the actual source from the tokenize buffer. https://github.com/we-like-parsers/cpython/blob/9e05991a7ae2aa3bc6e66eced957a50104f5e791/Parser/pegen.c#L2890-L2902
Though since it is getting replaced at every different line, we need a way of storing each line's contents if we are inside of an f-string expression. I'm experimenting with it right now, though I did not found an easy way. For example this version seem to work but it is a really hacky/complicated implementation, so I need to somehow simplify it.
It currently works by creating a new buffer in the tokenizer_mode
, and if we are inside the f-string expression and when we see the initial {
we start recording. At each line, while the tok->buf
gets replaced, we simply append to our existing buffer. Normally we should free it as soon as we close the expression with }
but due to the way we parse debug expressions (we parse the whole f-string expression and then continue) we can't free it until we actually use it's contents. If the expression is multiline, we try to find where it ended by manually searching newlines among lines, and trying to match the position which seem a bit error-prone so this is something that I especially want to get rid of.
Possible further ideas:
last_expression_buffer
somewhere, so we can do free-ing as we are supposed to do on the tokenizer.:
we can try to record outermost =
character's end, and use it to determine the end. We could do the same thing we are doing for the interactive mode and place all the parsed source into a buffer and then just record the offsets into said buffer. There is some scaffolding already for this, you would need to activate for non interactive mode as well.
I initially went with the interactive_src_start
(like when we initially implemented the debug expressions) and that approach was more problematic than this. From what I can recall; we had to modify a bunch of places to add the line since they were not part of the interactive flow (we also the same in the current version, but less places AFAICT). Another problem was that, since we always missed the first line (even though that was part of the tokenizer_mod->f_string_multiline_start
, it made it complicated to work with) and we still needed to store 2 pointers for each expression (amount of characters relative to the start of the string, and the size of the expression itself). Interactive mode also gets deallocated at the end, but the current approach gets deallocated on each f-string.
I've made some changes (some of the above + others), and I think the code is simple enough for us to go with. Also seems like we pass all the debug expression tests on the suite.
Multi-line debug expression may be broken. We need to either validate that they work in all cases or properly fix them