zrax / pycdc

C++ python bytecode disassembler and decompiler
GNU General Public License v3.0
3.23k stars 623 forks source link

"if not" in lambda #157

Open rocky opened 6 years ago

rocky commented 6 years ago
lambda a: False if not a else True

when bytecompiled and decompiled comes out:

(lambda a: if not a:
False)
Aralox commented 3 years ago

Hi @rocky and @zrax, I was having a look at this for hacktoberfest. This problem happens with any if-expression, not necessarily in a lambda. Also related to https://github.com/zrax/pycdc/issues/129.

It looks like the ASTree.cpp::BuildFromCode() when dealing with the Pyc::POP_JUMP_IF_TRUE_A instruction (~ line 950), treats the block as a regular ASTBlock::BLK_IF block when in reality the curblock->blktype() is BLK_MAIN. I'm not sure how to fix this. Does there need to be a new ASTBlock type added for if-expressions? How should the AST be correctly constructed for if-expressions here?

If you think the fix is not too hard and is something I can do in a few hours after work / weekend, I would really appreciate some guidance. If I can help fix this in a more narrow scope (e.g. simple if-expressions) that help set the code up for a larger change later, I'd be keen for that too.

Thanks, Pravin

zrax commented 3 years ago

I honestly haven't looked into it enough to be able to guess whether it would be difficult or not. However, a good starting point would be to add the appropriate unit tests to show the failure (if they don't already exist). From there, it should be possible to look more closely at the disassembly for both types of if statements and see if it's something that can be reliably checked in the AST builder.

rocky commented 3 years ago

Hmmm... if you want a unit test, here's one that you can use.

And more along those lines, if you run uncompyle6 -aT 02_ifelse_lambda.pyc (pre-compiled bytecode can be found in https://github.com/rocky/python-uncompyle6/tree/master/test/bytecode_3.6_run and likewise for other versions) you'll get the assembly and one interpretation of how this might be parsed.

This might guide you in what context to look for.

Aralox commented 3 years ago

Thanks guys. Here are some disassemblies of a regular if, an empty regular if, and an if-expression. I can see that the regular if has a STORE_NAME between POP_JUMP_IF_FALSE and JUMP_FORWARD, but the if-expression doesn't.

The disassembly of the if-expression is exactly the same as the empty-if although it has a LOAD_CONST between POP_JUMP_IF_FALSE and JUMP_FORWARD while the empty-if doesnt.

In ASTree.cpp (https://github.com/zrax/pycdc/blob/36d93bd1a58fbb1590c444932bd855695150543d/ASTree.cpp#L1174), the curblock->size() for a normal if is > 0, but is 0 for the if-expression and the empty normal if. What does this mean? It seems like this would be the place to pick up the fact that we are dealing with an if-expression rather than an empty-if. How do we do this? Some way of detecting that we called LOAD_CONST/some expression before the if statement?

image

PS: I've written a (failing) test as suggested, thank you. Will include in a PR at some point if desired.

normal_if_disassembly.txt empty_if_disassembly.txt if_expression_disassembly.txt

rocky commented 3 years ago

While checking for a STORE inside the if or else blocks distinguishes the two in the example you give, it isn't true that this will distingish all cases. Change your example to

if a == 42:
    random.seed(2)
else:
    random.seed(3)

and you'll see this. What distinguishes the two is using the value of the computed if after the block is finished. In other words, there is a STORE which is either falled into from the line above (the else part of the computed if) or is jumped to from the end of the then.

Aralox commented 3 years ago

Thank you Rocky, I will try that when I next work on this. Is there a way to dump out the whole ast in the code? E.g. at a breakpoint. So I can understand all this better.

rocky commented 3 years ago

@Aralox for your hactoberfest I think you'll have an easier time if you just pick something that doesn't involve control flow. Like #165 or #136 or #124 to name a few.

Aralox commented 3 years ago

Thanks for your suggestion, I decided to do #165