leibnitz27 / cfr

This is the public repository for the CFR Java decompiler
https://www.benf.org/other/cfr
MIT License
1.93k stars 249 forks source link

CFR uses numerical literals instead of constants #353

Open DecompExplorer opened 5 months ago

DecompExplorer commented 5 months ago

It's commonplace that the developers tend to create a multitude of well-named constants that might be used across their projects. And I've found that CFR sometimes intends to omit the constants while using the numerical literals these constants have stood for. At this case, CFR may encounter challenges in retaining these numerical literals to the corresponding constants, thus having a negative impact on the code readability and understandability.

I think that improving the strategy of coping with the relatively special constants defined by source codes might help resolve this problem.

Here is an example:

The following code snippet is from src/main/java/org/bukkit/util/BlockIterator.java in the project Bukkit:

private static final int gridSize = 1 << 24;
public BlockIterator(...) {
    [...]
    secondError = floor(secondd * gridSize);
    [...]
}

The corresponding code generated by CFR:

private static final int gridSize = 16777216;
public BlockIterator(...) {
    [...]
    this.secondError = NumberConversions.floor(secondd * 1.6777216E7);
    [...]
}

As we can see, CFR has replaced the GridSize with a numerical literals, which might make the code less understandable.

The corresponding .class file can be found here

wodin commented 5 months ago

The name of the constant is not stored in the constant pool in the bytecode. The disassembled code looks like this:

     419: aload_0
     420: dload         23
     422: ldc2_w        #117                // double 1.6777216E7d
     425: dmul
     426: invokestatic  #66                 // Method org/bukkit/util/NumberConversions.floor:(D)I
     429: putfield      #119                // Field secondError:I

So it's just fetching the value stored at index 117 in the constant pool and pushing it onto the stack. There's no name associated with that value.

So the best CFR would be able to do is guess that it's the same as the gridSize constant. Is that what you're proposing?

DecompExplorer commented 5 months ago

The name of the constant is not stored in the constant pool in the bytecode. The disassembled code looks like this:

     419: aload_0
     420: dload         23
     422: ldc2_w        #117                // double 1.6777216E7d
     425: dmul
     426: invokestatic  #66                 // Method org/bukkit/util/NumberConversions.floor:(D)I
     429: putfield      #119                // Field secondError:I

So it's just fetching the value stored at index 117 in the constant pool and pushing it onto the stack. There's no name associated with that value.

So the best CFR would be able to do is guess that it's the same as the gridSize constant. Is that what you're proposing?

Yes, that's what I'm proposing. Thank you for clarifying that!

leibnitz27 commented 5 months ago

This is (partially) handled in InstanceConstants

https://github.com/leibnitz27/cfr/blob/d6f6758ee900ae1a1fffebe2d75c69ce21237b2a/src/org/benf/cfr/reader/bytecode/analysis/opgraph/op3rewriters/InstanceConstants.java#L89

However, while it's a 'nice' thing, it tends to lead to completely confusing code in all but the most obvious cases; it's been a while so I may be mis remembering, but GUI code in particular uses a lot of magic constants , so the code to do this is very pessimistic. ( while a false negative is annoying, a false positive is downright confusing ).

DecompExplorer commented 4 months ago

This is (partially) handled in InstanceConstants

https://github.com/leibnitz27/cfr/blob/d6f6758ee900ae1a1fffebe2d75c69ce21237b2a/src/org/benf/cfr/reader/bytecode/analysis/opgraph/op3rewriters/InstanceConstants.java#L89

However, while it's a 'nice' thing, it tends to lead to completely confusing code in all but the most obvious cases; it's been a while so I may be mis remembering, but GUI code in particular uses a lot of magic constants , so the code to do this is very pessimistic. ( while a false negative is annoying, a false positive is downright confusing ).

Thank you for your explanation.