leibnitz27 / cfr

This is the public repository for the CFR Java decompiler
https://www.benf.org/other/cfr
MIT License
1.93k stars 249 forks source link

inner classes should be separated by "$" not by "." dot when decompiled alone #263

Open judovana opened 2 years ago

judovana commented 2 years ago

When CFR decompiles inner classes, it stores them with dot as delimiter.

class A {
 private static class B {
 }
}

Are compiled as A.class and A$B.class and as those they are also seen by JVM. If CFR decompiles the inner class only, it creates it as A.B.java. Which is wrong, and can not be compiled. If CFR goes correctly, and stores them as A$B.java, then this inner class can be in many cases compiled, and directly used.

Aligned with naming, the used name should be kept. So instead of current class A.B {... the class A$B {... should be used

liach commented 2 years ago

If it indeed was an inner class, I'd say A.B.java is a more accurate representation. A$B is actually different from A.B in class file attributes, that the relation for A.B is recorded in InnerClasses attribute while that for A$B is not.

judovana commented 2 years ago

hmm. Then it is already wrongly substituting $ by dot. In the binaries, I see only $ signs in inner classes. No dots nowhere. If dot would be correct, then name clash of pkg1.pkg2.clazz2 x pkg1.clazz1. innerclazz1 if pkg2 name would be same as clazz2 name and clazz2 name would be same as innerclazz1 (where actualy x$y is perfectly valid class name and you can compile inner classes as standalone chunks if necessary, but ypu must not replace $ by dot.)

-- Mgr. Jiri Vanek @.***

---------- Původní e-mail ---------- Od: liach @.> Komu: leibnitz27/cfr @.> Datum: 11. 4. 2022 2:05:30 Předmět: Re: [leibnitz27/cfr] inner classes should be separated by "$" not by "." dot when decompiled alone (#263) "

If it indeed was an inner class, I'd say A.B.java is a more accurate representation. A$B is actually different from A.B in class file attributes, that the relation for A.B is recorded in InnerClasses attribute while that for A$B is not.

— Reply to this email directly, view it on GitHub (https://github.com/leibnitz27/cfr/issues/263#issuecomment-1094417388), or unsubscribe (https://github.com/notifications/unsubscribe-auth/AAWFCS3K3ETWZZRZT4K3OB3VENUEHANCNFSM5BZ6OAVA) . You are receiving this because you authored the thread. Message ID: < @.***> "

Col-E commented 2 years ago

If CFR decompiles the inner class only, it creates it as A.B.java. Which is wrong, and can not be compiled. If CFR goes correctly, and stores them as A$B.java, then this inner class can be in many cases compiled, and directly used.

While it may be true that the decompiled inner classes are more likely to be compilable following the A$B convention doing so breaks the inner-outer relationship of B with A. Recompiling the class alone would not emit NestHost or an InnerClasses attribute, which would break access in some cases.

If you want to recompile decompiled code (already a messy task) you do not want to do it this way.

If it indeed was an inner class, I'd say A.B.java is a more accurate representation.

If you're decompiling an application with inner classes, you should not generate any individual files for inner classes at all. They should be output as part of the outer class. In these cases A.B is the appropriate source level construct you want to use so that your decompilation is as accurate as possible.

If you are explicitly decompiling the inner class by itself without any outer class context it should be reasonable to assume that source-level constructs are not the target for what is deemed "accurate". Without any context you also cannot de-sugar inner class patterns used to reference the outer class (and references used by the outer to access private members in the inner class). Since there isn't sufficient data to fully reconstruct the source-level constructs across the entire inner class its best to keep things uniform in representation. In this case, including the A$B naming convention along.

A$B is actually different from A.B in class file attributes, that the relation for A.B is recorded in InnerClasses attribute while that for A$B is not.

What are you talking about? (This sounds ambiguous / needs elaboration)

Source:

class A {
    public static void main(String[] args) {
        System.out.println("Hello " + new B().get());
    }

    private static class B {
        private String get() { return "text"; }
    }
}

Binary: image

Plop a class in a hex editor and look at the values its pointing to. There are 4 pieces to an entry in the InnerClass attribute.

title description
Inner class info A CP_CLASS reference inner class (self class).
Outer class info A CP_CLASS reference to the outer class.
Inner name A CP_UTF8 reference to the name of the class (Akin to Class#getSimpleName), or 0 if the inner class is anonymous.
Inner access Flags used by the compiler for access information, assuming source code is not available

There is no . value anywhere in any of the references. The . character is used at the source level, not the bytecode level.

judovana commented 2 years ago

Well written Matt. TY!

liach commented 2 years ago

My point is that both A$B.java and nested/inner class B in A.java compile to A$B.class. The feature that distinguish them is that in the nested/inner class version, there is in InnerClasses an entry describing A$B as a nested/inner class of A, while such entry does not exist for the A$B.java version. Per JVMS

In addition, the constant_pool table of every nested class and nested interface must refer to its enclosing class, so altogether, every nested class and nested interface will have InnerClasses information for each enclosing class and for each of its own nested classes and interfaces.

Imo if we do decompile it to A$B.java, we need a comment to clarify that this is intended to be an inner class but the outer class could not be found.

judovana commented 2 years ago

That sounds like a correct thing to do.

Col-E commented 2 years ago

My point is that both A$B.java and nested/inner class B in A.java compile to A$B.class. The feature that distinguishes them is InnerClasses which describes the relationship of inner B to outer A inside the class A$B. No such entry exists for the A$B.java version

I fail to see how this is relevant. I understand the behavior of the InnerClasses attribute but we're talking about decompiling just a single inner class without the context of the outer class. At this point it is already impossible to recover that relationship due to the fact the outer class is missing. This is a failure of the "decompile recompile" approach. You cannot in this state ever regenerate a proper InnerClasses attribute.

The argument for this naming convention comes down to "what is the best for the situation if we are missing context?". It is my opinion, which anyone is free to disagree with (more perspectives leads to better discussion and thus a better solution), that in this specific situation A$B is the better name. It matches the true name of the class. Again, should you attempt to decompile the contents of a Java application, you should not ever have any separate files for inner classes.

Imo if we do decompile it to A$B.java, we need a comment to clarify that this is intended to be an inner class but the outer class could not be found.

:100:

The warning should be in any decompiled inner class regardless of the file name if there is missing context.

liach commented 2 years ago

Yes. Your solution of using A$B.java as file name and add comment warning should be how such standalone inner class files be handled.

Lanchon commented 2 years ago

for the case of decompiling an isolated inner/nested class, i think there's been a CFR switch from immemorial times that disables nested class resugaring. so CFR has always been able to do what you want.

the feature of reverting to this mode for a single class that is missing its parents (at least one of them i suppose?) might not be desired or carry is own weight. when a class is missing some but not all of its parents, some resugaring might still be working and be useful. in this context, partial disablement of resugaring to decompile to something like A.B$C starts to look unnecessarily complex and not really useful.

but the real complexity increase might come into play -IDK, it's an hypotheses- when decompiling other classes. a decision whether to resugar the binary name A$B will have to be deferred until A and B and their relationship has been analyzed. if that information dependency already exists for other reasons, then yes, this feature is simple to implement. otherwise the inclusion of this dependency can go as far as requiring a new pass (very unlikely), but the point is that the cost of the feature might outweigh its questionable value. and there are no free rides or infinite resources: in general, for each feature implemented others will be left out.