policeman-tools / forbidden-apis

Policeman's Forbidden API Checker
Apache License 2.0
321 stars 34 forks source link

static method hiding produces false positives when original method is forbidden #237

Closed hossman closed 10 months ago

hossman commented 10 months ago

Consider the following code (contrived example, but simplest I could find in a hurry that anyone can try using only JDK methods w/o needing third-party libs) ...

import java.util.BitSet;
public class Main {
    public static final long[] data = new long[] {1, 2, 3, 4};

    public static void main(String[] args) {
        System.out.println(X.valueOf(data).toString());  // Line 6
        System.out.println(Y.valueOf(data).toString());  // Line 7
        System.out.println((new Z()).go());
    }

    public static class X extends BitSet { }

    public static class Y extends X {
        public static BitSet valueOf(long[] longs) {
            return new BitSet();
        }
    }

    public static class Z extends Y {
        public String go() {
            return valueOf(data).toString();            // Line 21
        }
    }
}

Let's see what happens when we tell forbidden-apis we don't want our code to use Bitset.valueOf(...) ...

hossman@slate:~/tmp/fapi-bug$ cat sigs.txt 
java.util.BitSet#valueOf(**)
hossman@slate:~/tmp/fapi-bug$ java -jar forbiddenapis-3.6.jar -d . -f sigs.txt 
Scanning for classes to check...
Reading API signatures: /home/hossman/tmp/fapi-bug/sigs.txt
Loading classes to check...
Scanning classes for violations...
ERROR: Forbidden method invocation: java.util.BitSet#valueOf(**)
ERROR:   in Main$Z (Main.java:21)
ERROR: Forbidden method invocation: java.util.BitSet#valueOf(**)
ERROR:   in Main (Main.java:6)
ERROR: Forbidden method invocation: java.util.BitSet#valueOf(**)
ERROR:   in Main (Main.java:7)
ERROR: Scanned 4 class file(s) for forbidden API invocations (in 0.06s), 3 error(s).
ERROR: Check for forbidden API calls failed, see log.

It correctly identified that line 6 is "bad" for it's "Forbidden method invocation", but the errors reported for Lines 6 & 21 are false positives -- those lines do not invoke Bitset.valueOf(...), they invoke Y.valueOf(...), which hides the static method with the same name provided by Y's ancestor class...

hossman@slate:~/tmp/fapi-bug$ java Main 
{0, 65, 128, 129, 194}
{}
{}
hossman commented 10 months ago

Practically speaking: This came up in the context of LuceneTestCase extending Assert but some Solr devs really wanting to forbid the deprecated methods in Assert -- but using static method hiding to re-define the deprecated methods in SolrTestCase still triggered forbidden api failures, so that approach was abandoned...

https://github.com/apache/solr/pull/947#issuecomment-1279651282 https://lists.apache.org/thread/nycfk6b2bqkqsbq29snow51w31qo1dd3

uschindler commented 10 months ago

Hi @hossman, sorry I was on a business trip when you opened the issue. Actually you are right the code handles static and virtual methods the same. This bug is known, but at beginning it did not seem important.

~After some checking: Only the Java Compiler searches parent classes for static methods (if not found in the class referenced), but bytecode always has the exact class name. If you move a method from a class to its superclass and do not recompile the code, it fails to link the code at runtime.~ (incorrect, see https://docs.oracle.com/javase/specs/jls/se21/html/jls-8.html, section 8.4.8).

So basically, I will add an if statement to the method resolver to not check parent classes if the method is static.

uschindler commented 10 months ago

Hi, correction: Static methods are inherited by subclasses, so the behaviour is correct. The problem is the viewpoint of forbiddenapis. In contrast to some code calling a method, forbiddenapis has to make sure that also hits are found when you call a virtual method in a subclass and this one calls super.xyz(). This is the problem of real "virtual" methods. The method called is decided at runtime (therefor virtual).

For static methods, we may stop going up to superclasses, but I have to check this one more time: I am not sure if you can use super() to call superclass's static method. It makes no sense but it is unclear if the java language allows this.

If you cannot call a superclass static methods with super, I agree to remove the superclass check.

uschindler commented 10 months ago

I have to read https://docs.oracle.com/javase/specs/jvms/se21/html/jvms-6.html#jvms-6.5.invokestatic and compare with https://docs.oracle.com/javase/specs/jvms/se21/html/jvms-6.html#jvms-6.5.invokevirtual

May take some time.

dweiss commented 10 months ago

Static methods are not virtual, Uwe. They can be shadowed but the invokestatic call always uses an explicit class reference (because of that you can call SuperClass.staticMethod from Subclass.staticMethod... in fact, you can call SuperClass.staticMethod and Subclass.staticMethod from anywhere you like).

uschindler commented 10 months ago

Here is the way how methods are looked up: https://docs.oracle.com/javase/specs/jvms/se21/html/jvms-5.html#jvms-5.4.3

It makes no difference between static and non-static. The difference is only implemented in the compiler. Basically you can refer to a subclass' static method, if it does not exist it find the superclass' method. As forbiddenapis does not know if another subclass hides this method, it has to go to a superclass (because it needs to ensure there are no other possibilities how to call this method). Forbiddenapis problem also exists for virtual methods: As it doe snot know it the virtual method calls super.name() it has to disallow calls to that method on all subclasses. Basically there's no difference with static methods regarding lookup. If you have code that calls in its bytecode the static method Subclass#method() and this method no longer exists (e.g., because of refactopring) the JDK will per spec search for Superclass#method() [see above].

@dweiss : I explicitely said they are not virtual, I just compared the spec. When you have a virtual call you have the argument with the class on stack, but it then does exactly the same: it looks up the method as described above.

In short: actually in JVM runtime theres no difference with method lookups. The difference is only with the argument on stack which is the class for the virtual call. But lookup of method is identical, it just does not use the class on stack but the one in bytecode.

So basically that's all forbiddenapis can do. It has its limitations. We can add a specialcase for static methods to only look at the exact descriptor and not dive into superclasses, but this would have the problem of undetected calls.

uschindler commented 10 months ago

Ok. I think I can remove the static call. The problem with virtual calls is different. On bytecode you do not know which method is actually called, so you have to check all possible superclasses (from perspective of forbiddenapis, as first argument is unknown).

In case of static methods you can stop in forbiddenapis when method was found. The type is given on call site and you just have to look until you found one in the ancestors-or-self.

I will provide a PR.

dweiss commented 10 months ago

I'm not sure I agree with you here (with regard to virtual vs. static method calls)... but anyway. I think considering odd cases like swapping out classes without proper recompilation (which would emit different bytecode) is not worth it. 99.9% of the cases are legitimate valid code and javac would see the difference and link it up properly.

uschindler commented 10 months ago

I'm not sure I agree with you here (with regard to virtual vs. static method calls)... but anyway. I think considering odd cases like swapping out classes without proper recompilation (which would emit different bytecode) is not worth it. 99.9% of the cases are legitimate valid code and javac would see the difference and link it up properly.

See my response.

uschindler commented 10 months ago

Sorry for the confusion. The problem of forbiddenapis is that it needs to look somehow without runtime types. For virtual methods it is hard, because you have no information which class actually receives the call.

For static methods you can just follow the spec and stop on first found method.

uschindler commented 10 months ago

Btw., the same apples for fields.

dweiss commented 10 months ago

I did see it. But I still think maybe forbiddenapis shouldn't try to play JVM here - if a static method doesn't exist at its target class, something is wrong - classes are stale and this should perhaps produce a warning (javac should never produce such bytecode because it always resolves static call to a concrete class).

uschindler commented 10 months ago

I did see it. But I still think maybe forbiddenapis shouldn't try to play JVM here - if a static method doesn't exist at its target class, something is wrong - classes are stale and this should perhaps produce a warning (javac should never produce such bytecode because it always resolves static call to a concrete class).

Yes, if you scan your own compiled classes the output of javac should be "uptodate". Elasticserach uses forbiddenapis also to scan JAR files on unknown source which might be very outdated.

Anyways, I think the simplest / only thing to change is: If method call is static (invokestatic or MethodHandle with tag H_INVOKESTATIC) it should stop at first found method.

uschindler commented 10 months ago

~Hi, I think simplest is your first suggestion: for static lookup do not go up the chain.~

~The reason why I do this: The current forbiddenapis code does not check if the method really exists, it just looks up if any forbidden signature matches. So it is hard to fix the code. Under the assumption that static calls are always resolved to a concrete class by javac, I added an exit condition after the exact signature was checked.~

BTW: I fixed the code to assume static also for invokespecial (and its new variant) bytecode, because it is used for ctors and super calls which are always handled like static (argument is ignored).

uschindler commented 10 months ago

Here is a PR: #238

I still need to add a test.

uschindler commented 10 months ago

I rewrote the PR to now also look into superclasses until a match was found, It was doing this for fields already, but it was not correct according to static or virtual. Now both methods and fields are handled in the same way. #238

I still need a test. I will also run a check on Lucene, Solr, Opensearch as integration test.

... and sleep a night over it.

uschindler commented 10 months ago

Field handling was correct. For methods it now differentiates only between virtual/interface invocations (collect all superclasses) and static/special (only until found). Although for "special" it should not go to superclasses at all (used for constructors and super calls).

hossman commented 10 months ago

I still need to add a test.

FWIW: I was hoping to provide a (commitable) test case when i created this issue, but even as someone who has a lot of experience with java, and using forbidden-apis: I couldn't' make heads or tails of how to write a new test case for this type of situation based on the existing tests i found.

It would be great if there was a "HelloWorldIntegreationTest.java" that showed some basic rule checking of code compiled as part of the test class that could serve as an example to folks who want to submit future issues/PRs demonstrating problems they encountered.

uschindler commented 10 months ago

The test needs to be done like the many antunit tests.

The actual tests need to be example class files and then you can assert on the results with antunit.

I will take your example code above.

uschindler commented 10 months ago

Hi @hossman,

Test was added to PR: https://github.com/policeman-tools/forbidden-apis/pull/238/commits/fbb29ef9c6f254d8204d1652850a9bfda00ba8bb

I will also add a fields check and I want to extend the test to check virtual methods, too.

uschindler commented 10 months ago

FYI, to setup a test like this: Add a class with Java7* as prefix (this will be compiled on ant generate-test-classes and saved in repo as "test file" (the Java8* prefix is for lambda checks only). Then write an antunit test that checks the output (like error messages and/or line numbers and number of errors).

uschindler commented 10 months ago

I added a few more tests to check static and virtual methods:

I think it's ready.

hossman commented 10 months ago

ahhhh, ok. yeah. test looks good.

The part i wasn't understanding originally is that you keep all the "integration tests" in the anttest directory -- I assumed from the name that it was just for using anttest to check the behavior of the <forbiddenapis> ant plugin ... and i didn't look very closely so i didn't realize that's the way you do "end to end" testing (I kept expecting to find JUnit tests that would call new Checker(...))

Out of curiosity: why do you commit all the "test input" class files? I understand some of them are because they have to be compiled using old JDKs, or using oracle specific JDKs, but couldn't most test just be re-compiled on every run and pulled from the test-classpath?

uschindler commented 10 months ago

The part i wasn't understanding originally is that you keep all the "integration tests" in the anttest directory -- I assumed from the name that it was just for using anttest to check the behavior of the <forbiddenapis> ant plugin ... and i didn't look very closely so i didn't realize that's the way you do "end to end" testing (I kept expecting to find JUnit tests that would call new Checker(...))

This was easiest at beginning. There is still an issue open to have the Checker API return a list of structured "violation" instances. By this is would be easier to write "standard tests". Antunit was quite nice to configure input files/signature files/pass signatures. The only thing that is harder is to check is the returned violations. But with some logic you can grep on the log output and catch the cases you like.

Out of curiosity: why do you commit all the "test input" class files? I understand some of them are because they have to be compiled using old JDKs, or using oracle specific JDKs, but couldn't most test just be re-compiled on every run and pulled from the test-classpath?

That's also historical: At beginning forbiddenapis was java5+, so files for Java 7 and Java 8 and Java 9 needed to be compiled separately. In addition some tests were checking for undocumented features (the ones we would like to prevent, like sun.misc.Unsafe). At this time it was unclear if those classes survive. If you can't compile violating classes with all JDKs out there, it is hard to check (P.S.: In Lucene you can't run Gradle with IBM Semeru Runtime, because jfr module is missing there!). So back at that time I decided to add all class files as "test resources" and just added the option to regenerate those.

But yes, I am planning to switch to Java 8 as minimum requirement, so the classes can be compiled out of box (for all checks). That's something for a lonely weekend...

uschindler commented 10 months ago

There are more generation tasks in forbiddenapis: The deprecated signatures files can be generated, too, but you need to run with exactly that JDK...

uschindler commented 10 months ago

hi @hossman, do you need a new version for Solr urgently? I would then release a new version soon (after doing more tests with the usual candidates Lucene/Solr/TIKA/Opensearch).

hossman commented 10 months ago

no no, this is a nice to have. no urgency needed.

uschindler commented 5 months ago

Forbiddenapis v3.7 was released a minute ago.