soot-oss / soot

Soot - A Java optimization framework
GNU Lesser General Public License v2.1
2.89k stars 711 forks source link

In instance methods, the first local should be a reference to the object #376

Closed marco-c closed 9 years ago

marco-c commented 9 years ago

According to https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-2.html#jvms-2.6.1: The Java Virtual Machine uses local variables to pass parameters on method invocation. On class method invocation, any parameters are passed in consecutive local variables starting from local variable 0. On instance method invocation, local variable 0 is always used to pass a reference to the object on which the instance method is being invoked (this in the Java programming language). Any parameters are subsequently passed in consecutive local variables starting from local variable 1.

But Soot generates:

  public synchronized java.util.TimeZone getInstance(java.lang.String);
    descriptor: (Ljava/lang/String;)Ljava/util/TimeZone;
    flags: ACC_PUBLIC, ACC_SYNCHRONIZED
    Code:
      stack=2, locals=2, args_size=2
         0: aload_1
         1: ifnonnull     33
         4: getstatic     #164                // Field HOME_ID:Ljava/lang/String;
         7: ifnonnull     29
        10: ldc           #4                  // String com.sun.cldc.util.j2me.TimeZoneImpl.timezone
        12: invokestatic  #93                 // Method java/lang/System.getProperty:(Ljava/lang/String;)Ljava/lang/String;
        15: putstatic     #164                // Field HOME_ID:Ljava/lang/String;
        18: getstatic     #164                // Field HOME_ID:Ljava/lang/String;
        21: ifnonnull     29
        24: ldc           #136                // String UTC
        26: putstatic     #164                // Field HOME_ID:Ljava/lang/String;
        29: getstatic     #164                // Field HOME_ID:Ljava/lang/String;
        32: astore_1
        33: iconst_0
        34: istore_0
        35: iload_0
        36: getstatic     #112                // Field zones:[Ljava/util/TimeZone;
        39: arraylength
        40: if_icmpge     73
        43: getstatic     #112                // Field zones:[Ljava/util/TimeZone;
        46: iload_0
        47: aaload
        48: checkcast     #133                // class com/sun/cldc/util/j2me/TimeZoneImpl
        51: getfield      #141                // Field ID:Ljava/lang/String;
        54: aload_1
        55: invokevirtual #89                 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
        58: ifeq          67
        61: getstatic     #112                // Field zones:[Ljava/util/TimeZone;
        64: iload_0
        65: aaload
        66: areturn
        67: iinc          0, 1
        70: goto          35
        73: aload_1
        74: invokestatic  #159                // Method parseCustomTimeZone:(Ljava/lang/String;)Ljava/util/TimeZone;
        77: areturn

The class generated by Soot stores the value 0 in the first local.

This is the original:

  public synchronized java.util.TimeZone getInstance(java.lang.String);
    descriptor: (Ljava/lang/String;)Ljava/util/TimeZone;
    flags: ACC_PUBLIC, ACC_SYNCHRONIZED
    Code:
      stack=2, locals=4, args_size=2
         0: aload_1
         1: ifnonnull     33
         4: getstatic     #38                 // Field HOME_ID:Ljava/lang/String;
         7: ifnonnull     29
        10: ldc           #39                 // String com.sun.cldc.util.j2me.TimeZoneImpl.timezone
        12: invokestatic  #40                 // Method java/lang/System.getProperty:(Ljava/lang/String;)Ljava/lang/String;
        15: putstatic     #38                 // Field HOME_ID:Ljava/lang/String;
        18: getstatic     #38                 // Field HOME_ID:Ljava/lang/String;
        21: ifnonnull     29
        24: ldc           #41                 // String UTC
        26: putstatic     #38                 // Field HOME_ID:Ljava/lang/String;
        29: getstatic     #38                 // Field HOME_ID:Ljava/lang/String;
        32: astore_1
        33: iconst_0
        34: istore_2
        35: iload_2
        36: getstatic     #42                 // Field zones:[Ljava/util/TimeZone;
        39: arraylength
        40: if_icmpge     75
        43: getstatic     #42                 // Field zones:[Ljava/util/TimeZone;
        46: iload_2
        47: aaload
        48: checkcast     #5                  // class com/sun/cldc/util/j2me/TimeZoneImpl
        51: astore_3
        52: aload_3
        53: getfield      #4                  // Field ID:Ljava/lang/String;
        56: aload_1
        57: invokevirtual #43                 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
        60: ifeq          69
        63: getstatic     #42                 // Field zones:[Ljava/util/TimeZone;
        66: iload_2
        67: aaload
        68: areturn
        69: iinc          2, 1
        72: goto          35
        75: aload_1
        76: invokestatic  #44                 // Method parseCustomTimeZone:(Ljava/lang/String;)Ljava/util/TimeZone;
        79: areturn
StevenArzt commented 9 years ago

What is the full signature of the method? Can you double-check that it's not static or supposed to be static? Are you suing ASM or Coffi as a front-end? What is your exact Soot command-line?

Have you checked whether the method generated by Soot still works in the JVM?

marco-c commented 9 years ago

What is the full signature of the method? Can you double-check that it's not static or supposed to be static?

The method is public synchronized TimeZone getInstance(String ID). It isn't static.

Are you suing ASM or Coffi as a front-end? What is your exact Soot command-line?

java -jar soot-trunk.jar -j2me -process-dir build -no-output-source-file-attribute -no-output-inner-classes-attribute -force-overwrite -include-all I guess I'm using the default front-end, I don't know what it is.

Have you checked whether the method generated by Soot still works in the JVM?

We're building a new JVM in JavaScript (https://github.com/mozilla/j2me.js). We're relying on the JVM specs, but I'm not running the class in the mainstream JVM. In our VM the method doesn't work, because we rely on the assumption that local variable 0 is always used to pass a reference to the object on which the instance method is being invoked.

StevenArzt commented 9 years ago

Now I understand your problem. Still, I think this is a misinterpretation of the specification. Java is stack-based and performs method invocations based on the stack alone. This is why the code generated by Soot works in Oracle's JVM as well as in the OpenJDK JVM and any other JVM I am aware of. On the caller-side, you need to get the stack right, that's all.

When the JVM then performs the call, it will take the values from the stack and put them into the variables for you. From inside the callee, you can then read the parameters from these variables. There is however not obligation for the caller to have any specific variable layout before making the call. The JVM will just give you this layout when it handles the call, nothing more.

In getInstance(), the first bytecode instruction exploits this: It reads the string parameter from variable 1. That's all fine as the JVM guarantees the variable layout to the callee. It is then free to overwrite variable 1 with whatever it likes.

Read on in the specification. We have put the target object and the arguments for the call on the stack and executed the call. Now we read:

The operand stack is empty when the frame that contains it is created.

So we're in a new context, we have a new stack frame, but the operand stack is empty. That's because the arguments all got popped off the stack when we made the call. They can now be accessed through the variables.

This also makes sense in the JVM architecture which is, after all, stack-based, not register-based or variable-based like Dalvik (the Android runtime).

One more example for a method call: http://cs.au.dk/~mis/dOvs/jvmspec/ref--35.html Here, they don't even use variables at the caller-side at all. Still, the code is perfectly fine. (The code is for Jasmin, a Java bytecode assembler).

marco-c commented 9 years ago

Wait, there's been a misunderstanding. We're relying on the assumption that in the callee (not in the caller) the first local variable is a reference to the object whose method is being called, because the JVM enforces this rule for instance methods.

For example, from http://cs.au.dk/~mis/dOvs/jvmspec/ref--35.html: Once a method has been located, invokevirtual calls the method. First, if the method is marked as synchronized, the monitor associated with objectref is entered. Next, a new stack frame structure is established on the call stack. Then the arguments for the method (which were popped off the current method's operand stack) are placed in local variables of the new stack frame structure. arg1 is stored in local variable 1, arg2 is stored in local variable 2 and so on. objectref is stored in local variable 0 (the local variable used for the special Java variable this). Finally, execution continues at the first instruction in the bytecode of the new method.

Now, the code generated by Soot is storing a value of type integer in the first local variable (_33: iconst0), while the first local variable is supposed to be of type reference. I can't find a perfectly clear statement in the specs, but I think that local variables are typed.

StevenArzt commented 9 years ago

Ah, that's what you meant. This means that we agree on how the JVM handles method calls. Now we're only talking about how the variables look like in the callee.

My interpretation of the spec is that when the execution enters the callee, the variables have the correct values taken from the original call arguments. That does not require the callee to maintain that layout. I do not see any restriction that forces the callee to never override the variables. At the end of the callee, the variables may have been overridden and the original arguments may be "lost". For the JVM, that's fine.

The remaining question is whether the variables are typed or not. To find out whether the JVM enforces typing here, it would be great to just try and run the class file generated by Soot on the Oracle JVM. If it fails bytecode verification, we know that we have a problem. Otherwise, let me assume that changing the type of a variable is legit. Here's why:

In Java 7, StackMap Frames were introduced to shift work from the bytecode verifier to the compiler. We're producing "old" class files (version flag set to less than 50.0), but I think it's safe to assume that rules as such haven't changed, only who's responsible for generating the type checking information. The documentation (http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html#jvms-4.7.4) tells us that we can have only one StackMapTable_attribute per code attribute. Inside there, we have a set of stack_map_frames. Every such frame has an offset, so we can have different type information on different regions of the bytecode. If I understand the spec correctly, this means that as long as we're not mixing types (see bad example above), we should be fine with changing types. Otherwise, one stack_map_frame for the full method would be sufficient.

marco-c commented 9 years ago

You've convinced me, I've fixed the problem on our end. Thank you!