Closed wtwhite closed 1 month ago
A related wrinkle is that recordActivity()
, which is called not only to record an activity but also to association an entity with it, currently also assumes that the relevant target is stored in the tracked method's return value. I think sorting this out properly probably requires a redesign, or more time for me to understand what the intended semantics actually are, and how they differ from the current implementation, as the Right Thing to do isn't clear to me:
target
and targetIndex
fields to ActivityMapping
s, just like for EntityMapping
s? But this appears to (a) duplicate most of what's in EntityMapping
(so why have separate ActivityMapping
s at all?) and also (b) it's tricky to inject bytecode to handle EntityRef.ARG
at the callsite (since TTBOMK there's no random access into the stack, and calling a special utility method beforehand to make arguments them accessible as local variables erases them from the stack).EntityMapping
entries, which is possible now that EntityCreation
s also keep track of target
and targetIndex
? This seems nice, and avoids duplication. But it again calls into question why ActivityMapping
s exist at all, since everything can now be done with just EntityMapping
s.However, provided we only want to deal with activities that either return their associated entity (as is already handled) or create it using a constructor (i.e., if we don't care about EntityRef.ARG
), this is moot: the JVM happens to handles both cases in such a way that the stack looks the same after the method or constructor returns, i.e., the desired target object is on the top of the stack either way. IOW, we get lucky here and don't need to fix this properly just yet 🤪
First attempt to collect this
(f42a1e006fd6d579debbcd4e336b9e8b440ddeba) fails JVM verification because we try to do it too early, before it's been initialised (which I thought would be fine since the address wouldn't change):
--snip--
java.lang.VerifyError: Bad type on operand stack
Exception Details:
Location:
org/apache/hc/client5/http/classic/methods/HttpGet.<init>(Ljava/net/URI;)V @11: invokestatic
Reason:
Type uninitializedThis (current frame, stack[0]) is not assignable to 'java/lang/Object'
Current Frame:
bci: @11
flags: { flagThisUninit }
locals: { uninitializedThis, 'java/net/URI' }
stack: { uninitializedThis, 'java/lang/String' }
Bytecode:
0000000: 1226 1228 2bb8 002e 2a12 28b8 0034 2a12
0000010: 022b b700 03b1
at nz.ac.canterbury.dataprovenancedemo.controllers.LibraryController.getHtmlUsingApacheHttpClient(LibraryController.java:190) ~[classes!/:na]
at nz.ac.canterbury.dataprovenancedemo.controllers.LibraryController.rateMovie(LibraryController.java:167) ~[classes!/:na]
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:na]
--snip--
This might be tricky -- I expect that the JVM won't update the type of the object from uninitializedThis
to the final object type until the constructor returns, which means that attempting to capture it just before a RETURN
will still be "too early". That's one way to interpret the relevant part of the JVM spec, and, e.g., this warns against doing so.
But if that were the case, then the following class should fail verification:
public class SomeClass {
public SomeClass() {
System.out.println("SomeClass ctor");
aMethod(this);
System.out.println("Returned from aMethod(this).");
}
public static void aMethod(SomeClass s) {
System.out.println("aMethod(" + s + ") called.");
}
}
But it compiles fine (relevant javap -v -p
output):
public org.example.SomeClass();
descriptor: ()V
flags: (0x0001) ACC_PUBLIC
Code:
stack=2, locals=1, args_size=1
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
7: ldc #3 // String SomeClass ctor
9: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
12: aload_0
13: invokestatic #5 // Method aMethod:(Lorg/example/SomeClass;)V
16: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
19: ldc #6 // String Returned from aMethod(this).
21: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
24: return
and runs fine, so maybe everything is OK verification-wise after the nested call to the java.lang.Object
constructor completes. In that case, it should be possible to safely grab this
with an aload_0
just before each return
🤞
Finally with 73953e7a124bc6cb3d2792dcefa2187a86b9a2c2 we are seeing the URL passed to HttpGet
's constructor get pulled down in the provenance data as associated entity data, although for some reason it appears twice with different IDs:
wtwhite@wtwhite-vuw-vm:~$ curl -i http://localhost:8080/prov/26
--snip--
{
"activities": [
{
"id": "7f6af131-d727-4573-aa02-1f3e77250bc1",
"type": "MakeHttpRequestApacheNew",
"endTime": "2024-05-29T09:40:48.178480Z"
}
],
"associatedEntities": [
{
"id": 589882250,
"type": "HttpRequestApacheGet",
"value": "https://app.veracity.homes/omar-notifications-main-menu.html?name=jens&movieId=35&stars=4",
"target": {
"headers": [],
"method": "GET",
"path": "/omar-notifications-main-menu.html?name=jens&movieId=35&stars=4",
"scheme": "https",
"authority": {
"userInfo": null,
"port": -1,
"hostName": "app.veracity.homes"
},
"version": null,
"requestUri": "/omar-notifications-main-menu.html?name=jens&movieId=35&stars=4",
"entity": null,
"cancelled": false,
"config": null,
"aborted": false,
"uri": "https://app.veracity.homes/omar-notifications-main-menu.html?name=jens&movieId=35&stars=4"
}
},
{
"id": 1082006511,
"type": "HttpRequestApacheGet",
"value": "https://app.veracity.homes/omar-notifications-main-menu.html?name=jens&movieId=35&stars=4",
"target": {
"headers": [],
"method": "GET",
"path": "/omar-notifications-main-menu.html?name=jens&movieId=35&stars=4",
"scheme": "https",
"authority": {
"userInfo": null,
"port": -1,
"hostName": "app.veracity.homes"
},
"version": null,
"requestUri": "/omar-notifications-main-menu.html?name=jens&movieId=35&stars=4",
"entity": null,
"cancelled": false,
"config": null,
"aborted": false,
"uri": "https://app.veracity.homes/omar-notifications-main-menu.html?name=jens&movieId=35&stars=4"
}
}
]
},
--snip--
The problem is caused by the fact that we call an HttpGet
constructor that takes a String
, which itself builds a URI
from that and calls another constructor that takes a URI
, and captureTarget()
gets called on both. The return values of the two nested constructor calls are the same object, so even though recordActivity()
only gets called after the top-level constructor returns (because we don't inject them for calls between methods in the same class), it finds two copies in the AssociationCache
's cache somehow.
Changing the top-level constructor call to the version that takes a URI
directly eliminates the problem:
--snip--
{
"activities": [
{
"id": "a2f3e78c-ad67-4122-8ddb-8365235acf27",
"type": "MakeHttpRequestApacheNew",
"endTime": "2024-05-29T10:01:00.623219Z"
}
],
"associatedEntities": [
{
"id": 133914510,
"type": "HttpRequestApacheGet",
"value": "https://app.veracity.homes/omar-notifications-main-menu.html?name=jens&movieId=47&stars=4",
"target": {
"headers": [],
"method": "GET",
"path": "/omar-notifications-main-menu.html?name=jens&movieId=47&stars=4",
"scheme": "https",
"authority": {
"userInfo": null,
"port": -1,
"hostName": "app.veracity.homes"
},
"version": null,
"requestUri": "/omar-notifications-main-menu.html?name=jens&movieId=47&stars=4",
"entity": null,
"cancelled": false,
"aborted": false,
"config": null,
"uri": "https://app.veracity.homes/omar-notifications-main-menu.html?name=jens&movieId=47&stars=4"
}
}
]
},
--snip--
I'm not yet sure what the best way to solve this in general would be, but for now we can just use this hack. (A different workaround would be to add a "descriptor": "(Ljava/lang/String;)V"
field to the entity-mapping
spec for HttpGet.<init>
in bind-jdbc.json
to only capture a target for the top-level constructor call, but then if we were to call the URI
constructor, the URI argument for this call would not be tracked.)
The reason 2 copies of the HttpGet
object get stored in the AssociationCache
's targetsToEntitiesCache
when the constructor that takes a String
is called is because each entity has an id
, which is always set to the hash value of the parameter used to create that entity -- and that hash value will be different for the 2 HttpGet
constructor calls, since one is a String
and the other is a URI
, even though they both return the same object!
Currently we assume that whenever a tracked entity creation occurs, the entity target must be the method's return value (
EntityRef.RETURN
), which is picked up by injecting a call tocaptureTarget()
before all non-void
RETURN
JVM instructions (ARETURN
, etc.). But this is not necessarily the case -- e.g., constructors (which have method name<init>
) are actually passed in the freshly-allocatedthis
as the 0-th parameter like a regular non-static method and have avoid
return, despite thex = new Foo()
syntax suggesting otherwise.This only considers entity creation for now, since entity propagation is not yet implemented: #27. Will involve changes to
approv
because of https://github.com/veracitylab/approv/issues/2.Needed for #26.