stephengold / Libbulletjme

A JNI interface to Bullet Physics and V-HACD
https://stephengold.github.io/Libbulletjme/lbj-en/English/overview.html

Cache the methods to free the native objects, per class #13

Closed dustContributor closed 2 years ago

dustContributor commented 2 years ago

Hi! While I was inspecting how the native interop worked in this library, I noticed that it worked in a peculiar way when freeing native objects.

There is the weak reference queue that gets emptied by a background thread, but I see it relies on reflection to call the respective freeing methods of the hierarchy of each object that gets freed.
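For readers unfamiliar with the pattern being described, it looks roughly like the following sketch (class and method names here are mine, not Libbulletjme's actual code): a weak reference remembers the native pointer, and a background thread drains the reference queue and frees the corresponding native object.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Generic sketch of the weak-reference-queue cleanup pattern.
// Names are illustrative, not the library's real classes.
public final class NativeCleaner {
    static final ReferenceQueue<Object> QUEUE = new ReferenceQueue<Object>();

    // A weak reference that also remembers the native object's address,
    // so it can be freed after the Java wrapper is collected.
    static final class NativeRef extends WeakReference<Object> {
        final long nativeId;

        NativeRef(Object referent, long nativeId) {
            super(referent, QUEUE);
            this.nativeId = nativeId;
        }
    }

    static void freeNative(long nativeId) {
        // placeholder: the real code would make a JNI call here
    }

    // Background-thread loop: block until the collector enqueues a
    // reference, then free the native object it tracked.
    static void drainLoop() throws InterruptedException {
        while (true) {
            NativeRef ref = (NativeRef) QUEUE.remove();
            freeNative(ref.nativeId);
        }
    }
}
```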

I think it's quite easy to cache this process instead of reflecting and navigating the hierarchy on every single free call. I avoided using computeIfAbsent since it seems you deploy on Java 1.7. I know it's a bit defensive to use a ConcurrentHashMap for the cache, since it's always called from the same thread, but it seemed appropriate.

I haven't tested this extensively. I just ran the build/tests on my Linux x64 machine and they passed.

As a note, I thiiiink this whole process could be avoided by moving things around a bit, having each object implement a free method instead of relying on reflecting over a hierarchy of static methods. It's just that this change seemed much easier to implement in the meantime.
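A minimal sketch of that alternative, with hypothetical names (the interface and field below are my invention, not the library's API): each wrapper frees its own native resources via an instance method, so dynamic dispatch replaces the reflective walk over static methods.

```java
// Hypothetical names, not Libbulletjme's actual API.
interface NativeFreeable {
    // Each wrapper frees its own native resources; a virtual call
    // replaces the reflective hierarchy walk.
    void freeNativeObject();
}

final class ExampleShape implements NativeFreeable {
    long nativeId; // illustrative field holding the native address

    ExampleShape(long nativeId) {
        this.nativeId = nativeId;
    }

    @Override
    public void freeNativeObject() {
        // a JNI free would go here in the real code
        nativeId = 0L;
    }
}
```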

I'm a bit new to git, PRs and so on so sorry for any inconvenience.

stephengold commented 2 years ago

Thanks for your contribution. I believe this should work, but of course I'd want to be sure before integrating the change. And because this code runs asynchronously and interacts with Java garbage collection, it is difficult to test.

I also wonder whether the performance benefit justifies the added complexity of the cache. Did you attempt to measure the benefit?

Prior to v2.0, Minie used finalize() to free native objects. The quirks in the current design probably resulted from trying to minimize changes during that transition. I'm struggling to remember the details.

stephengold commented 2 years ago

Looking more closely at the diffs, I don't see where FreeingMethods.generate() gets invoked. Help me understand please!

dustContributor commented 2 years ago

> Looking more closely at the diffs, I don't see where FreeingMethods.generate() gets invoked. Help me understand please!

That would be line 154, in the put call, here:

        public static final Method[] of(Class<?> clazz) {
            Method[] methods = methodsByClass.get(clazz);
            if (methods == null) {
                methodsByClass.put(clazz, methods = generate(clazz));
            }
            return methods;
        }
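For comparison, on Java 8+ the same get-then-put cache collapses into computeIfAbsent, which additionally guarantees the generator runs at most once per class. This is only a sketch; generate() below is a stand-in for the real hierarchy walk:

```java
import java.lang.reflect.Method;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Java 8+ variant of the cache shown above (sketch only).
public final class Java8Cache {
    private static final ConcurrentMap<Class<?>, Method[]> CACHE =
            new ConcurrentHashMap<>();

    public static Method[] of(Class<?> clazz) {
        // computeIfAbsent runs the generator at most once per key,
        // even under contention
        return CACHE.computeIfAbsent(clazz, Java8Cache::generate);
    }

    private static Method[] generate(Class<?> clazz) {
        return new Method[0]; // stand-in for the real hierarchy walk
    }
}
```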

> I also wonder whether the performance benefit justifies the added complexity of the cache. Did you attempt to measure the benefit?

Not really. It's a bit difficult to test since it's done asynchronously from the main execution of the program, so what you should see is that particular thread working less.

I'll see if I can make a JMH benchmark comparing the two, but I'd be veeeeeeeery surprised if the hash lookup isn't faster than the reflection loop (I mean, you could be hitting exceptions every single time in some cases, like the NoSuchMethodException; that can't possibly be good for perf).
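To illustrate the exception cost in isolation (a rough stand-in, not the JMH benchmark from the gist; all names below are made up): a subclass that doesn't declare the freeing method forces the un-cached lookup to throw and catch NoSuchMethodException on every single call before finding the method in the superclass.

```java
import java.lang.reflect.Method;

// Toy two-level hierarchy: the subclass does not declare the freeing
// method, so an un-cached lookup pays for a NoSuchMethodException on
// every call before finding it one level up.
class BaseFree {
    public static void freed(long id) {
        // stand-in for a real freeing method
    }
}

public final class LookupCostDemo extends BaseFree {
    // What the un-cached path does on every free: re-resolve the Method,
    // eating an exception at each level that lacks the method.
    static Method lookupEachTime() {
        for (Class<?> c = LookupCostDemo.class; c != null; c = c.getSuperclass()) {
            try {
                return c.getDeclaredMethod("freed", long.class);
            } catch (NoSuchMethodException e) {
                // miss at this level; keep walking up the hierarchy
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        int calls = 100000;

        long t0 = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            lookupEachTime().invoke(null, (long) i); // re-lookup every call
        }
        long uncachedNs = System.nanoTime() - t0;

        Method cached = lookupEachTime(); // resolved once, reused below
        t0 = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            cached.invoke(null, (long) i);
        }
        long cachedNs = System.nanoTime() - t0;

        System.out.println("uncached: " + uncachedNs + " ns, cached: " + cachedNs + " ns");
    }
}
```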

I guess I could make the code a bit simpler by removing the FreeingMethods class and inlining the FreeingMethods.of call but I thought it'd be a nice idea to isolate the cache map in its own private class, just in case.

dustContributor commented 2 years ago

Here @stephengold, I created this benchmark to evaluate the difference between caching the static method calls vs reflecting on each of them every time.

https://gist.github.com/dustContributor/73bc2823158a833a08dbcc9294a16de3

The code that calls them is basically the same except for a few variations in which I wanted to see the boxing impact since the reflective call is forced to box the arguments in one way or the other.

I made a fictional hierarchy (classes at the end A, B, C, etc):

    Benchmark           (listSize)  Mode  Cnt    Score    Error  Units
    Main.cached               1000  avgt    4    0.032 ±  0.004  ms/op
    Main.cached              10000  avgt    4    0.356 ±  0.009  ms/op
    Main.cached             100000  avgt    4    3.020 ±  0.083  ms/op
    Main.cached            1000000  avgt    4   33.456 ±  8.008  ms/op
    Main.cachedArgsId         1000  avgt    4    0.031 ±  0.001  ms/op
    Main.cachedArgsId        10000  avgt    4    0.345 ±  0.017  ms/op
    Main.cachedArgsId       100000  avgt    4    3.108 ±  0.244  ms/op
    Main.cachedArgsId      1000000  avgt    4   31.732 ±  1.724  ms/op
    Main.cachedBoxedId        1000  avgt    4    0.032 ±  0.001  ms/op
    Main.cachedBoxedId       10000  avgt    4    0.338 ±  0.049  ms/op
    Main.cachedBoxedId     100000  avgt    4    3.424 ±  0.134  ms/op
    Main.cachedBoxedId    1000000  avgt    4   35.194 ±  8.507  ms/op
    Main.reflection           1000  avgt    4    0.292 ±  0.007  ms/op
    Main.reflection          10000  avgt    4    3.220 ±  0.393  ms/op
    Main.reflection         100000  avgt    4   32.177 ±  2.484  ms/op
    Main.reflection        1000000  avgt    4  334.416 ± 24.856  ms/op

As you can see, the reflection path can be roughly 10x slower, with slight differences depending on how far you go pre-boxing the arguments for the reflective call (three cases tested: boxing on each call, boxing the ID beforehand, and boxing the args array directly).
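For concreteness, the three boxing strategies can be sketched like this (a hypothetical method, not the benchmark's exact code). Method.invoke always takes an Object[], so the long ID must be boxed somewhere; the variants only move where that boxing happens.

```java
import java.lang.reflect.Method;

// Illustration of the three boxing strategies for a reflective call
// to a static method taking a primitive long (names are illustrative).
public final class BoxingVariants {
    public static void freed(long id) {
        // stand-in for a real freeing method
    }

    public static void main(String[] args) throws Exception {
        Method m = BoxingVariants.class.getDeclaredMethod("freed", long.class);
        long id = 42L;

        // 1) box on each call: a fresh Long and Object[] per invoke
        m.invoke(null, id);

        // 2) box the ID beforehand and reuse the Long
        Long boxedId = Long.valueOf(id);
        m.invoke(null, boxedId);

        // 3) box the whole args array once and reuse it
        Object[] boxedArgs = {Long.valueOf(id)};
        m.invoke(null, boxedArgs);
    }
}
```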

stephengold commented 2 years ago

Nice work. I'm motivated now. I'll try to include this in the next release.

dustContributor commented 2 years ago

Sweet, thanks!