performance degradation for data transfer between JVM and native in 0.3.0

i10416 commented 1 year ago

Comparing the two benchmark runs below (scores in ns/op, lower is better):

SimpleNativeCallBenchmarks.slincQSortWithCopyBack went from 597864.321 to 1210048.181 (about x2)
SimpleNativeCallBenchmarks.slincQSortWithoutCopyBack went from 611254.382 to 963393.420 (about x1.6)
SimpleNativeCallBenchmarks.slincQsortAllocCallbackForEachIteration is almost the same (from 1672473.620 to 1623060.091)

It seems there is a performance degradation for data transfer between 0.1.1 and 0.3.0.
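The ratios quoted above can be reproduced from the raw JMH scores. The following is only a small sketch: the `Score` class and its field names are made up for illustration, and the numbers are copied from the two runs in this issue.

```scala
// Scores (ns/op) copied from the two JMH runs in this issue.
// `Score`, `v011` and `v030` are illustrative names, not part of any benchmark API.
final case class Score(name: String, v011: Double, v030: Double):
  // A ratio above 1 means 0.3.0 is slower than 0.1.1.
  def slowdown: Double = v030 / v011

val scores = List(
  Score("slincQSortWithCopyBack", 597864.321, 1210048.181),
  Score("slincQSortWithoutCopyBack", 611254.382, 963393.420),
  Score("slincQsortAllocCallbackForEachIteration", 1672473.620, 1623060.091)
)

@main def printSlowdowns(): Unit =
  // Prints one line per benchmark with the 0.3.0 / 0.1.1 ratio.
  scores.foreach(s => println(f"${s.name}%-42s x${s.slowdown}%.2f"))
```

The first two ratios come out at roughly 2.02 and 1.58, while the third is roughly 0.97. Given the large error bar on the 0.1.1 score for that benchmark (± 463650.513), the third result is best read as "no measurable change".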

slinc 0.1.1-110-7863cb, Scala 3.3.0-RC3

[info] Benchmark                                                           Mode  Cnt        Score        Error  Units
[info] NativeBenchmarks.ctimeJNI                                           avgt    5     5037.753 ±    150.762  ns/op
[info] NativeBenchmarks.ctimeSlinc                                         avgt    5     5805.548 ±    252.506  ns/op
[info] SimpleNativeCallBenchmarks.jniNativeQSort                           avgt    5     4093.148 ±    103.787  ns/op
[info] SimpleNativeCallBenchmarks.jniQSort                                 avgt    5   285657.481 ±   7850.867  ns/op
[info] SimpleNativeCallBenchmarks.slincQSortJVM                            avgt    5      190.156 ±      2.883  ns/op
[info] SimpleNativeCallBenchmarks.slincQSortWithCopyBack                   avgt    5   597864.321 ±  13374.750  ns/op
[info] SimpleNativeCallBenchmarks.slincQSortWithoutCopyBack                avgt    5   611254.382 ±  16272.430  ns/op
[info] SimpleNativeCallBenchmarks.slincQsortAllocCallbackForEachIteration  avgt    5  1672473.620 ± 463650.513  ns/op

slinc 0.3.0, Scala 3.3.0-RC3

[info] Benchmark                                                           Mode  Cnt        Score       Error  Units
[info] NativeBenchmarks.ctimeJNI                                           avgt    5     5080.173 ±   279.500  ns/op
[info] NativeBenchmarks.ctimeSlinc                                         avgt    5     6665.928 ±   242.225  ns/op
[info] SimpleNativeCallBenchmarks.jniNativeQSort                           avgt    5     4096.001 ±   161.264  ns/op
[info] SimpleNativeCallBenchmarks.jniQSort                                 avgt    5   284095.322 ±  4402.432  ns/op
[info] SimpleNativeCallBenchmarks.slincQSortJVM                            avgt    5      191.390 ±     3.029  ns/op
[info] SimpleNativeCallBenchmarks.slincQSortWithCopyBack                   avgt    5  1210048.181 ± 44536.967  ns/op
[info] SimpleNativeCallBenchmarks.slincQSortWithoutCopyBack                avgt    5   963393.420 ± 40853.952  ns/op
[info] SimpleNativeCallBenchmarks.slincQsortAllocCallbackForEachIteration  avgt    5  1623060.091 ± 61705.828  ns/op

Env:

Java 19.

https://github.com/i10416/bench/blob/main/bench-0.3/src/test/scala/SlincBenchmark.scala

i10416 commented 1 year ago

I confirmed that transferring an array back and forth takes around 10 times longer in slinc 0.3.0 than in slinc 0.1.1.

https://github.com/i10416/bench/pull/5

markehammons commented 1 year ago

Yes, this is expected. The current versions of Slinc have focused on simplifying and refactoring the code. Once I've implemented all the major features of the Java foreign API, optimization will be more of a focus.

The current plan for optimization will be more rigorous than what Slinc 0.1.1 did. In 0.1.1, optimal code was only generated for calls that were entirely knowable at compile time. That is unrealistic for C, since its types depend heavily on the platform the program is running on.

Going forward, Slinc will generate the optimal code at runtime using runtime multi-stage programming. This allows the generated code to be specialized on information that is only available at runtime.
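For readers unfamiliar with runtime multi-stage programming, here is a minimal sketch of the general idea using Scala 3's `scala.quoted.staging` (it needs the `scala3-staging` library on the classpath). It is not Slinc's implementation: `StagingSketch`, `cLongBytes`, `narrowToCLong` and `specialize` are hypothetical names, and the platform check assumes a 64-bit OS where C `long` is 4 bytes on Windows and 8 bytes elsewhere.

```scala
import scala.quoted.*
import scala.quoted.staging.{Compiler, run}

object StagingSketch:
  // Compiler instance used to compile quoted code at runtime.
  given Compiler = Compiler.make(getClass.getClassLoader)

  // Hypothetical platform probe (assumes a 64-bit OS):
  // C `long` is 4 bytes on Windows and 8 bytes on Linux/macOS.
  def cLongBytes: Int =
    if System.getProperty("os.name").toLowerCase.contains("win") then 4 else 8

  // Builds the conversion code for the given width. The branch on `bytes`
  // runs once, while generating code, not on every call of the result.
  def narrowToCLong(bytes: Int)(using Quotes): Expr[Long => Long] =
    if bytes == 8 then '{ (x: Long) => x }
    else '{ (x: Long) => x.toInt.toLong } // keep the low 32 bits, sign-extended

  // Compiles the generated code at runtime and returns a plain function.
  def specialize(bytes: Int): Long => Long = run(narrowToCLong(bytes))

  @main def stagingDemo(): Unit =
    val toCLong = specialize(cLongBytes)
    println(toCLong((1L << 40) + 5L))
```

The point of the sketch is that the platform-dependent decision is made when the code is generated, so the resulting function contains only the branch that applies to the running platform. Compiling at runtime has its own cost, so in practice the result of `specialize` would be created once and reused.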

i10416 commented 1 year ago

Thanks! I'm looking forward to future optimization!

By the way, I think Slinc is quite promising because it will help interoperation between Scala on the JVM and Scala Native. In fact, it couldn't be easier to write Scala Native bindings in Scala with Slinc!

PoC: https://github.com/i10416/slinc-examples/tree/main/scalanativeinterop

scala-interop/slinc #156