Performance HPy/piconumpy/GraalPython microbenchmarks

paugier commented 2 years ago

I plan to write a blog post about some good news regarding performance for numerical Python, since I think interesting things are happening in this field.

One part should be dedicated to HPy and GraalPython, based in particular on the microbenchmarks of low-level Python code using piconumpy.

For this note, I think I could focus on the performance obtained for this function:

from math import sqrt

def cort(s1, s2):
    num = 0.0
    sum_square_x = 0.0
    sum_square_y = 0.0
    for t in range(len(s1) - 1):
        slope_1 = s1[t + 1] - s1[t]
        slope_2 = s2[t + 1] - s2[t]
        num += slope_1 * slope_2
        sum_square_x += slope_1 * slope_1
        sum_square_y += slope_2 * slope_2
    return num / (sqrt(sum_square_x * sum_square_y))
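For reference, here is a minimal sketch of how the list case could be timed. The helper names (`make_series`, `bench`), the series length, and the random input data are assumptions made for illustration; they are not the actual benchmark setup behind the numbers discussed here.

```python
import random
import timeit
from math import sqrt


def cort(s1, s2):
    """Temporal correlation between two series, as in the function above."""
    num = 0.0
    sum_square_x = 0.0
    sum_square_y = 0.0
    for t in range(len(s1) - 1):
        slope_1 = s1[t + 1] - s1[t]
        slope_2 = s2[t + 1] - s2[t]
        num += slope_1 * slope_2
        sum_square_x += slope_1 * slope_1
        sum_square_y += slope_2 * slope_2
    return num / (sqrt(sum_square_x * sum_square_y))


def make_series(n, seed):
    # Hypothetical input: n pseudo-random floats in a plain Python list.
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


def bench(n=1000, repeat=5, number=100):
    # Return the best observed time per call, in seconds.
    s1 = make_series(n, seed=1)
    s2 = make_series(n, seed=2)
    times = timeit.repeat(lambda: cort(s1, s2), repeat=repeat, number=number)
    return min(times) / number


if __name__ == "__main__":
    print(f"cort on lists: {bench() * 1e6:.1f} us per call")
```

Since `cort` only indexes its arguments and calls `len`, the same harness should work with any indexable sequence of floats. With a JIT such as GraalPython's, the first iterations include warm-up, so taking the minimum over several repeats is a deliberate choice.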

I compared this to the performance obtained for the same function with Julia. Interestingly, GraalPython 3.8 is as fast as Julia for lists and "only" 7 times slower with piconumpy arrays. This means that GraalPython is able to accelerate low-level code written with extensions implemented with HPy, which is great news. As I understand it, the GraalPython JIT considers together the LLVM bitcode corresponding to the Python and the C code, which is clearly the right thing to do in this case.

Being 7 times slower is much better than any alternative for low-level Python code written with extensions, but it is still a bit too slow for some applications.

I would like to better understand the performance difference between Julia/GraalPython with lists and GraalPython with piconumpy/HPy. For example, in Julia it is possible to dump the LLVM IR. Is it possible to get something similar with GraalPython?

msimacek commented 2 years ago

Hi @paugier. We usually use IGV (which can be used for free for testing, evaluation and development of non-production applications) to analyze how things get compiled. You can dump the graph using -Dgraal.dump=Truffle:1 or --vm.Dgraal.dump=Truffle:1, depending on how you launched graalpython. I usually look at the "Before phase lowering" graph. You can look at the node source positions to see which line of Python and Java code generated a given IR node. There is also an option to dump the generated assembly.
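As a rough sketch, the launcher form of the option could be wired into a benchmark script like this. The launcher name `graalpython` on the PATH, the script name `bench_cort.py`, and the helper `graalpython_dump_command` are all assumptions for illustration; only the `--vm.Dgraal.dump=Truffle:1` option itself comes from the comment above.

```python
import shlex


def graalpython_dump_command(script, launcher="graalpython"):
    # Build the argument list for a run that dumps Truffle graphs.
    # Assumes the standard launcher is used; when running on a plain JVM
    # the option would be given as -Dgraal.dump=Truffle:1 instead.
    return [launcher, "--vm.Dgraal.dump=Truffle:1", script]


cmd = graalpython_dump_command("bench_cort.py")  # hypothetical script name
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

The dumped graphs can then be opened in IGV as described above.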

msimacek commented 2 years ago

I believe I answered your question, so let's close the issue.

oracle / graalpython

Performance HPy/piconumpy/GraalPython microbenchmarks #255