oracle / graalpython

A Python 3 implementation built on GraalVM

Native performance is terrible with Python #350

Closed jasonzhengbj closed 10 months ago

jasonzhengbj commented 1 year ago

I used the Docker image ghcr.io/graalvm/graalvm-community (tag: 20.0.2-ol7-20230725) and installed GraalPy 3.10.8 (GraalVM CE Native 23.0.1). I tested native performance with a simple script, and the standalone application is slower than running the same script with graalpy.

Python code:

import datetime

if __name__ == '__main__':
    print(datetime.datetime.now())
    s = 0
    loop = 500
    for i in range(loop):
        for j in range(loop):
            for k in range(loop):
                s += 1
    print(f'loop: {s}')
    print(datetime.datetime.now())

Running the script with graalpy takes about 1.5s.

sh-4.2# graalpy loop.py
2023-08-07 09:01:32.211000
loop: 125000000
2023-08-07 09:01:33.708000

The native standalone binary takes about 6.5s.

sh-4.2# graalpy -m standalone binary   --module loop.py  --output loop 
sh-4.2# ./loop
2023-08-07 09:39:43.672000
loop: 125000000
2023-08-07 09:39:50.213000

syan-cn commented 1 year ago

After updating to the dev version 23.1.0, I found that the binary was consistently faster than the .py file for your simple code: the binary took about 1s to run, while the .py file took about 2s.

.py file:

2023-08-08 23:11:35.133000
loop: 125000000
2023-08-08 23:11:37.153000

binary file:

2023-08-09 14:12:05.070000
loop: 125000000
2023-08-09 14:12:06.101000

Building the binary was also faster: the total time to produce it was about half of what it was before.
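For anyone re-running this comparison, here is a minimal sketch (not taken from the thread) that times the same triple-nested loop with `time.perf_counter` and repeats it a few times, so that a slow first run can be told apart from later runs. The function names, the `repeats` parameter, and the command-line argument are assumptions made for illustration. Whether warm-up explains the gap reported above is not established here.

```python
import sys
import time


def count_iterations(loop):
    """Same triple-nested counting loop as in the original report."""
    s = 0
    for i in range(loop):
        for j in range(loop):
            for k in range(loop):
                s += 1
    return s


def time_runs(loop, repeats=3):
    """Run the loop `repeats` times; return the count and per-run seconds."""
    timings = []
    count = 0
    for _ in range(repeats):
        start = time.perf_counter()
        count = count_iterations(loop)
        timings.append(time.perf_counter() - start)
    return count, timings


if __name__ == '__main__':
    # Hypothetical CLI: pass 500 to match the original report.
    # The default is kept small so the sketch finishes quickly.
    loop = int(sys.argv[1]) if len(sys.argv) > 1 else 100
    count, timings = time_runs(loop)
    print(f'loop: {count}')
    for n, seconds in enumerate(timings, start=1):
        print(f'run {n}: {seconds:.3f}s')
```

Saved under a name of your choice, the script can be run with graalpy directly and also built into a standalone binary with the same command used earlier in the thread. Running both with the argument 500 reproduces the original workload.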

syan-cn commented 1 year ago

When I tested native performance with pandas, graalpy built the executable successfully, but running it failed with an error:

Traceback (most recent call last):
  File "/Library/Java/JavaVirtualMachines/graalvm-community-openjdk-21+30.1/Contents/Home/languages/python/lib/python3.10/runpy.py", line 196, in _run_module_as_main
  File "/Library/Java/JavaVirtualMachines/graalvm-community-openjdk-21+30.1/Contents/Home/languages/python/lib/python3.10/runpy.py", line 86, in _run_code
  File "/private/var/folders/41/fffb7hhx5pj6ztn83jcb__n00000gn/T/tmp16n888e5/__main__.py", line 5, in <module>
  File "/Users/shanshu/SourceCode/graal/venv/lib/python3.10/site-packages/pandas/__init__.py", line 11, in <module>
  File "/Users/shanshu/SourceCode/graal/venv/lib/python3.10/site-packages/numpy/__init__.py", line 140, in <module>
  File "/Users/shanshu/SourceCode/graal/venv/lib/python3.10/site-packages/numpy/core/__init__.py", line 23, in <module>
  File "/Users/shanshu/SourceCode/graal/venv/lib/python3.10/site-packages/numpy/core/multiarray.py", line 10, in <module>
  File "/Users/shanshu/SourceCode/graal/venv/lib/python3.10/site-packages/numpy/core/overrides.py", line 6, in <module>
SystemError: Cannot load "libsulong-native.dylib". Internal library path not set

The test file:

import time

s = time.time()

import pandas as pd

for i in range(100):
    pd.DataFrame()

e = time.time()
print("total time:", e-s)

I built it with the following command, passing the virtual environment via the --venv option:

python -m standalone native --module <py_file> --output <output_file> --venv ./venv

timfel commented 11 months ago

@syan-cn With the upcoming release, using native extensions in the binaries works. I tried pandas, numpy, and pytorch.