Open kaushikcfd opened 4 years ago
Thanks for timing this! A few thoughts:
I'll further assume that it's time genuinely spent in isl and not in the wrapper. (Do you have any indication otherwise?)
From the discussing at inducer/islpy#28, it was established that ISL's performance depends on whether it was linked against GMP
or imath
(shipped with islpy). Running the same kernel with islpy linked with GMP gets a 3x speed-up! :)
Here's the corresponding log:
2897406 function calls (2299915 primitive calls) in 220.275 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
729855/286691 164.491 0.000 165.192 0.001 islpy/islpy/__init__.py:861(wrapper)
265 25.103 0.095 25.174 0.095 islpy/islpy/__init__.py:1014(obj_eliminate_except)
17736/16152 10.725 0.001 11.068 0.001 loopy/loopy/isl_helpers.py:527(_find_aff_dims)
68 3.318 0.049 8.032 0.118 loopy/loopy/symbolic.py:1884(get_access_range)
74 2.561 0.035 82.331 1.113 loopy/loopy/isl_helpers.py:614(dim_max_with_elimination)
74 2.357 0.032 82.178 1.111 loopy/loopy/isl_helpers.py:609(dim_min_with_elimination)
4632 1.929 0.000 7.909 0.002 islpy/islpy/__init__.py:1100(_align_dim_type)
1070 1.403 0.001 2.342 0.002 loopy/loopy/symbolic.py:1590(__init__)
973/897 0.937 0.001 1.296 0.001 loopy/loopy/symbolic.py:1525(aff_to_expr)
33697 0.732 0.000 0.732 0.000 {built-in method islpy._isl.from_aff}
46890 0.593 0.000 1.867 0.000 islpy/islpy/__init__.py:1096(_set_dim_id)
27169 0.508 0.000 0.871 0.000 islpy/islpy/__init__.py:850(wrapper)
3347 0.500 0.000 0.834 0.000 islpy/islpy/__init__.py:690(_number_to_expr_like)
22345 0.419 0.000 0.419 0.000 islpy/islpy/__init__.py:548(set_get_basic_sets)
10367 0.380 0.000 2.568 0.000 islpy/islpy/__init__.py:315(space_get_var_dict)
148 0.353 0.002 11.606 0.078 loopy/loopy/isl_helpers.py:550(_find_noninteracting_dims)
21703 0.311 0.000 0.311 0.000 islpy/islpy/__init__.py:805(val_to_python)
There's clearly more stuff to be fixed. (exploring if we are unnecessarily invoking ISL routines anywhere.)
cc @wence- (who may be interested in ways to speed up loopy, but see the license concerns at https://github.com/inducer/islpy/issues/28)
Post #149, #150 and taking suggestions from inducer/islpy#28, the codegen profile looks better:
3501511 function calls (2873484 primitive calls) in 71.862 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
30039 26.195 0.001 26.610 0.001 islpy/islpy/__init__.py:850(wrapper)
735171/248679 19.323 0.000 20.050 0.000 islpy/islpy/__init__.py:861(wrapper)
48 7.116 0.148 7.129 0.149 islpy/islpy/__init__.py:1014(obj_eliminate_except)
94 4.831 0.051 11.872 0.126 loopy/loopy/symbolic.py:1652(get_access_range)
21/2 2.394 0.114 54.000 27.000 loopy/loopy/codegen/loop.py:347(generate_sequential_loop_dim_code)
5073 2.009 0.000 8.444 0.002 islpy/islpy/__init__.py:1100(_align_dim_type)
1200 1.560 0.001 2.617 0.002 loopy/loopy/symbolic.py:1354(__init__)
1005/925 1.071 0.001 1.460 0.002 loopy/loopy/symbolic.py:1289(aff_to_expr)
We are still doing around 0.3M ISL calls which is not ideal, but I'm fine with closing this issue.
Let's leave it open for a bit. I think there's more we can do.
Profile guided optimization while compiling isl might modestly improve performance.
View the loopy kernel on Gist.
Profile log, as reported by
cProfile
, links 90% of the codegen time to islpy routines: