uber / h3

Hexagonal hierarchical geospatial indexing system
https://h3geo.org
Apache License 2.0
4.83k stars 459 forks source link

aarch64 performance fix: replace long double with double #790

Closed heshpdx closed 10 months ago

heshpdx commented 11 months ago

On aarch64 systems, we are seeing a significant overhead in libgcc - __multf3.

26.28%  h3geo  libgcc_s.so.1      [.] __multf3

This overhead is due to the usage of lroundl in coordijk.c. To prevent this overhead, we are proposing to replace long double type with double type and use lround instead of lroundl. This should have no impact on other systems than aarch64. We have verified the behavior matches on some test cases. For coherency sake, we also changed long double constants to double type.

This was found by the SPEC CPU committee, where h3 is being considered as a candidate benchmark for SPEC CPU v8 and undergoing testing across a wide variety of systems and compilers.

Feel free to squash these commits.

coveralls commented 11 months ago

Coverage Status

coverage: 98.657%. remained the same when pulling ef0a87ac3c8ff38f301bc8aa0cafdb2b35d154f8 on heshpdx:master into 017f81036628fd6bde0a0977078b4bc48bdad58e on uber:master.

grim7reaper commented 11 months ago

FWIW, h3o only uses double (since long double no longer exists in Rust), so it's likely possible to get rid of long double on every platform without impacting the correctness

isaacbrodsky commented 11 months ago

Thanks for the PR!

I remember removing some uses of long double before (I was checking if that was in the Git history and didn't see it, maybe it was before we open sourced the library?). I am pretty comfortable with removing remaining long double usage.

sahrk commented 11 months ago

Thanks for the PR!

I remember removing some uses of long double before (I was checking if that was in the Git history and didn't see it, maybe it was before we open sourced the library?). I am pretty comfortable with removing remaining long double usage.

Everything was long double when we started, until it became clear that was unnecessary. When we switched to double some of them must have been missed.

heshpdx commented 10 months ago

I measured a 40% performance uplift on an Ampere Altra QS80-30 on the benchmark cmdline below. Building with gcc-12 -O3, runtime dropped from 481s to 340s.

./h3geo 110,10,25,82 vertex,is_valid_cell,grid_disk_cells,grid_path_cells