Trivial C optimizations: eliminating unnecessary double promotion, division, and modulo operations

grendell commented 4 months ago

Tested in conjunction with https://github.com/jakubcerveny/gilbert/pull/13.

Even with gcc's -O3 flag from the Makefile, I was able to see the following savings on a 2020 M1 MacBook Air. These results omitted the printf per step to only measure the actual algorithm computation and are averaged over three executions each.

./gilbert d2xy 490 490
- before: 49.86 ms
- after: 37.34 ms
./gilbert d2xyz 490 490 490
- before: 21.19 s
- after: 17.52 s

grendell commented 4 months ago

cc @abetusk

abetusk commented 4 months ago

The main optimization is to take out the (costly) floor functions. For example here's a diff of grendell's change:

diff --git a/ports/gilbert.c b/ports/gilbert.c
index af22c4d..10fa13f 100644
--- a/ports/gilbert.c
+++ b/ports/gilbert.c
@@ -216,16 +216,16 @@ int gilbert_d2xy_r(int dst_idx, int cur_idx,
   }

   // floor function
-  ax2 = (int)floor((double)ax/2.0);
-  ay2 = (int)floor((double)ay/2.0);
-  bx2 = (int)floor((double)bx/2.0);
-  by2 = (int)floor((double)by/2.0);
+  ax2 = ax >> 1;
+  ay2 = ay >> 1;
+  bx2 = bx >> 1;
+  by2 = by >> 1;

...

Note that on my system, and presumably grendell's system, ax >> 1 and (int)floor((double)ax/2.0) are equivalent, whereas ax / 2 (in C) is not. For example:

(int)floor((double)-11/2.0) == -6
-11 >> 1 == -6
-11 / 2 == -5

During the port, I incorrectly converted the Python // 2 operation to C's / 2, got paranoid when it failed and used the (costly) floor function, with the appropriate decoration, in its stead.

All this is just some exposition on my part and I had hoped to give some context.

jakubcerveny / gilbert

Trivial C optimizations: eliminating unnecessary double promotion, division, and modulo operations #14