Closed grendell closed 4 months ago
cc @abetusk
The main optimization is to take out the (costly) floor functions. For example here's a diff of grendell'
s change:
diff --git a/ports/gilbert.c b/ports/gilbert.c
index af22c4d..10fa13f 100644
--- a/ports/gilbert.c
+++ b/ports/gilbert.c
@@ -216,16 +216,16 @@ int gilbert_d2xy_r(int dst_idx, int cur_idx,
}
// floor function
- ax2 = (int)floor((double)ax/2.0);
- ay2 = (int)floor((double)ay/2.0);
- bx2 = (int)floor((double)bx/2.0);
- by2 = (int)floor((double)by/2.0);
+ ax2 = ax >> 1;
+ ay2 = ay >> 1;
+ bx2 = bx >> 1;
+ by2 = by >> 1;
...
Note that on my system, and presumably grendell
's system, ax >> 1
and (int)floor((double)ax/2.0)
are equivalent, whereas ax / 2
(in C) is not. For example:
(int)floor((double)-11/2.0) == -6
-11 >> 1 == -6
-11 / 2 == -5
During the port, I incorrectly converted the Python // 2
operation to C's / 2
, got paranoid when it failed and used the (costly) floor function, with the appropriate decoration, in its stead.
All this is just some exposition on my part and I had hoped to give some context.
Tested in conjunction with https://github.com/jakubcerveny/gilbert/pull/13.
Even with gcc's
-O3
flag from the Makefile, I was able to see the following savings on a 2020 M1 MacBook Air. These results omitted the printf per step to only measure the actual algorithm computation and are averaged over three executions each../gilbert d2xy 490 490
./gilbert d2xyz 490 490 490