Some changes to _echelon_in_place_classical to improve the speed

tscrim commented 7 years ago

We can get faster echelon forms by

Directly calling the foo_c implementations instead of the Python foo methods.
Not retesting if certain entires are 0.
Only calling the get_unsafe once per loop and storing as a variable.

Component: performance

Author: Travis Scrimshaw

Branch/Commit: public/linear_algebra/some_speed_improvements-24123 @ aca06e9

Issue created by migration from https://trac.sagemath.org/ticket/24123

tscrim commented 7 years ago

Commit: aca06e9

tscrim commented 7 years ago

comment:1

With this branch:

sage: for _ in range(10):
....:     M = matrix({(2000-i,i): 1/i for i in range(1,2000,1)}, sparse=True)
....:     %time M.echelonize()
CPU times: user 112 ms, sys: 0 ns, total: 112 ms
Wall time: 81 ms
CPU times: user 80 ms, sys: 0 ns, total: 80 ms
Wall time: 81 ms
CPU times: user 84 ms, sys: 0 ns, total: 84 ms
Wall time: 81.5 ms
CPU times: user 80 ms, sys: 0 ns, total: 80 ms
Wall time: 79.2 ms
CPU times: user 84 ms, sys: 0 ns, total: 84 ms
Wall time: 81.1 ms
CPU times: user 84 ms, sys: 0 ns, total: 84 ms
Wall time: 82.6 ms
CPU times: user 80 ms, sys: 0 ns, total: 80 ms
Wall time: 80.6 ms
CPU times: user 80 ms, sys: 0 ns, total: 80 ms
Wall time: 81 ms
CPU times: user 84 ms, sys: 0 ns, total: 84 ms
Wall time: 80.5 ms
CPU times: user 80 ms, sys: 0 ns, total: 80 ms
Wall time: 80.6 ms

versus before

CPU times: user 80 ms, sys: 0 ns, total: 80 ms
Wall time: 78.4 ms
CPU times: user 76 ms, sys: 0 ns, total: 76 ms
Wall time: 78.3 ms
CPU times: user 76 ms, sys: 0 ns, total: 76 ms
Wall time: 78.7 ms
CPU times: user 80 ms, sys: 0 ns, total: 80 ms
Wall time: 78.5 ms
CPU times: user 80 ms, sys: 0 ns, total: 80 ms
Wall time: 78 ms
CPU times: user 76 ms, sys: 0 ns, total: 76 ms
Wall time: 77.9 ms
CPU times: user 80 ms, sys: 0 ns, total: 80 ms
Wall time: 79.5 ms
CPU times: user 80 ms, sys: 0 ns, total: 80 ms
Wall time: 78.6 ms
CPU times: user 76 ms, sys: 0 ns, total: 76 ms
Wall time: 78.1 ms
CPU times: user 80 ms, sys: 0 ns, total: 80 ms
Wall time: 79.6 ms

I am confused. Admittedly my example is artificial because it was trying to demonstrate how checking if an entry was 0 twice was going to be bad. Removing 3 did not change the timings. I would have thought skipping checking if certain entries were 0 would have a speed gain at the cost of some minor code duplication. Maybe this ticket can just be closed or recycled...

New commits:

aca06e9 Directly call the C functions and do not recheck for 0's.

tscrim commented 7 years ago

Branch: public/linear_algebra/some_speed_improvements-24123

tscrim commented 7 years ago

Description changed:

--- 
+++ 
@@ -1,5 +1,5 @@
 We can get faster echelon forms by

-- Directly calling the foo_c implementations instead of the Python foo methods.
-- Not retesting if certain entires are 0.
-
+1. Directly calling the foo_c implementations instead of the Python foo methods.
+2. Not retesting if certain entires are 0.
+3. Only calling the `get_unsafe` once per loop and storing as a variable.

sagemath / sage

Some changes to _echelon_in_place_classical to improve the speed #24123