Closed PhrygianGates closed 1 year ago
A few comments:
reshape(@view(res[1:m*n]), m, n)
. This will share the memory with res. You should be able to have a single allocation only when you use BenchmarkTools.Merging #11 (c7ee211) into main (69313bd) will decrease coverage by
20.12%
. The diff coverage is0.00%
.
:mega: This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more
@@ Coverage Diff @@
## main #11 +/- ##
===========================================
- Coverage 98.97% 78.86% -20.12%
===========================================
Files 1 1
Lines 98 123 +25
===========================================
Hits 97 97
- Misses 1 26 +25
Impacted Files | Coverage Δ | |
---|---|---|
src/DifferentiableFactorizations.jl | 78.86% <0.00%> (-20.12%) |
:arrow_down: |
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more
Also please turn the notebook into a test file in the tests folder.
I used the @time from Julia original package. Now I turn to BenchmarkTools and only use one allocation, but the result shows that for loop version is much slower than the matrix multiplication version. I guess it's because the matrix multiplication is much more optimized than our simple for loop?
using DifferentiableFactorizations, LinearAlgebra, BenchmarkTools
A = rand(5, 3)
x = diff_qr(A)
(; Q, R) = x
@benchmark begin
vcat(
vec(UpperTriangular(Q' * Q) + LowerTriangular(R) - I - Diagonal(R)),
vec(Q * R - A),
)
end
@benchmark begin
m, n = size(A, 1), size(A, 2)
res = similar(A, m + n, n)
res_A = view(res, 1:m, :)
res_R = view(res, m+1:m+n, :)
copyto!(res_A, A)
fill!(res_R, 0)
for i in 1:min(m, n)
res_R[i, i] = 1
end
for i in 1:m
for j in 1:n
for k in 1:n
res_A[i, j] -= Q[i, k] * R[k, j]
if i == 1 && j < k
res_R[k, j] += R[k, j]
end
if j <= k
res_R[j, k] -= Q[i, k] * Q[i, j]
end
end
end
end
vec(res)
end
I'll merge it into test later. Thanks!
The code you benchmark should always be inside a function. Also try multiple matrix sizes.
Yeah, this time I only test the function. Besides I tried different sizes, and found that our for loop implementation is faster than the matrix multiplication version when the size is small, and slower in the cases of large matrices. These are two typical tests and there are some more tests in the notebook script. Do you think it's reasonable?
The original qr_conditions is
However, the test shows that its much slower than the following for loop version
In terms of size(5, 3) matrices, that former one is
while the latter one is
The reason is that the fomer one access each elements of the matrices for many times, which is redundant. So, we consider rewrite these conditions. But we currently use Zygote as our AD system, which doesn't support in-place operation. And that's the problem to be solved.