Open tinatorabi opened 3 days ago
I also noticed this:
using QRupdate
using LinearAlgebra
using TimerOutputs
using InvertedIndices
m, n = 10000, 100
A = randn(m, n)
R = qr(A).R
Rin = deepcopy(R)
Ain = deepcopy(A)
for i in 30:50
println("======", i)
@timeit "dynamic" global A = A[:, Not(i)]
@timeit "dynamic" global R = qrdelcol(R, i)
@timeit "in-place" qrdelcol!(Ain, Rin, i)
end
print_timer()
julia> norm(Ain[:,1:79] - A)
0.0
julia> norm(Rin[1:79,1:79] - R)
720.1343678995846
Time Allocations
─────────────────────────────────────────────── Tot / % measured: 426s / 0.3% 1.20GiB / 63.9%
Section ncalls time %tot avg alloc %tot avg ───────────────────────────────────────────────────────── dynamic 237 962ms 87.3% 4.06ms 783MiB 99.6% 3.30MiB in-place 118 141ms 12.7% 1.19ms 3.24MiB 0.4% 28.1KiB ─────────────────────────────────────────────────────────
There is something inherently wrong with how R is being saved!
Hi Tina, thank you very much for looking into this.
I modified the qrdelcol!
tests locally to involve the way in which you measure the difference, but it still passes the tests:
@testset "qrdelcol!" begin
m = 100
A = randn(m,m)
Ain = copy(A)
Q, R = qr(A)
Qin, Rin = qr(A)
actual_size = m
for i in 100:-1:1
k = rand(1:i)
A = A[:,1:i .!= k]
R = qrdelcol(R, k)
qrdelcol!(Ain, Rin, k)
actual_size -= 1
@test norm(R - Rin[1:actual_size,1:actual_size]) < 1e-10
@test norm(A - Ain[:,1:actual_size]) < 1e-10
end
end
I am running the first code you sent. The critical row size seems to be 1441 (on my PC). From there on, I get errors on the column copy method as you mention. This seems not to be a problem on the column size also.
I found a problem on the matrix dimensions in the delcol! function, which I fixed in an upcoming commit (thanks again for checking). If you want to get it working quickly now just to use the code, this is my current coldel! function:
function qrdelcol!(A::AbstractMatrix{T}, R::AbstractMatrix{T}, k::Integer) where {T}
# Note that R is n x n
m, n = size(A)
mR,nR = size(R)
# Shift columns. This is apparently faster than copying views.
@inbounds for j in (k+1):n
R[:,j-1] .= @view R[:, j]
A[:,j-1] .= @view A[:, j]
end
A[:,n] .= zero(T)
R[:,n] .= zero(T)
This plus some minor modifications (not related to functionality) are what is coming up. I also noted that the function seems to be slower than the common del. I computed average times and mem usage with @btime, which seems to be more accurate, with the following simple script:
using QRupdate
using LinearAlgebra
using BenchmarkTools
for mm in [1000,2000,4000,10000, 20000,50000,100000]
#reset_timer()
m, n = mm, 100
A = randn(m, n)
R = qr(A).R
Rin = deepcopy(R)
Ain = deepcopy(A)
actual_size = n
i = 20
println("====== R ", i)
@btime $R = qrdelcol($R, $i)
println("====== Rin ", i)
@btime qrdelcol!($Ain, $Rin, $i)
end
which yields
====== R 20
12.974 μs (12 allocations: 154.38 KiB)
====== Rin 20
13.878 μs (0 allocations: 0 bytes)
====== R 20
13.351 μs (12 allocations: 154.38 KiB)
====== Rin 20
32.164 μs (0 allocations: 0 bytes)
====== R 20
13.277 μs (12 allocations: 154.38 KiB)
====== Rin 20
81.384 μs (0 allocations: 0 bytes)
====== R 20
13.373 μs (12 allocations: 154.38 KiB)
====== Rin 20
197.214 μs (0 allocations: 0 bytes)
====== R 20
13.315 μs (12 allocations: 154.38 KiB)
====== Rin 20
525.806 μs (0 allocations: 0 bytes)
====== R 20
14.081 μs (12 allocations: 154.38 KiB)
====== Rin 20
2.020 ms (0 allocations: 0 bytes)
====== R 20
13.963 μs (12 allocations: 154.38 KiB)
====== Rin 20
4.539 ms (0 allocations: 0 bytes)
I found this a bit surprising (but I'm glad of the 0 allocations). Still, note that the in-place 'delcol!' function also takes care of the 'A' matrix, which delcol
doesn't. This is a big load for the function, and indeed commenting out all calls to shifting the matrix 'A' as well yields the expected results:
====== R 20
13.768 μs (12 allocations: 154.38 KiB)
====== Rin 20
2.336 μs (0 allocations: 0 bytes)
====== R 20
13.727 μs (12 allocations: 154.38 KiB)
====== Rin 20
2.352 μs (0 allocations: 0 bytes)
====== R 20
13.400 μs (12 allocations: 154.38 KiB)
====== Rin 20
2.321 μs (0 allocations: 0 bytes)
====== R 20
14.196 μs (12 allocations: 154.38 KiB)
====== Rin 20
2.366 μs (0 allocations: 0 bytes)
I believe that the function should have the modification also for A, as modifying the R matrix is something you do only because A was modified. This usage aspect (I pasted only the last ones). I hope this closes the issue, and sorry for taking longer than expected.
Hi,
I encountered an unexpected bug in my tests after replacing qrdelcol with qrdelcol!, and I can't figure out why it's happening. It seems like qrdelcol! doesn't return the correct R matrix. I'll share the exact matrices in a JLD2 file and a script that reproduces the issue. I would greatly appreciate a prompt review of this problem.
matrices.jld2.zip
I think this is the problematic part:
Changing it to
solves the issue. I'm not entirely sure but I think the first version kinda overwrites matrix entries during the shifting, but it's more memory/cache efficient obviously.