mpf / QRupdate.jl

Column and row updates to "Q-less" QR decomposition, including stable least-squares solves
Other
22 stars 5 forks source link

qrdelcol! and qrdelcol yielding different results! #11

Open tinatorabi opened 3 days ago

tinatorabi commented 3 days ago

Hi,

I encountered an unexpected bug in my tests after replacing qrdelcol with qrdelcol!, and I can't figure out why it's happening. It seems like qrdelcol! doesn't return the correct R matrix. I'll share the exact matrices in a JLD2 file and a script that reproduces the issue. I would greatly appreciate a prompt review of this problem.

using QRupdate, JLD2, LinearAlgebra, InvertedIndices
@load "matrices.jld2" S_array R_array
S = S_array
R = R_array

## doing the following slicing to mimic how qrdelcol saves matrices
S_ = S_array[:,1:5]
R_ = R_array[1:5,1:5]

qrdelcol!(S, R, 3) #inplace

S_ = S_[:, Not(3)]
R_ = qrdelcol(R_, 3)

norm(S[:,1:4]-S_) # this gives back 0
norm(R[1:4,1:4] - R_) #!!!

norm( R_'*R_ - S_'*S_ ) # seems right
norm( R'*R - S'*S ) # weird

matrices.jld2.zip

I think this is the problematic part:

    # Shift columns. This is apparently faster than copying views.
    for j in (k+1):n, i in 1:m
        @inbounds R[i,j-1] = R[i, j]
        @inbounds A[i,j-1] = A[i, j]
    end

Changing it to

    for j in (k+1):n
        @inbounds for i in 1:m
            R[i, j-1] = R[i, j]   
            A[i, j-1] = A[i, j]  
        end
    end

solves the issue. I'm not entirely sure but I think the first version kinda overwrites matrix entries during the shifting, but it's more memory/cache efficient obviously.

tinatorabi commented 3 days ago

I also noticed this:

using QRupdate
using LinearAlgebra
using TimerOutputs
using InvertedIndices

m, n = 10000, 100
A = randn(m, n)
R = qr(A).R
Rin = deepcopy(R)
Ain = deepcopy(A)
for i in 30:50
    println("======", i)
    @timeit "dynamic" global A = A[:, Not(i)]
    @timeit "dynamic" global R = qrdelcol(R, i)
    @timeit "in-place" qrdelcol!(Ain, Rin, i)
end
print_timer()

julia> norm(Ain[:,1:79] - A)
0.0
julia> norm(Rin[1:79,1:79] - R)
720.1343678995846
                                     Time                           Allocations      

─────────────────────────────────────────────── Tot / % measured: 426s / 0.3% 1.20GiB / 63.9%

Section ncalls time %tot avg alloc %tot avg ───────────────────────────────────────────────────────── dynamic 237 962ms 87.3% 4.06ms 783MiB 99.6% 3.30MiB in-place 118 141ms 12.7% 1.19ms 3.24MiB 0.4% 28.1KiB ─────────────────────────────────────────────────────────

There is something inherently wrong with how R is being saved!

nabw commented 7 hours ago

Hi Tina, thank you very much for looking into this.

I modified the qrdelcol! tests locally to involve the way in which you measure the difference, but it still passes the tests:

@testset "qrdelcol!" begin
    m = 100
    A = randn(m,m)
    Ain = copy(A)
    Q, R = qr(A)
    Qin, Rin = qr(A)
    actual_size = m
    for i in 100:-1:1
        k = rand(1:i)
        A = A[:,1:i .!= k]
        R = qrdelcol(R, k)
        qrdelcol!(Ain, Rin, k)
        actual_size -= 1
        @test norm(R - Rin[1:actual_size,1:actual_size]) < 1e-10
        @test norm(A - Ain[:,1:actual_size]) < 1e-10
    end
end

I am running the first code you sent. The critical row size seems to be 1441 (on my PC). From there on, I get errors on the column copy method as you mention. This seems not to be a problem on the column size also.

I found a problem on the matrix dimensions in the delcol! function, which I fixed in an upcoming commit (thanks again for checking). If you want to get it working quickly now just to use the code, this is my current coldel! function:

function qrdelcol!(A::AbstractMatrix{T}, R::AbstractMatrix{T}, k::Integer) where {T}

    # Note that R is n x n
    m, n = size(A)
    mR,nR = size(R)

    # Shift columns. This is apparently faster than copying views.
    @inbounds for j in (k+1):n
        R[:,j-1] .= @view R[:, j]
        A[:,j-1] .= @view A[:, j]
    end
    A[:,n] .= zero(T)
    R[:,n] .= zero(T)

This plus some minor modifications (not related to functionality) are what is coming up. I also noted that the function seems to be slower than the common del. I computed average times and mem usage with @btime, which seems to be more accurate, with the following simple script:

using QRupdate
using LinearAlgebra
using BenchmarkTools

for mm in [1000,2000,4000,10000, 20000,50000,100000]
    #reset_timer()
    m, n = mm, 100
    A = randn(m, n)
    R = qr(A).R
    Rin = deepcopy(R)
    Ain = deepcopy(A)

    actual_size = n
    i = 20
    println("====== R ", i)
    @btime $R = qrdelcol($R, $i)
    println("====== Rin ", i)
    @btime qrdelcol!($Ain, $Rin, $i)
end

which yields

====== R 20
  12.974 μs (12 allocations: 154.38 KiB)
====== Rin 20
  13.878 μs (0 allocations: 0 bytes)
====== R 20
  13.351 μs (12 allocations: 154.38 KiB)
====== Rin 20
  32.164 μs (0 allocations: 0 bytes)
====== R 20
  13.277 μs (12 allocations: 154.38 KiB)
====== Rin 20
  81.384 μs (0 allocations: 0 bytes)
====== R 20
  13.373 μs (12 allocations: 154.38 KiB)
====== Rin 20
  197.214 μs (0 allocations: 0 bytes)
====== R 20
  13.315 μs (12 allocations: 154.38 KiB)
====== Rin 20
  525.806 μs (0 allocations: 0 bytes)
====== R 20
  14.081 μs (12 allocations: 154.38 KiB)
====== Rin 20
  2.020 ms (0 allocations: 0 bytes)
====== R 20
  13.963 μs (12 allocations: 154.38 KiB)
====== Rin 20
  4.539 ms (0 allocations: 0 bytes)

I found this a bit surprising (but I'm glad of the 0 allocations). Still, note that the in-place 'delcol!' function also takes care of the 'A' matrix, which delcol doesn't. This is a big load for the function, and indeed commenting out all calls to shifting the matrix 'A' as well yields the expected results:

====== R 20
  13.768 μs (12 allocations: 154.38 KiB)
====== Rin 20
  2.336 μs (0 allocations: 0 bytes)
====== R 20
  13.727 μs (12 allocations: 154.38 KiB)
====== Rin 20
  2.352 μs (0 allocations: 0 bytes)
====== R 20
  13.400 μs (12 allocations: 154.38 KiB)
====== Rin 20
  2.321 μs (0 allocations: 0 bytes)
====== R 20
  14.196 μs (12 allocations: 154.38 KiB)
====== Rin 20
  2.366 μs (0 allocations: 0 bytes)

I believe that the function should have the modification also for A, as modifying the R matrix is something you do only because A was modified. This usage aspect (I pasted only the last ones). I hope this closes the issue, and sorry for taking longer than expected.