timholy / ProgressMeter.jl

Progress meter for long-running computations
MIT License
694 stars 91 forks source link

is_threading is racy #232

Closed dnadlinger closed 2 months ago

dnadlinger commented 2 years ago

In is_threading, p.threads_used, a Vector of thread ids, is accessed and updated:

https://github.com/timholy/ProgressMeter.jl/blob/5826a75b44078e7bce240f003885f02ee4d88d84/src/ProgressMeter.jl#L463-L470

I don't think Vector is thread-safe, though – wouldn't accesses threads_used also need some form of locking?

IanButterworth commented 2 years ago

ah, true. It would be good to avoid a lock though, as the reason for this check is to avoid the need for a lock, given that locking got slower in recent julia versions

dnadlinger commented 2 years ago

Which use case are you most concerned about, performance-wise (that isn't covered by the nthreads() fast path)? I don't see an obvious way to avoid locking altogether, other than avoiding the lock by using a lock-free counter to track work altogether.

Right now, I am personally only really interested in @threads. For this, something like

function is_threading()
  Threads.nthreads() == 1 && return false 

  if Threads.threadid() > 1
    # Called from another thread; so certainly in a @threads region.
    return true
  end

  # On the main thread, so just check if we are in a @threads region.
  return ccall(:jl_in_threaded_region, Cint, ()) != 0
end

might do the trick. This wouldn't work for @spawn/…, though.

In my threaded programs, the progress meter instance is either going to be across threads anyway, or the situation will be such that the overhead from an uncontested lock is insignificant compared to the actual computation, though, so I'm not sure this problem needs to be solved. If it does, perhaps there could be an explicit thread safety flag/choice of variants (say, Progress(…, thread_safe=false)) for when the overhead matters to users?

IanButterworth commented 2 years ago

It's quite expensive to lock each iteration on tight loops. If a user has nthreads() > 1 and uses ProgressMeter on a non-threaded loop, I just felt the lock was worth avoiding.

It may be over-optimization though. Redesign proposals would definitely be welcomed by me!

dnadlinger commented 2 years ago

I've honestly not used ProgressMeter.jl enough yet to be able to judge what the correct design tradeoffs are. One option to lower the overhead would be to use lock-free atomics instead, but then printing the updates could presumably only be done from the main thread to make sure the updates are ordered correctly. Using Threads.SpinLock might also be cheaper.

danielwe commented 1 year ago

Workaround for anyone else who might be concerned about this:

p = Progress(n)
append!(p.threads_used, 1:Threads.nthreads())
# code using p

~As for a proper solution, I think you can get away with only locking in the exceptional case, as follows:~

function is_threading(p::AbstractProgress)
    Threads.nthreads() == 1 && return false
    length(p.threads_used) > 1 && return true
    return lock(lk) do
        if !in(Threads.threadid(), p.threads_used)
            push!(p.threads_used, Threads.threadid())
        end
        return length(p.threads_used) > 1
    end 
 end 

~That should only occur at most twice during execution.~

EDIT: scratch, that, it's not exceptional if Threads.nthreads() > 1 but you're only accessing p from a single thread.

IanButterworth commented 1 year ago

Yeah on a nthreads() > 1 session a simple hot loop becomes 25x slower with that lock, whereas it's currently at most 9x slower.

I don't yet have a decent design to fix this. Further suggestions appreciated!

IanButterworth commented 1 year ago

Suggestion by @vchuravy

Update the counter atomically and then try to acquire the lock and then you carry on if you didn't get it Sink the log into update progress so that you only acquire it when really needed And use a trylock. Worst case you miss an update

Also from @vtjnash

The fastest is usually a two-split design that stores the initial threadid or Task in one counter, and everything else that isn’t those uses a lock and a different counter

MarcMush commented 8 months ago

It happened in CI

Stacktrace ``` ProgressThreads tests: Error During Test at /home/runner/work/ProgressMeter.jl/ProgressMeter.jl/test/test_threads.jl:2 Got exception outside of a @test TaskFailedException nested task error: BoundsError: attempt to access MemoryRef{Int64} at index [1] Stacktrace: [1] getindex @ ./essentials.jl:882 [inlined] [2] iterate (repeats 2 times) @ ./array.jl:887 [inlined] [3] in @ ./operators.jl:1304 [inlined] [4] is_threading(p::ProgressThresh{Float64}) @ ProgressMeter ~/work/ProgressMeter.jl/ProgressMeter.jl/src/ProgressMeter.jl:477 [5] lock_if_threading @ ~/work/ProgressMeter.jl/ProgressMeter.jl/src/ProgressMeter.jl:484 [inlined] [6] #update!#18 @ ~/work/ProgressMeter.jl/ProgressMeter.jl/src/ProgressMeter.jl:532 [inlined] [7] update!(p::ProgressThresh{Float64}, val::Float64) @ ProgressMeter ~/work/ProgressMeter.jl/ProgressMeter.jl/src/ProgressMeter.jl:531 [8] macro expansion @ ~/work/ProgressMeter.jl/ProgressMeter.jl/test/test_threads.jl:58 [inlined] [9] (::var"#618#threadsfor_fun#155"{var"#618#threadsfor_fun#147#156"{Float64, ReentrantLock, UnitRange{Int64}}})(tid::Int64; onethread::Bool) @ Main ./threadingconstructs.jl:214 [10] #618#threadsfor_fun @ ./threadingconstructs.jl:181 [inlined] [11] (::Base.Threads.var"#1#2"{var"#618#threadsfor_fun#155"{var"#618#threadsfor_fun#147#156"{Float64, ReentrantLock, UnitRange{Int64}}}, Int64})() @ Base.Threads ./threadingconstructs.jl:153 ```

https://github.com/MarcMush/ProgressMeter.jl/actions/runs/7719366745/job/21042475069

vtjnash commented 8 months ago

The push! there looks very dangerous when used by multiple threads

lmiq commented 3 months ago

Just to mention that this issue is now breaking packages (one mine at least) in Julia 1.11.

Thus, a safe fix for this is necessary asap.

lmiq commented 3 months ago

One alternative could be add a "no_lock" keyword argument, as an opt in, with the appropriate explanation.

IanButterworth commented 3 months ago

Sounds like a good quickfix

lmiq commented 3 months ago

I think this update is a reasonable one, which solves the issue with very minor modifications: https://github.com/timholy/ProgressMeter.jl/pull/321

Relative to what is proposed in https://github.com/timholy/ProgressMeter.jl/issues/232#issuecomment-1546146597, what I do there is just define a new lock and just use the lock if the threads_used needs updating, which will not occur most than twice. I don't see any performance difference in the tight loop of the prog_perf example of the tests.

MarcMush commented 2 months ago

fixed by #322