Closed dnadlinger closed 2 months ago
ah, true. It would be good to avoid a lock though, as the reason for this check is to avoid the need for a lock, given that locking got slower in recent julia versions
Which use case are you most concerned about, performance-wise (that isn't covered by the nthreads() fast path)? I don't see an obvious way to avoid locking altogether, other than avoiding the lock by using a lock-free counter to track work altogether.
Right now, I am personally only really interested in @threads
. For this, something like
function is_threading()
Threads.nthreads() == 1 && return false
if Threads.threadid() > 1
# Called from another thread; so certainly in a @threads region.
return true
end
# On the main thread, so just check if we are in a @threads region.
return ccall(:jl_in_threaded_region, Cint, ()) != 0
end
might do the trick. This wouldn't work for @spawn
/…, though.
In my threaded programs, the progress meter instance is either going to be across threads anyway, or the situation will be such that the overhead from an uncontested lock is insignificant compared to the actual computation, though, so I'm not sure this problem needs to be solved. If it does, perhaps there could be an explicit thread safety flag/choice of variants (say, Progress(…, thread_safe=false)
) for when the overhead matters to users?
It's quite expensive to lock each iteration on tight loops. If a user has nthreads() > 1
and uses ProgressMeter on a non-threaded loop, I just felt the lock was worth avoiding.
It may be over-optimization though. Redesign proposals would definitely be welcomed by me!
I've honestly not used ProgressMeter.jl enough yet to be able to judge what the correct design tradeoffs are. One option to lower the overhead would be to use lock-free atomics instead, but then printing the updates could presumably only be done from the main thread to make sure the updates are ordered correctly. Using Threads.SpinLock might also be cheaper.
Workaround for anyone else who might be concerned about this:
p = Progress(n)
append!(p.threads_used, 1:Threads.nthreads())
# code using p
~As for a proper solution, I think you can get away with only locking in the exceptional case, as follows:~
function is_threading(p::AbstractProgress)
Threads.nthreads() == 1 && return false
length(p.threads_used) > 1 && return true
return lock(lk) do
if !in(Threads.threadid(), p.threads_used)
push!(p.threads_used, Threads.threadid())
end
return length(p.threads_used) > 1
end
end
~That should only occur at most twice during execution.~
EDIT: scratch, that, it's not exceptional if Threads.nthreads() > 1
but you're only accessing p
from a single thread.
Yeah on a nthreads() > 1
session a simple hot loop becomes 25x slower with that lock, whereas it's currently at most 9x slower.
I don't yet have a decent design to fix this. Further suggestions appreciated!
Suggestion by @vchuravy
Update the counter atomically and then try to acquire the lock and then you carry on if you didn't get it Sink the log into update progress so that you only acquire it when really needed And use a
trylock
. Worst case you miss an update
Also from @vtjnash
The fastest is usually a two-split design that stores the initial threadid or Task in one counter, and everything else that isn’t those uses a lock and a different counter
It happened in CI
https://github.com/MarcMush/ProgressMeter.jl/actions/runs/7719366745/job/21042475069
The push! there looks very dangerous when used by multiple threads
Just to mention that this issue is now breaking packages (one mine at least) in Julia 1.11.
Thus, a safe fix for this is necessary asap.
One alternative could be add a "no_lock" keyword argument, as an opt in, with the appropriate explanation.
Sounds like a good quickfix
I think this update is a reasonable one, which solves the issue with very minor modifications: https://github.com/timholy/ProgressMeter.jl/pull/321
Relative to what is proposed in https://github.com/timholy/ProgressMeter.jl/issues/232#issuecomment-1546146597, what I do there is just define a new lock and just use the lock if the threads_used
needs updating, which will not occur most than twice. I don't see any performance difference in the tight loop of the prog_perf
example of the tests.
fixed by #322
In
is_threading
,p.threads_used
, aVector
of thread ids, is accessed and updated:https://github.com/timholy/ProgressMeter.jl/blob/5826a75b44078e7bce240f003885f02ee4d88d84/src/ProgressMeter.jl#L463-L470
I don't think
Vector
is thread-safe, though – wouldn't accessesthreads_used
also need some form of locking?