timholy / ProgressMeter.jl

Progress meter for long-running computations
MIT License

Error in combination with @parallel and more than one worker #97

Open jullit31 opened 6 years ago

jullit31 commented 6 years ago

The following code works fine as long as only one worker is spawned. I'm using ProgressMeter 0.5.5 and Julia 0.6.3 on Windows.

@everywhere using ProgressMeter
@everywhere n = 10^8
@everywhere p = Progress(n, 1)

sum = @parallel (+) for i = 1:n
    next!(p)
    rand(1:6)
end

println(sum/n)

Spawning just a single additional worker (via addprocs(1)) and running the above code produces these errors:

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x6b6949c1 -- uv_write at /home/Administrator/buildbot/worker/package_win64/build/deps/srccache/libuv-d8ab1c6a33e77bf155facb54215dd8798e13825d/src/win\stream.c:126
while loading no file, in expression starting on line 0
uv_write at /home/Administrator/buildbot/worker/package_win64/build/deps/srccache/libuv-d8ab1c6a33e7
7bf155facb54215dd8798e13825d/src/win\stream.c:126
jl_uv_write at /home/Administrator/buildbot/worker/package_win64/build/src/home/Administrator/buildb
ot/worker/package_win64/build/src\jl_uv.c:414
uv_write at .\stream.jl:798
unsafe_write at .\stream.jl:832
print at .\strings\io.jl:122 [inlined]
printover at C:\Users\julli\.julia\v0.6\ProgressMeter\src\ProgressMeter.jl:311
unknown function (ip: 000000000C5AB596)
jl_call_fptr_internal at /home/Administrator/buildbot/worker/package_win64/build/src/home/Administra
tor/buildbot/worker/package_win64/build/src\julia_internal.h:339 [inlined]
jl_call_method_internal at /home/Administrator/buildbot/worker/package_win64/build/src/home/Administ
rator/buildbot/worker/package_win64/build/src\julia_internal.h:358 [inlined]
jl_apply_generic at /home/Administrator/buildbot/worker/package_win64/build/src/home/Administrator/b
uildbot/worker/package_win64/build/src\gf.c:1926
#updateProgress!#5 at C:\Users\julli\.julia\v0.6\ProgressMeter\src\ProgressMeter.jl:163
unknown function (ip: 000000000C5A7CCF)
jl_call_fptr_internal at /home/Administrator/buildbot/worker/package_win64/build/src/home/Administra
tor/buildbot/worker/package_win64/build/src\julia_internal.h:339 [inlined]
jl_call_method_internal at /home/Administrator/buildbot/worker/package_win64/build/src/home/Administ
rator/buildbot/worker/package_win64/build/src\julia_internal.h:358 [inlined]
jl_invoke at /home/Administrator/buildbot/worker/package_win64/build/src/home/Administrator/buildbot
/worker/package_win64/build/src\gf.c:41
#next!#7 at C:\Users\julli\.julia\v0.6\ProgressMeter\src\ProgressMeter.jl:213
unknown function (ip: 000000000C5A6BF6)
jl_call_fptr_internal at /home/Administrator/buildbot/worker/package_win64/build/src/home/Administra
tor/buildbot/worker/package_win64/build/src\julia_internal.h:339 [inlined]
jl_call_method_internal at /home/Administrator/buildbot/worker/package_win64/build/src/home/Administ
rator/buildbot/worker/package_win64/build/src\julia_internal.h:358 [inlined]
jl_invoke at /home/Administrator/buildbot/worker/package_win64/build/src/home/Administrator/buildbot
/worker/package_win64/build/src\gf.c:41
next! at C:\Users\julli\.julia\v0.6\ProgressMeter\src\ProgressMeter.jl:212
unknown function (ip: 000000000C5A6A1A)
jl_call_fptr_internal at /home/Administrator/buildbot/worker/package_win64/build/src/home/Administra
tor/buildbot/worker/package_win64/build/src\julia_internal.h:339 [inlined]
jl_call_method_internal at /home/Administrator/buildbot/worker/package_win64/build/src/home/Administ
rator/buildbot/worker/package_win64/build/src\julia_internal.h:358 [inlined]
jl_apply_generic at /home/Administrator/buildbot/worker/package_win64/build/src/home/Administrator/b
uildbot/worker/package_win64/build/src\gf.c:1926
macro expansion at Z:\Projekte\julia\d6_parallel.jl:7 [inlined]
#45 at .\distributed\macros.jl:162
unknown function (ip: 000000000C5A68F9)
jl_call_fptr_internal at /home/Administrator/buildbot/worker/package_win64/build/src/home/Administra
tor/buildbot/worker/package_win64/build/src\julia_internal.h:339 [inlined]
jl_call_method_internal at /home/Administrator/buildbot/worker/package_win64/build/src/home/Administ
rator/buildbot/worker/package_win64/build/src\julia_internal.h:358 [inlined]
jl_apply_generic at /home/Administrator/buildbot/worker/package_win64/build/src/home/Administrator/b
uildbot/worker/package_win64/build/src\gf.c:1926
jl_apply at /home/Administrator/buildbot/worker/package_win64/build/src/home/Administrator/buildbot/
worker/package_win64/build/src\julia.h:1424 [inlined]
jl_f__apply at /home/Administrator/buildbot/worker/package_win64/build/src/home/Administrator/buildb
ot/worker/package_win64/build/src\builtins.c:426
#106 at .\distributed\process_messages.jl:268 [inlined]
run_work_thunk at .\distributed\process_messages.jl:56
macro expansion at .\distributed\process_messages.jl:268 [inlined]
#105 at .\event.jl:73
jl_call_fptr_internal at /home/Administrator/buildbot/worker/package_win64/build/src/home/Administra
tor/buildbot/worker/package_win64/build/src\julia_internal.h:339 [inlined]
jl_call_method_internal at /home/Administrator/buildbot/worker/package_win64/build/src/home/Administ
rator/buildbot/worker/package_win64/build/src\julia_internal.h:358 [inlined]
jl_apply_generic at /home/Administrator/buildbot/worker/package_win64/build/src/home/Administrator/b
uildbot/worker/package_win64/build/src\gf.c:1926
jl_apply at /home/Administrator/buildbot/worker/package_win64/build/src/home/Administrator/buildbot/
worker/package_win64/build/src\julia.h:1424 [inlined]
start_task at /home/Administrator/buildbot/worker/package_win64/build/src/home/Administrator/buildbo
t/worker/package_win64/build/src\task.c:267
Allocations: 4022019 (Pool: 4020728; Big: 1291); GC: 8
Worker 2 terminated.ERROR:
LoadError: ProcessExitedException()ERROR (unhandled task failure): EOFError: read end of file

Stacktrace:
 [1] try_yieldto(::Base.##298#299{Task}, ::Task) at .\event.jl:189
 [2] wait() at .\event.jl:234
 [3] wait(::Condition) at .\event.jl:27
 [4] wait(::Task) at .\task.jl:181
 [5] collect(::Base.Generator{Array{Task,1},Base.#wait}) at .\array.jl:470
 [6] preduce(::Function, ::Function, ::UnitRange{Int64}) at .\distributed\macros.jl:148
 [7] include_string(::String, ::String) at .\loading.jl:522
 [8] include_string(::Module, ::String, ::String) at C:\Users\julli\.julia\v0.6\Compat\src\Compat.jl
:88
 [9] (::Atom.##112#116{String,String})() at C:\Users\julli\.julia\v0.6\Atom\src\eval.jl:109
 [10] withpath(::Atom.##112#116{String,String}, ::String) at C:\Users\julli\.julia\v0.6\CodeTools\sr
c\utils.jl:30
 [11] withpath(::Function, ::String) at C:\Users\julli\.julia\v0.6\Atom\src\eval.jl:38
 [12] hideprompt(::Atom.##111#115{String,String}) at C:\Users\julli\.julia\v0.6\Atom\src\repl.jl:67
 [13] macro expansion at C:\Users\julli\.julia\v0.6\Atom\src\eval.jl:106 [inlined]
 [14] (::Atom.##110#114{Dict{String,Any}})() at .\task.jl:80
while loading Z:\Projekte\julia\d6_parallel.jl, in expression starting on line 157

martinholters commented 6 years ago

I think this has something to do with the IO object stored inside Progress. The following likewise crashes:

julia> addprocs(1)
1-element Array{Int64,1}:
 2

julia> @everywhere n = 10

julia> @everywhere io = STDERR

julia> sum = @parallel (+) for i = 1:n
           println(io, "test")
           rand(1:6)
       end

Replacing io with STDERR in the loop makes the crash go away. Also, if you do p.output = STDERR inside the loop, the example you've posted works. I have close to zero experience with distributed Julia, so I have no idea whether this needs a fix in ProgressMeter or a different way of setting things up than @everywhere p = ....
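
The second workaround can be sketched as follows. This is the example from the report with one added line, kept in the Julia 0.6 syntax used in this thread (from Julia 0.7 on, @parallel and STDERR are named @distributed and stderr). The name total is a substitution for sum so that Base.sum is not shadowed, and the explanation in the comment is the suspicion stated above, not a confirmed diagnosis. Each process still updates its own meter.

```julia
# Julia 0.6 syntax, as in the rest of this thread.
addprocs(1)

@everywhere using ProgressMeter
@everywhere n = 10^8
@everywhere p = Progress(n, 1)

total = @parallel (+) for i = 1:n
    # Suspected cause: the IO object stored in p when it was constructed.
    # Reassign the output to the STDERR of the process running this iteration.
    p.output = STDERR
    next!(p)
    rand(1:6)
end

println(total / n)
```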

zsunberg commented 6 years ago

I don't think you want to do

@everywhere p = Progress(n, 1)

That will create a separate progress meter on each process.

See #109 for a possible solution.
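
The contents of #109 are not reproduced here. One way to keep a single meter is to create it only on the master process and let the workers report each finished iteration through a RemoteChannel. The sketch below assumes that approach and is not necessarily what #109 proposes. It uses the Julia 0.6 names from this thread (on Julia 0.7 and later, add using Distributed and replace @parallel with @distributed). The function name roll_mean and the buffer size of 32 are illustrative choices.

```julia
# Julia 0.6 syntax, as in the rest of this thread.
addprocs(1)

using ProgressMeter  # only the master process draws the meter

# roll_mean is a hypothetical helper name used for this sketch.
function roll_mean(n)
    p = Progress(n, 1)  # the single meter, living on the master process

    # Channel hosted on process 1; workers signal progress through it.
    channel = RemoteChannel(() -> Channel{Bool}(32), 1)

    # Master-side task: advance the meter once per `true`, stop on `false`.
    updater = @async while take!(channel)
        next!(p)
    end

    total = @parallel (+) for i = 1:n
        put!(channel, true)  # report one finished iteration
        rand(1:6)
    end

    put!(channel, false)  # sentinel that ends the updater task
    wait(updater)
    return total / n
end

println(roll_mean(10^4))
```

The loop body captures only the channel, so no Progress object (and no IO object) is sent to the workers. Sending one message per iteration is slow for very large n such as the 10^8 in the original report; reporting once per batch of iterations would reduce that overhead.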