Oh, in case you need the data: I used a publicly available data set from Qiita.
Sorry to hear this. Strangely, this all works without issues on my machine and OS (also, the fix passed all tests without error). Could you show me the output of versioninfo()? Also, it would be helpful if you could re-run the example without parallel workers so we can get a more interpretable error message.
One more thing: after adding workers and doing @everywhere using FlashWeave, could you run FlashWeave.workers_all_local() directly to see if this results in the same crash?
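For reference, the check described above would look roughly like this as a single script (a minimal sketch of the steps just named; running the example without the addprocs/@everywhere lines covers the "no parallel workers" case):

```julia
# Add one worker, load FlashWeave on all processes, then call the helper
# from the error message directly.
using Distributed
addprocs(1)
@everywhere using FlashWeave
FlashWeave.workers_all_local()
```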
I did. The same error I found earlier occurred after running FlashWeave.workers_all_local().

versioninfo()
Commit 788b2c77c1* (2020-11-09 13:37 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: AMD Ryzen 5 3600 6-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, znver2)
Environment:
  JULIA_EDITOR = atom -a
  JULIA_NUM_THREADS = 6
FlashWeave.workers_all_local()
On worker 2:
UndefVarError: #55#56 not defined
deserialize_datatype at /opt/julia/usr/share/julia/stdlib/v1.5/Serialization/src/Serialization.jl:1252
handle_deserialize at /opt/julia/usr/share/julia/stdlib/v1.5/Serialization/src/Serialization.jl:826
deserialize at /opt/julia/usr/share/julia/stdlib/v1.5/Serialization/src/Serialization.jl:773
handle_deserialize at /opt/julia/usr/share/julia/stdlib/v1.5/Serialization/src/Serialization.jl:833
deserialize at /opt/julia/usr/share/julia/stdlib/v1.5/Serialization/src/Serialization.jl:773 [inlined]
deserialize_msg at /opt/julia/usr/share/julia/stdlib/v1.5/Distributed/src/messages.jl:99
#invokelatest#1 at ./essentials.jl:710 [inlined]
invokelatest at ./essentials.jl:709 [inlined]
message_handler_loop at /opt/julia/usr/share/julia/stdlib/v1.5/Distributed/src/process_messages.jl:185
process_tcp_streams at /opt/julia/usr/share/julia/stdlib/v1.5/Distributed/src/process_messages.jl:142
#99 at ./task.jl:356
in top-level scope at Repos/Thesis/src/scripts/minimal_reproducible_example.jl:7
in workers_all_local at FlashWeave/9pt8o/src/misc.jl:96
in remotecall_fetch at stdlib/v1.5/Distributed/src/remotecall.jl:421
in #remotecall_fetch#146 at stdlib/v1.5/Distributed/src/remotecall.jl:421
in remotecall_fetch at stdlib/v1.5/Distributed/src/remotecall.jl:386
in #remotecall_fetch#143 at stdlib/v1.5/Distributed/src/remotecall.jl:394
How do I re-run the code without parallel workers? The bug only occurs when I run this:

using Distributed
addprocs(1)
@everywhere using FlashWeave

If I just do using FlashWeave, there is no problem.
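To illustrate the single-process case: without addprocs and @everywhere, the example reduces to something like the sketch below, assuming it ultimately calls FlashWeave's learn_network entry point (the file name and keyword settings are only placeholders):

```julia
# Single-process run: no Distributed, no addprocs, no @everywhere, so worker
# serialization is never involved. The data file name is a placeholder.
using FlashWeave
results = learn_network("otu_table.tsv", sensitive=true, heterogeneous=false)
```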
You may have forgotten to copy the first line of versioninfo(); which Julia version are you on? Anyway, I can't replicate this on 1.3, 1.5 or 1.6-beta. It's also odd that this only occurs on master, since the fix (or any other recent commit) hasn't touched workers_all_local() at all. Could you perhaps run

julia> using Distributed
julia> addprocs(1)
julia> @everywhere println(gethostname())

and then

remotecall_fetch(()->gethostname(), 2)

just to narrow the options down?
After restarting my IDE (Atom) I can't reproduce my error anymore.

using Distributed                        # needed for addprocs / remotecall_fetch

VERSION                                  # v"1.5.3"
addprocs(1)
@everywhere println(gethostname())
remotecall_fetch(()->gethostname(), 2)   # "LDTett"
However, I now find that I cannot reliably reproduce the error. I made separate Project.toml files for a minimal reproducible example, but the error only occurred after activating an environment:

pkg> activate env

Then I added FlashWeave#master to my default environment, and now I cannot reproduce the error within the environment anymore...
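For reference, the full sequence that triggered it looked roughly like this (the environment name is a placeholder):

```julia
# Rough reconstruction of the failing setup described above.
using Pkg
Pkg.activate("env")                                    # separate environment with its own Project.toml
Pkg.add(PackageSpec(name="FlashWeave", rev="master"))  # FlashWeave#master

using Distributed
addprocs(1)
@everywhere using FlashWeave
FlashWeave.workers_all_local()                         # UndefVarError: #55#56 not defined
```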
Maybe this is not a FlashWeave problem... I'm sorry.
I'll let you know what the problem was/is when/if I find the cause.
No problem, but good to know that this seems to be a more obscure (and perhaps rare) bug. In any case, if the line remotecall_fetch(()->gethostname(), 2) doesn't work, this really sounds like a bug in Distributed or Julia itself and would be worth reporting in their repositories.
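If it does resurface, the most useful thing to attach to an upstream report would be a FlashWeave-free check along these lines (just a sketch of the failing pattern, not a confirmed reproducer):

```julia
# If sending a simple closure to a worker already fails with an UndefVarError
# during deserialization, the problem lies outside FlashWeave.
using Distributed
addprocs(1)
println(remotecall_fetch(() -> gethostname(), 2))
```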
Yes, I will report it once I figure out how to reliably reproduce it.
Thank you for your time.
Thanks for your effort; feedback like this is very valuable!
Hey Janko,
I'm afraid your fix for #21 introduced a bug.