Open GoogleCodeExporter opened 8 years ago
I've passed this on to the Dataflow team; I'll let you know when they get back.
Original comment by z...@google.com
on 15 Apr 2016 at 1:25
Here's what I have from the Dataflow team:
"Refusing to split" is not an error. It seems that the workers were stuck while
reading from GBK (GroupByKey), and the "refusing to split" messages are merely
attempts by Dataflow to parallelize the load. I'm not sure what the workers were
stuck on.
Q: Why can a worker get stuck reading from GBK? (Note the "unstarted" in
"<unstarted in shuffle range [ShufflePosition(base64:AAAAAg),
ShufflePosition(base64:AAAAAw))> at AAAAAoAAAQ".)
A: For any number of reasons (connectivity, timeouts, slow disks, etc.).
Besides the stack trace (which you can get from the HTTP server on port 8081
at /threadz; e.g., SSH into the VM and run
curl http://localhost:8081/threadz), what does the
/var/log/dataflow/java-batch/boot-json.log file contain (on the affected
worker)?
-----------------------------------------------
Do you mind grabbing that stack trace and those log files?
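In case it helps, here is a rough sketch of how both pieces could be collected in one go once you are logged in to the affected worker. It only wraps the curl call and the log path quoted above. The function name collect_worker_diagnostics and the THREADZ_URL and BOOT_LOG override variables are made up for illustration and are not Dataflow settings; I have not run this on a Dataflow worker.

```shell
#!/usr/bin/env bash
# Sketch: gather the thread dump and boot log on a Dataflow worker VM.
# Run it on the worker itself. The function name and the THREADZ_URL /
# BOOT_LOG variables are hypothetical; the defaults come from the thread.

collect_worker_diagnostics() {
  local out_dir="${1:-./dataflow-diagnostics}"
  local threadz_url="${THREADZ_URL:-http://localhost:8081/threadz}"
  local boot_log="${BOOT_LOG:-/var/log/dataflow/java-batch/boot-json.log}"

  mkdir -p "$out_dir" || return 1

  # Thread dump from the worker's local HTTP server.
  if ! curl --silent --show-error --max-time 10 \
        --output "$out_dir/threadz.txt" "$threadz_url"; then
    echo "warning: could not fetch $threadz_url" >&2
  fi

  # Boot log; reading it may require sudo depending on file permissions.
  if [ -r "$boot_log" ]; then
    cp "$boot_log" "$out_dir/"
  else
    echo "warning: cannot read $boot_log" >&2
  fi
}
```

After sourcing the file, calling collect_worker_diagnostics /tmp/diag should leave threadz.txt and boot-json.log in /tmp/diag. A failed fetch or an unreadable log only prints a warning, so one missing piece does not stop the other from being collected.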
Original comment by z...@google.com
on 15 Apr 2016 at 5:46
Thanks. Is it possible (and easy) to SSH into a VM that's been created by Dataflow?
Original comment by emanu...@ziglioli.org
on 16 Apr 2016 at 11:19
I'm no expert on Dataflow and haven't done this myself, but it seems possible:
> You can view the VM instances for a given pipeline by using the Google Cloud
Platform Console. From there, you can use SSH to access each instance.
https://cloud.google.com/dataflow/faq#accessing-VMs
Looks like it'll be easiest from the Cloud Console.
Original comment by z...@google.com
on 19 Apr 2016 at 3:34
Original issue reported on code.google.com by
theb...@emanueleziglioli.it
on 15 Apr 2016 at 4:12