Nithanaroy opened this issue 2 years ago
@krfricke I did not find anything related to HDFS sync failures in the /tmp/ray/session_latest/logs/ directory while the trial is running.
We've updated the syncing logic to use pyarrow for syncing instead - please let us know if this resolved the problem.
That’s great, @krfricke! How do I get this change? I can wait for the next release if it’s easier.
You should try out `pip install --pre -U ray` to get the latest pre-release; the change should also be in the 2.0 release :)
Hi, what settings are required for Ray to read from HDFS? Looking forward to your reply, thank you.
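For anyone hitting this question: Ray's pyarrow-based syncer resolves `hdfs://` URLs through libhdfs, which typically requires a working Hadoop client environment on every node. A minimal sketch of the usual environment variables — the paths below are examples and must be adjusted to your own JDK/Hadoop installation:

```shell
# Example environment for the pyarrow/libhdfs client (paths are placeholders).
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk          # your JDK
export HADOOP_HOME=/opt/hadoop                        # your Hadoop install
export ARROW_LIBHDFS_DIR="$HADOOP_HOME/lib/native"    # where libhdfs.so lives
# The Hadoop client jars must be on the classpath:
export CLASSPATH="$("$HADOOP_HOME/bin/hadoop" classpath --glob)"
```

These need to be set in the environment of the Ray processes themselves (e.g. before `ray start`), not just in an interactive shell.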
Search before asking
Ray Component
Ray Tune, Ray Clusters
Issue Severity
Medium: It contributes significant difficulty to completing my task, but I can work around it and get it resolved.
What happened + What you expected to happen
Ray Tune does not sync the data from workers to HDFS every 60s. It does, however, sync the logdir from the head node every 60s. At the end of the experiment, it pushes all data from the workers to HDFS as requested.
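For context, a minimal sketch of the kind of setup being described, assuming Ray 1.x's `tune.SyncConfig` API; the trainable, config values, and HDFS URL are placeholders:

```python
from ray import tune

# Hypothetical trainable standing in for the real experiment.
def trainable(config):
    for step in range(10):
        tune.report(score=step * config["lr"])

# upload_dir points trial results at remote (here HDFS) storage.
# The behavior reported in this issue: head-node logs sync on the
# periodic schedule, but worker results only land in HDFS at the
# end of the experiment instead of periodically during it.
tune.run(
    trainable,
    config={"lr": 0.01},
    sync_config=tune.SyncConfig(upload_dir="hdfs://namenode:8020/tune_results"),
)
```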
Versions / Dependencies
Ray 1.10.0 (ray and tune) everywhere
Reproduction script
Unfortunately, I don't know of any open-source way to reproduce a multi-worker problem like this. I started the head node using
`ray start --head`
and connected a bunch of workers to it using
`ray start --address=...`
then used Tune to launch an experiment.
Anything else
I'm happy to jump into a debug session if it is easier for you
Are you willing to submit a PR?