Open dugar-tarun opened 1 year ago
Yeah, I think it failed, probably because of the npip policy. That might have prevented the call socket.gethostbyname(socket.gethostname())
from running successfully.
We'll check on the scenario with npip later.
Can you try with the job mode?
No luck with job mode either. It errors out at the same line:
socket.gethostbyname(socket.gethostname())
with a message "Name or service not known"
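For anyone wanting to check this outside of ray-on-aml, here is a minimal diagnostic sketch. It is not part of the library; `resolve_local_ip` is a hypothetical helper name. It tries the same lookup that fails above and, if the hostname cannot be resolved, falls back to asking the OS which local address it would use for outbound traffic. The target address 10.255.255.255 is an arbitrary placeholder; a UDP connect does not actually send any packets.

```python
import socket


def resolve_local_ip(fallback="127.0.0.1"):
    """Return an IPv4 address for this node (hypothetical helper, not part of ray-on-aml)."""
    hostname = socket.gethostname()
    try:
        # Same call that fails in the report above.
        return socket.gethostbyname(hostname)
    except OSError as exc:
        # socket.gaierror ("Name or service not known") is a subclass of OSError.
        print(f"Could not resolve {hostname!r}: {exc}")

    # Fallback: connect a UDP socket so the OS selects an outbound interface.
    # No traffic is sent; the target address is an arbitrary placeholder.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.connect(("10.255.255.255", 1))
        return sock.getsockname()[0]
    except OSError:
        # No usable route either, so return the supplied fallback address.
        return fallback
    finally:
        sock.close()


if __name__ == "__main__":
    print(resolve_local_ip())
```

If the first lookup fails but the fallback returns a private address from the subnet, that would suggest the node's own hostname is simply not resolvable (for example, missing from DNS or /etc/hosts) rather than the network being unreachable.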
I am not able to initialize my Ray cluster using ray-on-aml version 0.2.4. I'm running a notebook in the Python 3.8 AzureML environment, using the following piece of code:
While the compute instance initializes successfully, the ray_on_aml job fails in the cluster with the following error:
I have this entire setup within a VNet, and all the compute resources have been created in the same subnet. Due to certain policies, I am forced to enable 'No Public IP' (npip) on my computes.
Could this be an issue due to my setup - npip or NSG? Or is it something to do with the library? Please help mitigate this.
Thank you