Open vbsantos opened 2 months ago
Hi Vinícius, can you run the job_hf.py script in a local Python environment (outside Kubernetes)?
It appears to be a hardware limitation error:
2024-07-06 17:44:00,933 WARNING plan.py:567 -- Warning: The Ray cluster currently does not have any available CPUs. The Dataset job will hang unless more CPUs are freed up. A common reason is that cluster resources are used by Actors or Tune trials; see the following link for more details: https://docs.ray.io/en/master/data/dataset-internals.html#datasets-and-tune
In the Kubernetes Job you can increase the hardware resources that Ray is allowed to use. Remember to also raise the environment variables that limit Ray's use of the hardware.
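To confirm whether the cluster really has no free CPUs, a minimal sketch like the one below may help. It assumes Ray is installed and that a cluster is already running and reachable with `address="auto"`; the helper names `cpu_summary` and `report_ray_cpus` are made up for illustration and are not part of the project.

```python
def cpu_summary(total, available):
    """Summarize CPU availability from two Ray resource dicts."""
    # Ray may omit the "CPU" key entirely when none are free, hence .get().
    total_cpus = float(total.get("CPU", 0))
    free_cpus = float(available.get("CPU", 0))
    return {
        "total_cpus": total_cpus,
        "free_cpus": free_cpus,
        "starved": free_cpus <= 0,
    }


def report_ray_cpus(address="auto"):
    """Connect to a running Ray cluster and print its CPU availability."""
    # Imported here so cpu_summary stays usable without Ray installed.
    import ray

    ray.init(address=address)
    try:
        summary = cpu_summary(ray.cluster_resources(), ray.available_resources())
    finally:
        ray.shutdown()
    print(summary)
    return summary
```

Calling `report_ray_cpus()` from inside the Job's container should show whether `free_cpus` is zero, which would match the warning above.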
Issue: FileNotFoundError: [Errno 2] Failed to open local file 'dataset/lfw_multifaces-ingestion/Albert_Costa_0001.jpg'
Description
I am encountering a FileNotFoundError when trying to run the job_lfw job using Ray on a Kubernetes cluster. The error occurs after downloading the dataset (and doing some processing for the second time) when Ray tries to open a local file that apparently does not exist. I am new to the Python and Kubernetes ecosystem, so I apologize if this is a basic error.
Details
kubectl apply -f kubernetes/job_lfw.yaml
Error Details
The complete error is as follows:
Could you please help me understand why this error is occurring and how to resolve it? Any guidance or suggestions would be greatly appreciated.
Thank you in advance for your assistance and support.
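For anyone debugging a similar error, here is a small diagnostic sketch using only the standard library. It rests on an assumption that this thread does not confirm: that the path in the error is relative, so it is resolved against the working directory of whichever process opens it. On a Ray cluster that may be a worker in a different pod than the one that downloaded the dataset. The helper name `diagnose_path` is hypothetical.

```python
from pathlib import Path


def diagnose_path(path_str, base_dir=None):
    """Report how a possibly relative path resolves and whether it exists."""
    # Default to the current working directory, as open() would.
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    path = Path(path_str)
    resolved = path if path.is_absolute() else base / path
    return {
        "is_absolute": path.is_absolute(),
        "base_dir": str(base),
        "resolved": str(resolved),
        "exists": resolved.exists(),
        "parent_exists": resolved.parent.exists(),
    }


if __name__ == "__main__":
    # Path copied from the error message in the issue title.
    print(diagnose_path("dataset/lfw_multifaces-ingestion/Albert_Costa_0001.jpg"))
```

Running this in the process that fails shows which directory the relative path is resolved against and whether the dataset folder exists there at all.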