Open Blanchard-Zhu opened 8 months ago
Hey @zhubenchao , thanks for raising this issue. In your output, does the "PENDING" ever turn into a "RUNNING"? If not, then the job has not even started and is possibly blocked by something else. Maybe the GPU is not available or blocked by another process. If there is no GPU at all on your machine, you should see a Ray error, though, so that's seems to be not the case.
Training is slow and may be pending because the GPU may not be enabled. Could this be a problem due to excessive memory requirements?
Here is the code.
The following is the log.
Trial status: 1 PENDING Current time: 2024-01-14 08:33:15. Total running time: 42s Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:G)
Versions / Dependencies
ray: 2.9.0 torch: 2.1.2
Reproduction script
Issue Severity
High: It blocks me from completing my task.