fix: forcing deepspeed to use CPU for profiling FLOPS

timurcarstensen commented 3 weeks ago

Closes #153

Could you check that this works for you, @rheasukthanker? I tested this on JUWELS and didn't have any issues.

rheasukthanker commented 3 weeks ago

I tried the fix, but from deepspeed.profiling.flops_profiler import get_model_profile still fails for me on a GPU node on JUWELS:

raise MissingCUDAException("CUDA_HOME does not exist, unable to compile CUDA op(s)")
deepspeed.ops.op_builder.builder.MissingCUDAException: CUDA_HOME does not exist, unable to compile CUDA op(s)

Are you setting CUDA_HOME somewhere?
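A quick way to narrow down an error like this is to check what the process can see before importing the profiler. The sketch below is only a rough pre-flight check: the helper name cuda_toolchain_status is made up for illustration, and it looks only at the CUDA_HOME environment variable and at nvcc on PATH, which may not match every lookup DeepSpeed performs.

```python
import os
import shutil


def cuda_toolchain_status() -> dict:
    """Report whether a CUDA toolchain is visible to this process.

    Hypothetical helper for illustration; it is not part of DeepSpeed or whittle.
    """
    cuda_home = os.environ.get("CUDA_HOME")
    return {
        # Raw value of the environment variable (None if unset)
        "cuda_home": cuda_home,
        # True only if the variable is set and points at an existing directory
        "cuda_home_exists": bool(cuda_home) and os.path.isdir(cuda_home),
        # True if the CUDA compiler can be found on PATH
        "nvcc_on_path": shutil.which("nvcc") is not None,
    }


if __name__ == "__main__":
    status = cuda_toolchain_status()
    print(status)
    if not status["cuda_home_exists"]:
        print("CUDA_HOME is unset or invalid; the CUDA module may not be loaded.")
```

Running this on the GPU node before and after loading the CUDA module should show whether the environment is the difference between the two setups.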

timurcarstensen commented 3 weeks ago

No, I’m just loading the GCC and CUDA modules, and then everything runs fine for me. Could you share your setup with me?

rheasukthanker commented 3 weeks ago

Thanks, I realised I wasn't loading CUDA on the GPU node (my other script still worked just fine). One thing I notice, however, is that DeepSpeed autodetects CUDA as the accelerator when a GPU is available and falls back to the CPU as the accelerator when only a CPU is available. Is there a way to prevent DeepSpeed from autodetecting the GPU during import?
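For reference, one way to request a specific accelerator is DeepSpeed's DS_ACCELERATOR environment variable, set before the first deepspeed import. The sketch below assumes that mechanism; the thread does not show the PR's actual diff, so this is not necessarily what the fix does. The helper names force_cpu_accelerator and load_profiler are made up for illustration, and whether this fully suppresses GPU probing during import is not confirmed here.

```python
import os


def force_cpu_accelerator() -> None:
    """Request DeepSpeed's CPU accelerator instead of autodetection.

    Assumption: DeepSpeed honours the DS_ACCELERATOR environment variable.
    Call this before the first `import deepspeed`, since the thread reports
    that detection happens during import.
    """
    os.environ["DS_ACCELERATOR"] = "cpu"


def load_profiler():
    """Import get_model_profile lazily, after the override is in place.

    Returns None if DeepSpeed is not installed in this environment.
    """
    force_cpu_accelerator()
    try:
        from deepspeed.profiling.flops_profiler import get_model_profile
    except ImportError:
        return None
    return get_model_profile


if __name__ == "__main__":
    force_cpu_accelerator()
    print("DS_ACCELERATOR =", os.environ["DS_ACCELERATOR"])
```

Keeping the import inside load_profiler means the override is always applied first, regardless of the order in which other modules are imported.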

timurcarstensen commented 1 week ago

I think there's no way around this, since it happens when we import the profiler from DeepSpeed. I would merge this now :)

whittle-org / whittle

fix: forcing deepspeed to use CPU for profiling FLOPS #154