xdit-project / xDiT

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
Apache License 2.0

RuntimeError: CUDA error #374

Open algorithmconquer opened 1 day ago

algorithmconquer commented 1 day ago

RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16BF, lda, b, CUDA_R_16BF, ldb, &fbeta, c, CUDA_R_16BF, ldc, compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)

feifeibear commented 1 day ago

Please provide more context.

algorithmconquer commented 1 day ago

@feifeibear The run command is:

    torchrun --nproc_per_node=8 examples/flux_example.py \
      --model ${modelId} \
      --height 1024 --width 1024 \
      --pipefusion_parallel_degree 2 --ulysses_degree 2 --ring_degree 2 \
      --num_inference_steps 28 --warmup_steps 0 --prompt "A small dog" \
      --no_use_resolution_binning

The environment is: xfuser==0.3.5, diffusers==0.32.0.dev0, torch==2.4.0+cu124.

The CUDA device is H20, 8 cards.
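As a quick sanity check on the command above, the three parallel degrees multiply to 8, which matches `--nproc_per_node=8`. The sketch below only does that arithmetic. It assumes the number of launched processes should equal the product of the parallel degrees, which is not confirmed in this thread, and the helper name `degrees_match_world_size` is hypothetical, not part of xfuser.

```python
from math import prod


def degrees_match_world_size(world_size, **degrees):
    """Return True if the product of all parallel degrees equals world_size.

    Assumption (not confirmed in this thread): the launcher's process count
    should equal the product of the parallel degrees passed on the command line.
    """
    return prod(degrees.values()) == world_size


# Values copied from the command in the comment above.
degrees = {
    "pipefusion_parallel_degree": 2,
    "ulysses_degree": 2,
    "ring_degree": 2,
}
print(degrees_match_world_size(8, **degrees))
```

If this check passes, the degree configuration is probably not the cause, and the cuBLAS call itself is the next thing to isolate.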

feifeibear commented 21 hours ago

I could not reproduce your error. Can you run it successfully on a single GPU?
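One way to narrow this down on a single GPU, independent of xDiT, is to run a plain bfloat16 matrix multiplication, since the failing `cublasGemmEx` call uses `CUDA_R_16BF` operands. The following is a minimal sketch, not a confirmed reproduction of the error: the function name `check_bf16_matmul` and the matrix size are illustrative choices, and a successful result does not rule out a failure for the specific shapes used by the model.

```python
def check_bf16_matmul(size=1024):
    """Try one bfloat16 GEMM on the current CUDA device and report the result.

    Returns "ok", "torch-missing", "no-cuda", or "failed: <message>".
    """
    try:
        import torch
    except ImportError:
        return "torch-missing"

    if not torch.cuda.is_available():
        return "no-cuda"

    # Two square bfloat16 matrices on the GPU; the matmul is expected to be
    # dispatched to cuBLAS with bfloat16 operands.
    a = torch.randn(size, size, device="cuda", dtype=torch.bfloat16)
    b = torch.randn(size, size, device="cuda", dtype=torch.bfloat16)
    try:
        _ = a @ b
        # CUDA kernels run asynchronously; synchronize so errors surface here.
        torch.cuda.synchronize()
    except RuntimeError as exc:
        return f"failed: {exc}"
    return "ok"


if __name__ == "__main__":
    print(check_bf16_matmul())
```

Running it with `CUDA_VISIBLE_DEVICES=0 python check_bf16.py` (file name assumed) restricts the check to one card. If it fails there too, the problem is likely in the torch/CUDA/driver combination rather than in the parallel configuration.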