leejet / stable-diffusion.cpp

Stable Diffusion and Flux in pure C/C++
MIT License

ggml error when using "--schedule karras" with clblast #52

Open cwillu opened 12 months ago

cwillu commented 12 months ago
[INFO]  stable-diffusion.cpp:2830 - loading model from '/home/cwillu/ext/work/models/sd/cyberrealistic_v33-ggml-model-f16.bin'
[INFO]  stable-diffusion.cpp:2858 - model type: SD1.x
[INFO]  stable-diffusion.cpp:2866 - ftype: f16
ggml_opencl: selecting platform: 'AMD Accelerated Parallel Processing'
ggml_opencl: selecting device: 'gfx1012:xnack-'
ggml_opencl: device FP16 support: true
[INFO]  stable-diffusion.cpp:3090 - total params size = 1969.98MB (clip 235.01MB, unet 1640.46MB, vae 94.51MB)
[INFO]  stable-diffusion.cpp:3096 - loading model from '/home/cwillu/ext/work/models/sd/cyberrealistic_v33-ggml-model-f16.bin' completed, taking 0.64s
[INFO]  stable-diffusion.cpp:3121 - running in eps-prediction mode
[INFO]  stable-diffusion.cpp:3365 - condition graph use 248.59MB of memory: params 235.01MB, runtime 13.58MB (static 10.65MB, dynamic 2.93MB)
[INFO]  stable-diffusion.cpp:3365 - condition graph use 248.59MB of memory: params 235.01MB, runtime 13.58MB (static 10.65MB, dynamic 2.93MB)
[INFO]  stable-diffusion.cpp:4097 - get_learned_condition completed, taking 2.71s
[INFO]  stable-diffusion.cpp:4113 - start sampling
[INFO]  stable-diffusion.cpp:3753 - sampling using modified DPM++ (2M) method
ggml_opencl: ggml_cl_h2d_tensor_2d(queue, d_X, 0, src0, i03, i02, NULL) error -30 at /media/cwillu/External/cwillu/work/stable-diffusion.cpp/ggml/src/ggml-opencl.cpp:1505

I also get a similar error with models that aren't f16 (e.g., f32, q4), regardless of any other options, but that is probably a separate issue, even if a related one.
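
For reference when reading the log above: in the OpenCL headers, status -30 is `CL_INVALID_VALUE`. For `clEnqueueWriteBuffer`, the specification lists a write region (offset, size) that falls outside the buffer, or a NULL host pointer, among the causes of that status. A minimal sketch of a lookup helper follows; `cl_status_name` is a hypothetical name, not part of ggml, and the table is deliberately partial.

```c
/* Hypothetical helper (not part of ggml): names for a few OpenCL status codes.
   The numeric values are the ones defined in CL/cl.h; unlisted codes fall
   through to "unknown". */
static const char *cl_status_name(int status) {
    switch (status) {
        case 0:   return "CL_SUCCESS";
        case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
        case -5:  return "CL_OUT_OF_RESOURCES";
        case -6:  return "CL_OUT_OF_HOST_MEMORY";
        case -30: return "CL_INVALID_VALUE";
        case -34: return "CL_INVALID_CONTEXT";
        case -36: return "CL_INVALID_COMMAND_QUEUE";
        case -38: return "CL_INVALID_MEM_OBJECT";
        default:  return "unknown";
    }
}
```

Calling `cl_status_name(-30)` yields `"CL_INVALID_VALUE"`, which points at the arguments of the failing call rather than at a device allocation failure (that would be -4).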

cwillu commented 12 months ago

Patching a couple of print statements into ggml-opencl.cpp:

    const void * x = (const void *) ((const char *) src->data + i2*nb2 + i3*nb3);
    if (nb0 == ts && nb1 == ts*ne0/bs) {
        // Debug output: call site, host pointer, destination offset, byte count (rows * row stride).
        fprintf(stderr, "clEnqueueWriteBuffer at %s:%d\n", __FILE__, __LINE__);
        fprintf(stderr, "  %p\n", x);
        fprintf(stderr, "  %zu\n", (size_t) offset);
        fprintf(stderr, "  %zu (%zu * %zu)\n", (size_t) (ne1*nb1), (size_t) ne1, (size_t) nb1);
        err = clEnqueueWriteBuffer(queue, dst, CL_FALSE, offset, ne1*nb1, x, 0, NULL, ev);
        return err;
    }

and rerunning with -v shows:

clEnqueueWriteBuffer at /media/cwillu/External/cwillu/work/stable-diffusion.cpp/ggml/src/ggml-opencl.cpp:1355
  0x7f74f8fcb590
  0
  491520 (320 * 1536)
clEnqueueWriteBuffer at /media/cwillu/External/cwillu/work/stable-diffusion.cpp/ggml/src/ggml-opencl.cpp:1355
  0x7f74f8f53450
  0
  491520 (320 * 1536)
clEnqueueWriteBuffer at /media/cwillu/External/cwillu/work/stable-diffusion.cpp/ggml/src/ggml-opencl.cpp:1355
  0x7f74f8474b10
  0
  491520 (320 * 1536)
 [snip a bunch of repetitions]
clEnqueueWriteBuffer at /media/cwillu/External/cwillu/work/stable-diffusion.cpp/ggml/src/ggml-opencl.cpp:1355
  0x7f749333a890
  0
  204800 (320 * 640)
clEnqueueWriteBuffer at /media/cwillu/External/cwillu/work/stable-diffusion.cpp/ggml/src/ggml-opencl.cpp:1355
  0x7f7493308750
  0
  204800 (320 * 640)
clEnqueueWriteBuffer at /media/cwillu/External/cwillu/work/stable-diffusion.cpp/ggml/src/ggml-opencl.cpp:1355
  0x7f74822a82b0
  0
  655360 (4096 * 160)
ggml_opencl: ggml_cl_h2d_tensor_2d(queue, d_X, 0, src0, i03, i02, NULL) error -30 at /media/cwillu/External/cwillu/work/stable-diffusion.cpp/ggml/src/ggml-opencl.cpp:1511
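
The last write before the failure requests 655360 bytes (4096 * 160) at offset 0, more than any earlier write shown (491520 and 204800 bytes). One documented cause of status -30 (`CL_INVALID_VALUE`) from `clEnqueueWriteBuffer` is a write region that extends past the end of the destination buffer, so a check like the sketch below, placed before the call, could confirm or rule that out. `write_fits` and its `capacity` argument are hypothetical; the allocated size of `d_X` is not shown in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pre-flight check (not part of ggml): does the byte range
   [offset, offset + size) fit inside a device buffer of `capacity` bytes?
   The comparison is arranged so that offset + size is never computed,
   which avoids size_t overflow. */
static bool write_fits(size_t capacity, size_t offset, size_t size) {
    return offset <= capacity && size <= capacity - offset;
}
```

For example, if the buffer had been sized for one of the earlier 491520-byte transfers, `write_fits(491520, 0, 655360)` would be false and the write would be rejected.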

Happenedtostumblein commented 12 months ago

OpenCL isn’t supported yet.

caofx0418 commented 11 months ago

I also get a similar error

caofx0418 commented 11 months ago

Maybe it runs out of memory. If you change the image height and width to 64 or 128, it works.
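
One way to see why image size could matter: SD1.x runs the UNet on a latent that is 1/8 of the image in each dimension, and self-attention at full latent resolution handles one row per latent position. A 512x512 image would give 64 * 64 = 4096 rows, which matches the 4096-row write that fails in the log above, although the thread does not state the image size that was used. The sketch below assumes the factor of 8; `latent_positions` is a hypothetical helper, not a function from stable-diffusion.cpp.

```c
#include <stddef.h>

/* Hypothetical helper: number of latent positions for an image,
   assuming the SD1.x VAE downsampling factor of 8 per dimension. */
static size_t latent_positions(size_t width, size_t height) {
    return (width / 8) * (height / 8);
}
```

At 128x128 this gives 256 positions and at 64x64 it gives 64, so the per-tensor transfers that scale with this count shrink by a factor of 16 or 64 relative to 512x512.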