kohya-ss / sd-scripts


[Bug] No operator for 'memory_efficient_attention_forward' #412

Open · usamaa-saleem opened this issue 1 year ago

usamaa-saleem commented 1 year ago
```
NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
     query       : shape=(200, 9126, 1, 64) (torch.float32)
     key         : shape=(200, 9126, 1, 64) (torch.float32)
     value       : shape=(200, 9126, 1, 64) (torch.float32)
     attn_bias   : <class 'NoneType'>
     p           : 0.0
`flshattF` is not supported because:
    device=cpu (supported: {'cuda'})
    dtype=torch.float32 (supported: {torch.float16, torch.bfloat16})
`tritonflashattF` is not supported because:
    device=cpu (supported: {'cuda'})
    dtype=torch.float32 (supported: {torch.float16, torch.bfloat16})
`cutlassF` is not supported because:
    device=cpu (supported: {'cuda'})
`smallkF` is not supported because:
    max(query.shape[-1] != value.shape[-1]) > 32
    unsupported embed per head: 64
steps:   0%| | 0/98800 [56:23<?, ?it/s]
```
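Every backend listed in the trace rejects the inputs for the same two reasons: device=cpu and dtype=torch.float32. As a point of reference, here is a minimal sketch of inputs the operator does accept, assuming xformers is installed and a CUDA device is visible (the shapes are arbitrary but follow the (batch, seq_len, heads, head_dim) layout from the message):

```python
import torch
import xformers.ops as xops

# The trace shows CPU float32 inputs; every fused kernel rejects that combination.
# CUDA tensors in fp16 (or bf16) satisfy the flash-attention and cutlass paths.
assert torch.cuda.is_available(), "xformers' fused attention kernels are CUDA-only"

q = torch.randn(2, 1024, 1, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

out = xops.memory_efficient_attention(q, k, v, attn_bias=None, p=0.0)
print(out.shape)  # torch.Size([2, 1024, 1, 64])
```

If even this snippet reports device=cpu, the installed torch build is likely CPU-only, which would explain why training falls back to the CPU despite a GPU being present.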
usamaa-saleem commented 1 year ago

The GPU I am using is an A100.

sdbds commented 1 year ago

I hit the same problem when using the conda cudatoolkit. When I installed a CUDA build of torch via pip (torch+cu), it worked. Maybe you can try this.
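For reference, a CUDA-enabled pip install typically looks like the sketch below. The `cu118` tag is an assumption; pick the wheel index matching your driver from the selector at pytorch.org.

```
pip uninstall -y torch torchvision
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
```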

liuchenbaidu commented 1 year ago

same

usamaa-saleem commented 1 year ago

> pip install torch+cu

What's the exact CLI command?

usamaa-saleem commented 1 year ago

@kohya-ss Can you guide me on how to solve this? I need it done ASAP.

[Same `NotImplementedError: No operator found for memory_efficient_attention_forward` trace as in the original report above.]
torvinx commented 1 year ago

> @kohya-ss Can you guide me on how to solve this? It needs to be done as soon as possible.
>
> [Same `NotImplementedError` trace as in the original report above.]

same problem

YOlegY commented 1 year ago

Exactly the same problem, but I'm running kohya_ss on CPU only (no GPU). What setting did I miss?
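In sd-scripts the xformers attention path is opt-in via a launch flag, so on a CPU-only machine the usual workaround is simply not to request it. A hedged sketch, where `train_network.py` and the trailing options stand in for whatever script and flags you are actually running:

```
# Fails on CPU: --xformers routes attention through the CUDA-only kernels above
accelerate launch train_network.py --xformers ...

# CPU-only: drop --xformers so the script uses plain PyTorch attention
accelerate launch train_network.py ...
```

Training on CPU will be very slow regardless, but it should at least get past this dispatch error.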