huggingface / optimum-quanto

A pytorch quantization backend for optimum
Apache License 2.0

Errors when applied to Lumina-Next #269

Closed: phil329 closed this issue 1 month ago

phil329 commented 3 months ago

I get an AssertionError when I run the following code.

import torch

from diffusers import LuminaText2ImgPipeline
from optimum.quanto import freeze, qfloat8, quantize

pipeline = LuminaText2ImgPipeline.from_pretrained("Alpha-VLLM/Lumina-Next-SFT-diffusers", torch_dtype=torch.float16).to("cuda")

quantize(pipeline.transformer, weights=qfloat8)
freeze(pipeline.transformer)

image = pipeline("ghibli style, a fantasy landscape with castles").images[0]

qbytes_mm only accepts activations with ndim 2 or 3, but during inference of LuminaText2ImgPipeline the activations have ndim=4.

The relevant traceback is attached as a screenshot (not reproduced here).

The AssertionError is raised by the activations.ndim assertion at the top of the qbytes_mm CUDA implementation.
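For context, here is a minimal illustration of the failing check, reconstructed from the error and from the commented-out "original tokens" line in the patch below (not a verbatim copy of the installed source; the activation shape is made up):

import torch

activations = torch.randn(1, 8, 16, 64)  # hypothetical 4-D activation, as produced by Lumina-Next
# The entry check of the CUDA implementation is presumably equivalent to this, so it raises:
assert activations.ndim in (2, 3)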

phil329 commented 3 months ago

I simply added an extra case for ndim == 4, and it works:

@torch.library.impl("quanto::qbytes_mm", "CUDA")
def qbytes_mm_impl_cuda(activations: torch.Tensor, weights: torch.Tensor, output_scales: torch.Tensor) -> torch.Tensor:

    # Accept 4-D activations (as produced by LuminaText2ImgPipeline) in addition to 2-D and 3-D.
    assert activations.ndim in (2, 3, 4)
    in_features = activations.shape[-1]

    # "tokens" is the number of rows once all leading (batch) dimensions are flattened.
    if activations.ndim == 2:
        tokens = activations.shape[0]
    elif activations.ndim == 3:
        tokens = activations.shape[0] * activations.shape[1]
    elif activations.ndim == 4:
        tokens = activations.shape[0] * activations.shape[1] * activations.shape[2]

    # original tokens
    # tokens = activations.shape[0] if activations.ndim == 2 else activations.shape[0] * activations.shape[1]
    ...  # rest of the function unchanged
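A shape-agnostic way to compute the same token count, which works for any number of leading dimensions without enumerating cases (just a sketch of the same idea, not the change applied above):

import math
import torch

def token_count(activations: torch.Tensor) -> int:
    # The matmul only cares about the last axis (in_features); flatten everything else.
    assert activations.ndim >= 2
    return math.prod(activations.shape[:-1])

x = torch.randn(2, 8, 16, 64)  # e.g. a 4-D activation tensor
print(token_count(x))  # 2 * 8 * 16 = 256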
github-actions[bot] commented 1 month ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.