quic / ai-hub-models

The Qualcomm® AI Hub Models are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) and ready to deploy on Qualcomm® devices.
https://aihub.qualcomm.com
BSD 3-Clause "New" or "Revised" License

[BUG] whisper_asr demo Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same #15

Closed MrRace closed 3 months ago

MrRace commented 4 months ago

Describe the bug
I followed the guidance in the following document, installed pip install "qai_hub_models[whisper_asr]", and ran the CLI demo python -m qai_hub_models.models.whisper_asr.demo, after which I encountered an error:

RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

To Reproduce
Steps to reproduce the behavior:

  1. pip install "qai_hub_models[whisper_asr]"
  2. python -m qai_hub_models.models.whisper_asr.demo
cfasana commented 3 months ago

The issue arises because the model parameters are loaded on the GPU while the input tensors are on the CPU. More specifically, when the encoder and decoder are loaded, the TorchNumpyAdapter is used to allow working with numpy inputs and outputs instead of torch tensors.

To solve the issue, the inputs should be moved to the GPU before inference, and the outputs should be moved back to the CPU afterwards.

This can be accomplished by slightly modifying the TorchNumpyAdapter __call__ method:

from __future__ import annotations

from typing import Tuple

import numpy as np
import torch

# Note: `flatten` is the output-flattening helper already used in this module
# of qai_hub_models; it is assumed to remain available unchanged.


class TorchNumpyAdapter:
    def __init__(self, base_model: torch.jit.ScriptModule | torch.nn.Module):
        """
        Wraps torch models to use numpy input / outputs
        """
        assert isinstance(base_model, (torch.jit.ScriptModule, torch.nn.Module))
        self.base_model = base_model

    def __call__(self, *args) -> Tuple[np.ndarray, ...]:
        # Move the numpy inputs onto the GPU so they match the model weights.
        input_data = tuple(torch.from_numpy(t).cuda() for t in args)
        res = self.base_model(*input_data)
        # Detach the outputs and copy them back to the CPU as numpy arrays.
        if isinstance(res, torch.Tensor):
            output = res.detach().cpu().numpy()
        else:
            output = tuple(t.detach().cpu().numpy() for t in flatten(res))
        # Unwrap single-element tuples so callers get a bare array.
        if isinstance(output, tuple) and len(output) == 1:
            return output[0]
        return output
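
A more general variant (just a sketch, not code from the repo) would avoid hardcoding .cuda() and instead move the inputs to whatever device the wrapped model's parameters live on, so the adapter also keeps working on CPU-only machines:

# Hypothetical device-agnostic adapter; illustrative only, not part of qai_hub_models.
import numpy as np
import torch


class DeviceAwareNumpyAdapter:
    def __init__(self, base_model: torch.nn.Module):
        self.base_model = base_model
        # Use the device the model's parameters live on (CPU if it has none).
        try:
            self.device = next(base_model.parameters()).device
        except StopIteration:
            self.device = torch.device("cpu")

    def __call__(self, *args):
        # Move numpy inputs onto the model's device before running inference.
        inputs = tuple(torch.from_numpy(a).to(self.device) for a in args)
        res = self.base_model(*inputs)
        # Return plain numpy arrays on the CPU, matching the original adapter.
        if isinstance(res, torch.Tensor):
            return res.detach().cpu().numpy()
        return tuple(t.detach().cpu().numpy() for t in res)

Inferring the device from the model's own parameters keeps the same code path working whether the weights end up on a GPU or stay on the CPU.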
super100pig commented 3 months ago

I don't think the current version of AI Hub has considered loading the model onto the GPU. Maybe disabling the CUDA devices is an easier way: export CUDA_VISIBLE_DEVICES="". This also works around the same error when compiling the models.
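
For reference, the same workaround can be applied from inside a Python session too, as long as the variable is set before torch initializes CUDA (a minimal sketch, not specific to qai_hub_models):

import os

# Hide all CUDA devices from this process; must happen before torch touches CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import torch  # noqa: E402

print(torch.cuda.is_available())  # False: everything now runs on the CPU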

ingooooooo commented 3 months ago

The release v0.4.0r1 replaced the whisper_asr model with different variations:

For these, the model.py files now also work with CUDA enabled!
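
Assuming the new variants keep the same package layout as whisper_asr, each demo should be runnable the same way (<variant> below is a placeholder for one of the released model names):

python -m qai_hub_models.models.<variant>.demo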

cfasana commented 3 months ago

@ingooooooo thanks for the update.

I tested compilation, profiling, and inference of the three model versions directly on the AI Hub and I no longer experience the issue reported by @MrRace, so there is no further need to use export CUDA_VISIBLE_DEVICES="".

Moreover, the other issues that I reported (https://github.com/quic/ai-hub-models/issues/19) also no longer occur.

I think that this issue can be considered solved.