CPU Quantization Specification Error (DQ Parameter?)

RazeBerry commented 7 months ago

When I use the DQ parameter in the latest version of Stable-ts, it invariably raises "RuntimeError: Didn't find engine for operation quantized::linear_prepack NoQEngine". I have since asked on the PyTorch GitHub issues, and apparently this is because, in an ARM environment, any code that specifies the quantized linear engine needs both of these lines:

torch.backends.quantized.engine = 'qnnpack'

qconfig = torch.ao.quantization.get_default_qconfig('qnnpack')  # or 'fbgemm' on x86

The first line must be included along with the second; otherwise the RuntimeError is always raised. I hope this can be fixed. Thank you! I'm also not sure whether this is a standalone fix or part of a bigger problem with PyTorch.
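A minimal sketch of the workaround described above, assuming the goal is simply to set `torch.backends.quantized.engine` before any dynamic quantization runs. The helper names `choose_quantized_engine` and `configure_quantized_engine` are hypothetical (not part of Stable-ts or PyTorch), and the preference order (`qnnpack` on ARM, `x86`/`fbgemm` elsewhere) is an assumption. The selection logic is kept torch-free so it can be checked on its own; only `configure_quantized_engine` touches PyTorch.

```python
import platform


def choose_quantized_engine(supported_engines, machine=None):
    """Pick a quantized backend name from PyTorch's list of supported engines.

    Assumption: prefer 'qnnpack' on ARM (arm64/aarch64/armv7...) and
    'x86' or 'fbgemm' elsewhere. Returns None if nothing usable is listed.
    """
    machine = (machine or platform.machine()).lower()
    if machine in ("arm64", "aarch64") or machine.startswith("arm"):
        preferred = ["qnnpack"]
    else:
        preferred = ["x86", "fbgemm", "qnnpack"]
    for name in preferred:
        if name in supported_engines:
            return name
    return None


def configure_quantized_engine():
    """Hypothetical helper: set the quantized engine before using DQ."""
    import torch  # imported lazily so the selection logic stays torch-free

    engine = choose_quantized_engine(torch.backends.quantized.supported_engines)
    if engine is None:
        raise RuntimeError("no quantized engine available in this PyTorch build")
    # Leaving the engine unset ('none') is what produces the NoQEngine error.
    torch.backends.quantized.engine = engine
    return engine
```

Under these assumptions, `configure_quantized_engine()` would be called once before loading a model with DQ enabled.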

RazeBerry commented 7 months ago

Small update on the issue: for some reason, when I use DQ and it runs without error, transcription is ironically about 40% slower. The medium model on an M2 Pro CPU transcribes about 1.7 sec/s (seconds of audio per second) without DQ and about 1 sec/s with DQ.
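The "~40% slower" figure follows from the two throughput numbers: (1.7 − 1.0) / 1.7 ≈ 41%. A small sketch for reproducing such a comparison, where `time_transcription` is a hypothetical harness and `transcribe` stands for whatever callable runs the model (an assumption, not a Stable-ts API):

```python
import time


def audio_seconds_per_second(audio_duration_s, wall_time_s):
    """Throughput: seconds of audio transcribed per wall-clock second."""
    return audio_duration_s / wall_time_s


def percent_slower(baseline_rate, new_rate):
    """How much lower new_rate is than baseline_rate, as a percentage."""
    return (baseline_rate - new_rate) / baseline_rate * 100.0


def time_transcription(transcribe, audio_path, audio_duration_s):
    """Hypothetical harness: time one call of `transcribe(audio_path)`."""
    start = time.perf_counter()
    transcribe(audio_path)
    elapsed = time.perf_counter() - start
    return audio_seconds_per_second(audio_duration_s, elapsed)


print(round(percent_slower(1.7, 1.0), 1))  # 41.2
```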

jianfch commented 7 months ago

There are many factors that can cause the model to run slower with DQ. The overhead can make it run slower (e.g. in my testing, DQ only ran faster for the large models). The slower start-up can also make it seem slower on short audio tracks. However, DQ should still have a smaller memory footprint than running without it.
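Both points above can be illustrated with simple arithmetic. The sketch below assumes a fixed one-off start-up cost plus a constant steady-state speed; all numbers are hypothetical, not measurements. It shows how a larger start-up cost can make DQ look slower on a short clip even if it were faster on a long one, and why int8 weights (1 byte) take roughly a quarter of the memory of fp32 weights (4 bytes), ignoring biases and quantization parameters.

```python
def effective_rate(audio_s, steady_rate, startup_s):
    """Audio seconds per wall-clock second, including a fixed start-up cost.

    steady_rate: audio seconds per second once running (assumed constant).
    startup_s: one-off cost such as loading/quantizing the model (hypothetical).
    """
    wall_s = startup_s + audio_s / steady_rate
    return audio_s / wall_s


def linear_weight_bytes(in_features, out_features, bytes_per_weight):
    """Approximate weight storage for one linear layer (bias ignored)."""
    return in_features * out_features * bytes_per_weight


# Hypothetical: DQ runs at 2.0 sec/s but costs 5 s to start; fp32 runs at
# 1.8 sec/s with a 1 s start-up.
for clip in (10, 600):
    print(clip, round(effective_rate(clip, 2.0, 5.0), 2),
          round(effective_rate(clip, 1.8, 1.0), 2))

# Hypothetical 1024x4096 layer: fp32 vs int8 weights, in MiB.
print(linear_weight_bytes(1024, 4096, 4) / 2**20,
      linear_weight_bytes(1024, 4096, 1) / 2**20)  # 16.0 4.0
```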

RazeBerry commented 7 months ago

@jianfch I have submitted a pull request which fixes it, please review: https://github.com/jianfch/stable-ts/pull/341

jianfch/stable-ts issue #338