I've been trying to quantize the Linear and convolutional layers of my model to speed up inference, but I'm getting mixed results. This is what I'm doing at the moment:
quantized_model = torch.quantization.quantize_dynamic(
    model,
    {torch.nn.Linear, torch.nn.Conv1d},  # add other layers as needed
    dtype=torch.qint8,
    inplace=True
)
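In case a runnable snippet helps, here's a stripped-down version of what I'm doing with a toy model (layer sizes are made up) plus a rough latency check:

```python
import time

import torch
import torch.nn as nn

# Toy model just to illustrate the setup; sizes are arbitrary
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(8, 16, kernel_size=3)
        self.fc = nn.Linear(16, 4)

    def forward(self, x):
        x = self.conv(x)   # (batch, 16, L-2)
        x = x.mean(dim=2)  # global average pool -> (batch, 16)
        return self.fc(x)

model = ToyModel().eval()

quantized_model = torch.quantization.quantize_dynamic(
    model,
    {torch.nn.Linear, torch.nn.Conv1d},
    dtype=torch.qint8,
)

# Rough latency check on a dummy input
x = torch.randn(1, 8, 64)
with torch.no_grad():
    start = time.perf_counter()
    for _ in range(100):
        quantized_model(x)
    avg_ms = (time.perf_counter() - start) / 100 * 1e3
    print(f"avg latency: {avg_ms:.3f} ms")
```

One thing I noticed with this repro: the Linear layer gets swapped for a dynamic quantized module, but the Conv1d appears to be left as a regular float module, which might explain the mixed results.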
Does anyone have any advice?