flores-o opened this issue 5 months ago
@flores-o I think if you open the file in Netron you can see that there is an upcast layer in it.
```python
import torch
import torchvision.models as models

# Load a pre-trained model and switch to inference mode before export
model = models.resnet18(pretrained=True)
model.eval()

# Convert the model weights to half precision (FP16)
model = model.half()

# Dynamically quantize the Linear layers to FP16
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.float16
)

# Export the quantized model to ONNX format
# (the old example_outputs argument was removed in newer PyTorch
# versions, so it is omitted here)
torch.onnx.export(
    quantized_model,
    torch.randn(1, 3, 224, 224).half(),  # example FP16 input
    "quantized_resnet.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=11,
)
```
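If you want to confirm the upcast point without Netron, here is a minimal sketch that lists the Cast nodes in the exported graph using the `onnx` Python package (the file name `quantized_resnet.onnx` just matches the export above; the script is an illustration, not part of the original export flow):

```python
import onnx

# Load the exported graph and list every Cast node, which is where
# an FP16 tensor may be upcast back to FP32 (the same layer Netron shows)
model = onnx.load("quantized_resnet.onnx")
for node in model.graph.node:
    if node.op_type == "Cast":
        # the "to" attribute holds the target element type
        target = next(a.i for a in node.attribute if a.name == "to")
        print(node.name, "->", onnx.TensorProto.DataType.Name(target))
```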
https://github.com/microsoft/onnxruntime-inference-examples/blob/8fcc97e1e035d57ffdfd19b76732e3fc79d8c2a6/js/segment-anything/index.js#L21
Hi @guschmue, can you share the command you used with the SAM exporter tool to get this ONNX file?

P.S.: I tried exporting the vit_b version with the SAM exporter tool and got a larger ONNX file (360 MB compared with your 180 MB) that also runs slower in the browser with WebGPU. Did you convert the model weights to mixed/half precision before exporting with sam_exporter?

Thank you
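For reference, one common way to get roughly a 2x size reduction after export (rather than halving the PyTorch weights before export) is the float16 converter in `onnxconverter-common`. A minimal sketch, assuming that package is installed and a full-precision vit_b export as input (the file names are placeholders, not the actual files from this thread):

```python
import onnx
from onnxconverter_common import float16

# Load a full-precision (FP32) ONNX export of the SAM vit_b model
model = onnx.load("sam_vit_b_fp32.onnx")

# Convert initializers and ops to FP16; keep_io_types leaves the
# graph inputs/outputs as FP32 so callers don't need to change
model_fp16 = float16.convert_float_to_float16(model, keep_io_types=True)

onnx.save(model_fp16, "sam_vit_b_fp16.onnx")
```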