pranav-prakash opened 3 years ago
Found the source of the discrepancy: it's because the new quantizer constructs the inference session like

```python
sess_options.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_DISABLE_ALL
self.infer_session = onnxruntime.InferenceSession(self.augmented_model_path,
                                                  sess_options=sess_options,
                                                  providers=self.execution_providers)
```

whereas the old one used

```python
self.infer_session = onnxruntime.InferenceSession(self.augmented_model_path, None)
```
i.e. difference in optimization level. Probably doesn't make a significant difference either way.
We need to pull the newer changes I pushed to upstream ORT.
And to backport the following check in `QLinearBinaryOp`:

```python
# Don't bother quantizing if none of the inputs are quantized, as it might reduce performance
if node.input[0] not in self.quantizer.quantized_value_map and \
        node.input[1] not in self.quantizer.quantized_value_map:
    return super().quantize()

# Check whether all inputs are floats, so we can quantize it
can_quantize = self.quantizer._is_valid_quantize_value(node.output[0]) and \
    self.quantizer._is_valid_quantize_value(node.input[0]) and \
    self.quantizer._is_valid_quantize_value(node.input[1])
```
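For reference, the gating part of that check can be illustrated with a simplified standalone sketch (the `quantized_value_map` dict and tensor names here are stand-ins, not the real ONNX Runtime quantizer state):

```python
# Simplified sketch of the gating logic: only emit a QLinear op when at
# least one input already has quantization parameters recorded.
# "quantized_value_map" stands in for the quantizer's bookkeeping dict.

def should_use_qlinear(node_inputs, quantized_value_map):
    """Return True if at least one input tensor is already quantized."""
    return any(name in quantized_value_map for name in node_inputs)

# Hypothetical example: only 'conv_out' has been quantized so far.
quantized = {"conv_out": ("scale", "zero_point")}
print(should_use_qlinear(["conv_out", "other"], quantized))  # -> True
print(should_use_qlinear(["x", "y"], quantized))             # -> False
```

In the real snippet the `False` case falls back to `super().quantize()`, i.e. the plain float path.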
The upstream quantizer sees changes almost every week. I don't have the bandwidth to keep track of all the divergences, but whoever takes this over should basically first diff the current quantizer with the version of upstream that was there at that time (look at the commit log to find the last bump to upstream), then copy over all those changes.
Also useful: QLinearConcat support in https://github.com/microsoft/onnxruntime/pull/7341/files. Would be useful for maskrcnn, plus the associated NHWC transform support (see upstream PR onnxruntime/pull/7587).
The quantizer we currently have (in `systolic_runner/quantization`) was forked a long time ago and has since diverged significantly from upstream. Now, this isn't really a bug per se, since it works fine (and has been tested/tweaked on all the models we currently need). However, lagging so far behind means we don't get any of the new goodies they add upstream, like the new histogram-based range calculations (instead of just min/max) or any of the fancy things like the equalization algorithm.

In 5458ce99b (branch `merge_upstream_quantizer`) I've ported most of the changes I used for `int8`, and I tested with `resnet50` and it works (for some reason the calculated scale factors are ever-so-slightly different; I'll try to find out why, but it's probably just a difference in the formula used somewhere).

However, I'm hesitant to actually merge it into the main branch, since this hasn't been used as much as the older implementation, so things will probably break. In particular, I'm worried about more esoteric models like BERT and mask-rcnn.
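Tiny scale-factor discrepancies like this often come down to how the calibrated min/max range is turned into a scale. A minimal sketch of two common conventions (these are illustrative formulas, not necessarily the exact ones either quantizer version uses):

```python
# Two common ways to turn a calibrated float range into a uint8 scale.
# They differ only slightly, which is enough to produce
# "ever-so-slightly different" scale factors between two quantizers.

def scale_simple(rmin, rmax, qmax=255):
    # Divide the raw calibrated range by the number of quantization steps.
    return (rmax - rmin) / qmax

def scale_zero_nudged(rmin, rmax, qmax=255):
    # Extend the range to include zero before dividing; asymmetric
    # quantization typically needs an exactly representable zero point.
    rmin = min(rmin, 0.0)
    rmax = max(rmax, 0.0)
    return (rmax - rmin) / qmax

# A calibrated range that does not include zero (hypothetical values):
print(scale_simple(0.1, 6.3))       # ~0.02431
print(scale_zero_nudged(0.1, 6.3))  # ~0.02471
```

Diffing where each implementation clamps the range (and whether it divides by 255 or 254) is a quick way to pin down which formula changed.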