pranav-prakash opened 3 years ago
Found the source of the discrepancy: it's because the new quantizer constructs the inference session like

```python
sess_options.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_DISABLE_ALL
self.infer_session = onnxruntime.InferenceSession(self.augmented_model_path,
                                                  sess_options=sess_options,
                                                  providers=self.execution_providers)
```

whereas the old one used

```python
self.infer_session = onnxruntime.InferenceSession(self.augmented_model_path, None)
```
i.e. difference in optimization level. Probably doesn't make a significant difference either way.
We need to pull the newer changes I pushed to upstream ORT.
And to backport the following check in `QLinearBinaryOp`:

```python
# Don't bother quantizing if none of the inputs are quantized, as it might reduce performance
if node.input[0] not in self.quantizer.quantized_value_map and \
        node.input[1] not in self.quantizer.quantized_value_map:
    return super().quantize()

# Check whether all inputs are floats, so we can quantize it
can_quantize = self.quantizer._is_valid_quantize_value(node.output[0]) and \
    self.quantizer._is_valid_quantize_value(node.input[0]) and \
    self.quantizer._is_valid_quantize_value(node.input[1])
```
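For reference, the gating part of that check can be illustrated with a simplified standalone sketch (the `quantized_value_map` dict and tensor names here are stand-ins, not the real ONNX Runtime quantizer state):

```python
# Simplified sketch of the gating logic: only emit a QLinear op when at
# least one input already has quantization parameters recorded.
# "quantized_value_map" stands in for the quantizer's bookkeeping dict.

def should_use_qlinear(node_inputs, quantized_value_map):
    """Return True if at least one input tensor is already quantized."""
    return any(name in quantized_value_map for name in node_inputs)

# Hypothetical example: only 'conv_out' has been quantized so far.
quantized = {"conv_out": ("scale", "zero_point")}
print(should_use_qlinear(["conv_out", "other"], quantized))  # -> True
print(should_use_qlinear(["x", "y"], quantized))             # -> False
```

In the real snippet the `False` case falls back to `super().quantize()`, i.e. the plain float path.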
The upstream quantizer sees changes almost every week. I don't have the bandwidth to keep track of all the divergences, but whoever takes this over should basically first diff the current quantizer with the version of upstream that was there at that time (look at the commit log to find the last bump to upstream), then copy over all those changes.
Also useful: QLinearConcat support in https://github.com/microsoft/onnxruntime/pull/7341/files. Would be useful for maskrcnn, plus the associated NHWC transform support (see upstream PR onnxruntime/pull/7587).
The quantizer we currently have (in `systolic_runner/quantization`) was forked a long time ago and has since diverged significantly from upstream. Now, this isn't really a bug per se, since it works fine (and has been tested/tweaked on all the models we currently need). However, lagging so far behind means we don't get any of the new goodies they add upstream, like the new histogram-based range calculations (instead of just min/max) or any of the fancy things like the equalization algorithm.

In 5458ce99b (branch `merge_upstream_quantizer`) I've ported most of the changes I used for `int8`, and I tested with `resnet50` and it works (for some reason the calculated scale factors are ever-so-slightly different; I'll try to find out why, but it's probably just a difference in the formula used somewhere).

However, I'm hesitant to actually merge it into the main branch, since this hasn't been used as much as the older implementation, so things will probably break. In particular, I'm worried about more esoteric models like BERT and mask-rcnn.
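Tiny scale-factor discrepancies like this often come down to how the calibrated min/max range is turned into a scale. A minimal sketch of two common conventions (these are illustrative formulas, not necessarily the exact ones either quantizer version uses):

```python
# Two common ways to turn a calibrated float range into a uint8 scale.
# They differ only slightly, which is enough to produce
# "ever-so-slightly different" scale factors between two quantizers.

def scale_simple(rmin, rmax, qmax=255):
    # Divide the raw calibrated range by the number of quantization steps.
    return (rmax - rmin) / qmax

def scale_zero_nudged(rmin, rmax, qmax=255):
    # Extend the range to include zero before dividing; asymmetric
    # quantization typically needs an exactly representable zero point.
    rmin = min(rmin, 0.0)
    rmax = max(rmax, 0.0)
    return (rmax - rmin) / qmax

# A calibrated range that does not include zero (hypothetical values):
print(scale_simple(0.1, 6.3))       # ~0.02431
print(scale_zero_nudged(0.1, 6.3))  # ~0.02471
```

Diffing where each implementation clamps the range (and whether it divides by 255 or 254) is a quick way to pin down which formula changed.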