pulp-platform / quantlib

A library to train and deploy quantised Deep Neural Networks
Apache License 2.0

Fixes for Transformer quantization and Deeploy export #12

Scheremo closed this 8 months ago

Scheremo commented 8 months ago

This PR fixes minor issues in the Transformer quantization flow and in network export to Deeploy.

Added

Changes

Fixed

Victor-Jung commented 8 months ago

Two questions out of curiosity:

* Why was the `quant` field of the `meta` dict required to apply the replacement pass?

* Why would you keep an RQS that is performing an identity operation?

Otherwise LGTM.

Scheremo commented 8 months ago

> Two questions out of curiosity:
>
> * Why was the `quant` field of the `meta` dict required to apply the replacement pass?
>
> * Why would you keep an RQS that is performing an identity operation?
>
> Otherwise LGTM.

Regarding the `quant` key of the `meta` dict in the `OpTreeReplacementPass`: if this information is annotated prior to the op-tree replacement, you want it to survive the replacement as well. The added code simply makes sure the pass also works when the information was not annotated beforehand, since it is not strictly required.
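A minimal sketch of this behavior, using a hypothetical helper and plain dicts rather than quantlib's actual pass API (`propagate_quant_meta` and the dict layout are illustrative assumptions, not quantlib code):

```python
# Hypothetical sketch, NOT quantlib's actual API: when a replacement pass
# substitutes a node, carry over the optional `quant` annotation if present.
def propagate_quant_meta(old_meta: dict, new_meta: dict) -> dict:
    """Copy the 'quant' entry from old_meta into new_meta, if it exists.

    The annotation is optional: when the original node was never annotated,
    the replacement simply proceeds without it instead of failing.
    """
    merged = dict(new_meta)
    if "quant" in old_meta:  # guard: pass also works without prior annotation
        merged["quant"] = old_meta["quant"]
    return merged
```

With an annotated node the information survives the replacement, e.g. `propagate_quant_meta({"quant": {"n_bits": 8}}, {})` keeps the `quant` entry; with an unannotated node, e.g. `propagate_quant_meta({}, {"shape": (1, 16)})`, no error is raised and the new metadata is returned unchanged.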

Keeping an RQS preserves semantic information. In principle, there could be an identity RQS after a convolution; if it were removed, the resulting pattern would look like an unquantized convolution to Deeploy. If we decide we do not want to execute identity RQS operations in deployment, we can remove them during lowering in Deeploy.
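As a sketch of what such a lowering-time check might look like, assuming an RQS with the common semantics `y = clip((x * mul + add) // div, lo, hi)` (the function name and parameterization are illustrative assumptions, not Deeploy's actual lowering code):

```python
# Hypothetical sketch, NOT Deeploy's actual code: decide whether a
# requantization (RQS) node is a pure identity and may be dropped
# during lowering. Assumed semantics: y = clip((x * mul + add) // div, lo, hi).
def is_identity_rqs(mul: int, add: int, div: int,
                    lo: int, hi: int,
                    dtype_min: int, dtype_max: int) -> bool:
    scale_is_one = (mul == div)                          # x * mul // div == x
    no_offset = (add == 0)                               # no zero-point shift
    clip_is_noop = (lo <= dtype_min and hi >= dtype_max) # clip never triggers
    return scale_is_one and no_offset and clip_is_noop
```

For example, an int8 RQS with `mul == div`, `add == 0`, and clipping bounds covering [-128, 127] is an identity and could be removed, while any non-unit scale, nonzero offset, or tighter clipping range makes it semantically meaningful and worth keeping.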

Victor-Jung commented 8 months ago

Thanks for the details. I have no objection. Good to merge!