pulp-platform / quantlib

A library to train and deploy quantised Deep Neural Networks
Apache License 2.0

Fixes for Transformer quantization and Deeploy export #12

Scheremo closed this 8 months ago

Scheremo commented 8 months ago

This PR fixes minor issues in the Transformer quantization flow and in network export to Deeploy.

Added

Changes

Fixed

Victor-Jung commented 8 months ago

Two questions out of curiosity:

* Why was the `quant` field of the `meta` dict required to apply the replacement pass?

* Why would you keep an RQS that is performing an identity operation?

Otherwise LGTM.

Scheremo commented 8 months ago

> Two questions out of curiosity:
>
> * Why was the `quant` field of the `meta` dict required to apply the replacement pass?
>
> * Why would you keep an RQS that is performing an identity operation?
>
> Otherwise LGTM.

Regarding the `quant` key of the `meta` dict in the `OpTreeReplacementPass`: if this information is annotated prior to the op-tree replacement, you want it to survive the replacement as well. The added code simply makes sure the pass also works when the information was not annotated beforehand, since it is not strictly required.
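A minimal sketch of this behavior, using a hypothetical helper and plain dicts rather than quantlib's actual pass API (`propagate_quant_meta` and the dict layout are illustrative assumptions, not quantlib code):

```python
# Hypothetical sketch, NOT quantlib's actual API: when a replacement pass
# substitutes a node, carry over the optional `quant` annotation if present.
def propagate_quant_meta(old_meta: dict, new_meta: dict) -> dict:
    """Copy the 'quant' entry from old_meta into new_meta, if it exists.

    The annotation is optional: when the original node was never annotated,
    the replacement simply proceeds without it instead of failing.
    """
    merged = dict(new_meta)
    if "quant" in old_meta:  # guard: pass also works without prior annotation
        merged["quant"] = old_meta["quant"]
    return merged
```

With an annotated node the information survives the replacement, e.g. `propagate_quant_meta({"quant": {"n_bits": 8}}, {})` keeps the `quant` entry; with an unannotated node, e.g. `propagate_quant_meta({}, {"shape": (1, 16)})`, no error is raised and the new metadata is returned unchanged.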

Keeping an RQS preserves semantic information. In principle, there could be an identity RQS after a convolution; if it were removed, the resulting pattern would look like an unquantized convolution to Deeploy. If we decide we do not want to execute identity RQS operations in deployment, we can remove them during lowering in Deeploy.
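As a sketch of what such a lowering-time check might look like, assuming an RQS with the common semantics `y = clip((x * mul + add) // div, lo, hi)` (the function name and parameterization are illustrative assumptions, not Deeploy's actual lowering code):

```python
# Hypothetical sketch, NOT Deeploy's actual code: decide whether a
# requantization (RQS) node is a pure identity and may be dropped
# during lowering. Assumed semantics: y = clip((x * mul + add) // div, lo, hi).
def is_identity_rqs(mul: int, add: int, div: int,
                    lo: int, hi: int,
                    dtype_min: int, dtype_max: int) -> bool:
    scale_is_one = (mul == div)                          # x * mul // div == x
    no_offset = (add == 0)                               # no zero-point shift
    clip_is_noop = (lo <= dtype_min and hi >= dtype_max) # clip never triggers
    return scale_is_one and no_offset and clip_is_noop
```

For example, an int8 RQS with `mul == div`, `add == 0`, and clipping bounds covering [-128, 127] is an identity and could be removed, while any non-unit scale, nonzero offset, or tighter clipping range makes it semantically meaningful and worth keeping.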

Victor-Jung commented 8 months ago

Thanks for the details. I have no objection. Good to merge!