xczhai commented 8 months ago

Context

The current PaddlePaddle quantization implementation is different from ONNX,.

Same

PaddlePaddle translates quantize_linear and dequantize_linear in the paddle frontend.
- https://github.com/openvinotoolkit/openvino/blob/master/src/frontends/paddle/src/op/quantize_linear.cpp
- https://github.com/openvinotoolkit/openvino/blob/master/src/frontends/paddle/src/op/dequantize_linear.cpp
ONNX just translates quantize_linear and dequantize_linear without any transformation in the onnx frontend.
- https://github.com/openvinotoolkit/openvino/blob/master/src/frontends/onnx/frontend/src/op/quantize_linear.cpp
- https://github.com/openvinotoolkit/openvino/blob/master/src/frontends/onnx/frontend/src/op/dequantize_linear.cpp

Difference

PaddlePaddle fuses the quantize_linear and dequantize_linear into FakeQuantize using a custom pass( https://github.com/openvinotoolkit/openvino/blob/master/src/frontends/paddle/src/internal/pass/transform_fakequantize.cpp) but ONNX FE just doesn't.

It is hard to maintain the almost same logic. So need to refactor PaddlePaddle quantization like ONNX. Also, more patterns in the model will affect transformation performance.

What needs to be done?

Ignore the HALF_AWAY_FROM_ZERO round mode directly and aggressively. It is just for performance.
Remove or refactor the custom pass. LTP has done what custom pass(https://github.com/openvinotoolkit/openvino/blob/master/src/frontends/paddle/src/internal/pass/transform_fakequantize.cpp) does. So, need to remove or refactor the custom pass. Prepare quantization pattern for LTP(Low Precision Transformation).
Refactor the PDPD FE to decrease the quantization pattern if need.

Example Pull Requests

Please refer to https://github.com/openvinotoolkit/openvino/pull/14834/ for more comments and background. test case: https://github.com/openvinotoolkit/openvino/pull/20689

Resources

Contribution guide - start here!
Intel DevHub Discord channel - engage in discussions, ask questions and talk to OpenVINO developers
What is OpenVINO?
User documentation
Paddle Slim Model: https://github.com/PaddlePaddle/FastDeploy/blob/develop/docs/cn/quantize.md

Contact points

@xczhai

Ticket

104434

siddhant-0707 commented 8 months ago

Hey @xczhai may I work on this...

yuxu42 commented 8 months ago

Hey @xczhai may I work on this...

@siddhant-0707 Sure, you can take it. Thanks!

xczhai commented 8 months ago

Hey @xczhai may I work on this...

@siddhant-0707 Yes. If you have any questions, please reach me. Thanks.

p-wysocki commented 7 months ago

Hi @siddhant-0707, are you still working on this issue or can I return it to be picked up by other contributors?

github-actions[bot] commented 5 months ago

Thank you for looking into this issue! Please let us know if you have any questions or require any help.

SANTHOSH-MAMIDISETTI commented 5 months ago

hello @p-wysocki , I would like to have a look at it, and work over is , may I ?

SANTHOSH-MAMIDISETTI commented 5 months ago

if so , can I be assigned it @xczhai

gaganchapa commented 5 months ago

@p-wysocki , @mlukasze @xczhai or someone , can you please explain what exactly has to be done in this , I mean based on my understanding of the context given. The PaddlePaddle operator was implemented both in paddle frontend and ONNX frontend.

But there happens to be a little difference in the way how it is implemented , as

PaddlePaddle fuses the quantize_linear and dequantize_linear into FakeQuantize using a custom pass( https://github.com/openvinotoolkit/openvino/blob/master/src/frontends/paddle/src/internal/pass/transform_fakequantize.cpp) but ONNX FE just doesn't.

So , PaddlePaddle has to be implemented in ONNX format? is that all or did I miss something , please add

xczhai commented 5 months ago

@gaganchapa Yes, you are right. I prepared a graph for you. Maybe it can help you understand it. PDFE_Quantization drawio

SANTHOSH-MAMIDISETTI commented 5 months ago

@xczhai @p-wysocki please correct me , if I go wrong :

Both the quantize_linear.cpp and dequantize_linear.cpp (which are in PaddlePaddle format) needs to be translated / refactored / rewritten into ONNX format.
But there is already some existing code in quantize_linear.cpp and dequantize_linear.cpp which are at src/frontends/onnx/frontend/src/op/ translates those respective functions from PaddlePaddle format to ONNX format.
To achieve the above , either new files have to be created at src/frontends/onnx/frontend/src/op or the files have to be over written ,

Is that all or , am I missing something ?I'm always open to suggestions in any form to enhance this. Your insights and ideas are highly valued! Please feel free to share your thoughts on how we can make this even better. Your input is greatly appreciated! :rocket:

xczhai commented 5 months ago

@SANTHOSH-MAMIDISETTI 1 is right. And 2&3 are wrong. ONNX implementation can be considered as gold or reference. Just need to modify the PDPD files as I described way1 or way2 in above picture comments.

SANTHOSH-MAMIDISETTI commented 5 months ago

@SANTHOSH-MAMIDISETTI 1 is right. And 2&3 are wrong. ONNX implementation can be considered as gold or reference. Just need to modify the PDPD files as I described way1 or way2 in above picture comments.

Ooh!, Thanks a lot for the clarification !! ,

p-wysocki commented 5 months ago

Hello @SANTHOSH-MAMIDISETTI, are you still working on this issue?

SANTHOSH-MAMIDISETTI commented 5 months ago

hi @p-wysocki sir , right now my end semester exams are going on , they will continue till feb 11th , so I am unable to focus fully on this issue. but after Feb 11th i'll be able to concentrate fully over it.

gaganchapa commented 4 months ago

Hello @p-wysocki and @xczhai ,

If possible, could you please provide a step-by-step list of the tasks required for refactoring the PaddlePaddle quantization implementation to align with ONNX? Specifically, I'd like to know:

Which files need to be modified?
Are there any files that should be referenced but not modified?
Are there any files that should not be touched at all during this process?

A clear list of these tasks would greatly help in understanding the scope of the work involved. Thank you!

xczhai commented 4 months ago

@gaganchapa

https://github.com/openvinotoolkit/openvino/blob/master/src/frontends/paddle/src/internal/pass/transform_fakequantize.cpp Just modify the custom pass to ensure that the output quantized model from the pass is similar to the output quantized model from OpenVINO ONNX frontend. It is enough to reach our target.
https://github.com/openvinotoolkit/openvino/blob/master/src/frontends/onnx/frontend/src/op/quantize_linear.cpp https://github.com/openvinotoolkit/openvino/blob/master/src/frontends/onnx/frontend/src/op/dequantize_linear.cpp You can download an onnx quantized model and convert to openvino IR. And then download a PDPD quantized model and convert to openvino IR. make a comparison.
N/A

DaniAffCH commented 4 months ago

.take

github-actions[bot] commented 4 months ago

Thank you for looking into this issue! Please let us know if you have any questions or require any help.

DaniAffCH commented 4 months ago

Hi @xczhai I have a couple of questions regarding this task. Where can I find onnx quantized model and PDPD quantized model? Should they refer to the same model?

Also, once I converted them into openvino IR, is there any tool for visualizing and comparing them?

xczhai commented 4 months ago

@DaniAffCH Hi,

onnx quantized model: https://github.com/onnx/models/tree/main/validated/vision/classification/resnet
PDPD quantized model: https://github.com/PaddlePaddle/FastDeploy/blob/develop/docs/en/quantize.md#paddleclasseries
For visualizing, you can use netron to open OpenVINO IR. Please let me know if you have any questions.

DaniAffCH commented 3 months ago

Screenshot from 2024-03-03 20-14-24

Hi @xczhai just to understand the task better. What are the differences I should notice? On the right side the converted ONNX and in the left the converted PDPD

xczhai commented 3 months ago

@DaniAffCH It is unclear to check the whole graph. Could you help share both files with me?

DaniAffCH commented 3 months ago

@xczhai Sure, here you can find the two converted models.

DaniAffCH commented 3 months ago

Gentle request for a followup :)

xczhai commented 3 months ago

sorry for reply late.

As we talked before, PDPD's and ONNX's quantize_linear & dequantize_linear op are translated in PDPD frontend and ONNX frontend in different way.
ONNX's dequantize_linear mapped into OpenVINO FakeQuantize IR.
PDPD's dequantize_linear mapped into serveral OpenVINO IRs without FakeQuantize.
In order to match the pattern of low precision transformation pass, I created a custom pass https://github.com/openvinotoolkit/openvino/blob/master/src/frontends/paddle/src/internal/pass/transform_fakequantize.cpp. As a result, FakeQuantize IR is imported after this pass and then it can hit the next low precision transformation. You can refer to the comments in this cpp file.
Our goal is to implement the custom pass like ONNX's output instead of fusing too muny IRs. Thanks.

p-wysocki commented 2 months ago

Hello @DaniAffCH, are you still working on that task?

DaniAffCH commented 2 months ago

Hi @p-wysocki, no I'm no longer working on this because I'm currently focussing on the NNCF repo. I'll self-unassign me

mlukasze commented 2 months ago

Thanks for info @DaniAffCH and good luck with NNCF :)

liutianguo commented 2 months ago

WLB#+ .take

github-actions[bot] commented 2 months ago

Thank you for looking into this issue! Please let us know if you have any questions or require any help.

p-wysocki commented 1 month ago

Hello @liutianguo, are you still working on that issue? Do you need any help?

openvinotoolkit / openvino

[Good First Issue]: Refactor PaddlePaddle Quantization Implement Schemes like ONNX #20687

Context

Same

Difference

What needs to be done?

Example Pull Requests

Resources

Contact points

Ticket

WLB#+ .take