mindee / doctr

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
https://mindee.github.io/doctr/
Apache License 2.0

[FIX] parseq onnx export #1585

Closed: felixdittrich92 closed this 4 months ago

felixdittrich92 commented 4 months ago

This PR fixes the parseq ONNX export.

CC @llFireHawkll

codecov[bot] commented 4 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 96.32%. Comparing base (2940d9d) to head (03cb48a). Report is 1 commit behind head on main.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #1585      +/-   ##
==========================================
- Coverage   96.35%   96.32%   -0.03%
==========================================
  Files         163      163
  Lines        7701     7702       +1
==========================================
- Hits         7420     7419       -1
- Misses        281      283       +2
```

| [Flag](https://app.codecov.io/gh/mindee/doctr/pull/1585/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=mindee) | Coverage Δ | |
|---|---|---|
| [unittests](https://app.codecov.io/gh/mindee/doctr/pull/1585/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=mindee) | `96.32% <100.00%> (-0.03%)` | :arrow_down: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=mindee#carryforward-flags-in-the-pull-request-comment) to find out more.


felixdittrich92 commented 4 months ago

@odulcy-mindee also some additional numbers to share with you:

---------------------- Recognition ----------------------

Time for base model: 0.01111
Time for onnx model: 0.00365
crnn_vgg16_bn --> mean diff: 6.4257656049449e-05

Time for base model: 0.00968
Time for onnx model: 0.00156
crnn_mobilenet_v3_small --> mean diff: 1.5333176634158008e-05

Time for base model: 0.00854
Time for onnx model: 0.00222
crnn_mobilenet_v3_large --> mean diff: 1.0634032150846906e-05

Time for base model: 0.07205
Time for onnx model: 0.02604
sar_resnet31 --> mean diff: 0.011326710693538189

Time for base model: 0.35959
Time for onnx model: 0.1154
master --> mean diff: 7.826339242456015e-06

Time for base model: 0.02087
Time for onnx model: 0.00979
vitstr_small --> mean diff: 2.186470737797208e-06

Time for base model: 0.0599
Time for onnx model: 0.02341
vitstr_base --> mean diff: 5.688822966476437e-06

Time for base model: 0.04333
Time for onnx model: 0.02047
parseq --> mean diff: 1.8520279354561353e-06

---------------------- Detection ----------------------

Time for base model: 0.56117
Time for onnx model: 0.30894
fast_base --> mean diff: 0.001967190532013774

Time for base model: 0.48526
Time for onnx model: 0.24468
fast_small --> mean diff: 4.495668888092041

Time for base model: 0.42203
Time for onnx model: 0.22532
fast_tiny --> mean diff: 0.08134301006793976

Time for base model: 0.19716
Time for onnx model: 0.11956
rep_fast_tiny --> mean diff: 0.09087851643562317

Time for base model: 0.21275
Time for onnx model: 0.13132
rep_fast_small --> mean diff: 2.3484139442443848

Time for base model: 0.25621
Time for onnx model: 0.1563
rep_fast_base --> mean diff: 0.0023352387361228466

Time for base model: 0.43838
Time for onnx model: 0.19122
db_resnet50 --> mean diff: 0.0009908436331897974

Time for base model: 0.19789
Time for onnx model: 0.08287
db_mobilenet_v3_large --> mean diff: 0.08591402322053909

Time for base model: 0.15283
Time for onnx model: 0.08877
linknet_resnet18 --> mean diff: 0.00012636046449188143

Time for base model: 0.21763
Time for onnx model: 0.14226
linknet_resnet34 --> mean diff: 3.3567819627933204e-05

Time for base model: 0.44412
Time for onnx model: 0.24185
linknet_resnet50 --> mean diff: 6.365984154399484e-05

odulcy-mindee commented 4 months ago

> also some additional numbers to share with you

@felixdittrich92 is that the inference time? :open_mouth:

felixdittrich92 commented 4 months ago

> @felixdittrich92 is that the inference time? 😮

@odulcy-mindee correct :)

felixdittrich92 commented 4 months ago

> @felixdittrich92 is that the inference time? 😮
>
> @odulcy-mindee correct :)

drop TF and provide an ONNX pipeline 😂

(I started today, in a separate repo, to integrate an ONNX pipeline for docTR; if this works, we can discuss possible ways to integrate it or keep it separate.)

decadance-dance commented 4 months ago

> @odulcy-mindee also some additional numbers to share with you: (full benchmark results quoted above)

Hi @felixdittrich92, were the times for the ONNX models measured with the "CUDAExecutionProvider" or the "CPUExecutionProvider"?

felixT2K commented 4 months ago

@decadance-dance on CPU (i7-14700K), with providers=["CPUExecutionProvider"] :)

I passed only 1 sample, so it was only a short test, and it measures only up to the logits, so the time for post-processing would come on top.
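
Not the exact script behind the numbers above, just a minimal sketch of how such a base-vs-ONNX comparison can be reproduced; the model choice, the input shape, the "logits" output key and the timing details are assumptions on my part:

```python
# Minimal sketch, not the script used for the numbers above.
# Assumptions: parseq as example model, a single 32x128 crop, and that
# exportable models return their raw output under a "logits" key.
import time

import numpy as np
import onnxruntime as ort
import torch

from doctr.models import parseq
from doctr.models.utils import export_model_to_onnx

model = parseq(pretrained=True, exportable=True).eval()
dummy_input = torch.rand((1, 3, 32, 128), dtype=torch.float32)
onnx_path = export_model_to_onnx(model, model_name="parseq", dummy_input=dummy_input)

# PyTorch forward pass (logits only, no post-processing)
with torch.no_grad():
    start = time.perf_counter()
    torch_logits = model(dummy_input)["logits"].numpy()
    print(f"Time for base model: {time.perf_counter() - start:.5f}")

# ONNX Runtime forward pass on CPU
session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
start = time.perf_counter()
onnx_logits = session.run(None, {input_name: dummy_input.numpy()})[0]
print(f"Time for onnx model: {time.perf_counter() - start:.5f}")

# Numerical agreement between both outputs
print(f"parseq --> mean diff: {np.abs(torch_logits - onnx_logits).mean()}")
```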

decadance-dance commented 4 months ago

@felixT2K, so, as I understand it, when testing the text recognition models you measured the time for a single crop?

felixdittrich92 commented 4 months ago

> @felixT2K, so, as I understand it, when testing the text recognition models you measured the time for a single crop?

Correct, and as mentioned it was only a short test, not really comparable with a benchmark ^^

decadance-dance commented 4 months ago

@felixdittrich92, got it, thanks. I'm curious about this because I just noticed that if I test on large batches, it takes too long. For example, my input shape (1000, 3, 32, 128), which corresponds to the number of crops from a large document, takes about 8.6 seconds, and I don't know how to deal with it yet, since I run my services on the CPU and need faster inference.

felixdittrich92 commented 4 months ago

> I just noticed that if I test on large batches, it takes too long. For example, my input shape (1000, 3, 32, 128) [...] takes about 8.6 seconds [...] since I run my services on the CPU and need faster inference.

Have you tried modifying det_bs and reco_bs? From my experience this should bring a small speed-up; det_bs=4 and reco_bs=1024 works well speed-wise.

I started yesterday to work on a docTR wrapper with a pure ONNX Runtime pipeline (for the moment in a private, separate repo, but I will open it as soon as it can be used).

decadance-dance commented 4 months ago

@felixdittrich92 when I build the parseq model with dummy_input (1024, 3, 32, 128), I get a slowdown compared to (1, 3, 32, 128). The code I use:

```python
import torch

from doctr.models import parseq
from doctr.models.utils import export_model_to_onnx

model = parseq(pretrained=True, exportable=True)
dummy_input = torch.rand((1024, 3, 32, 128), dtype=torch.float32)
model_path = export_model_to_onnx(model, model_name="parseq1024", dummy_input=dummy_input)
```

felixdittrich92 commented 4 months ago

> @felixdittrich92 when I build the parseq model with dummy_input (1024, 3, 32, 128), I get a slowdown compared to (1, 3, 32, 128). [...]

I mean if you use the ocr_predictor :)

For example:

```python
ocr_predictor("db_resnet50", "parseq", det_bs=4, reco_bs=1024)
```

decadance-dance commented 4 months ago

@felixdittrich92, got you, but I do not use ocr_predictor at all. I have a complex end-to-end pipeline, so I used recognition_predictor and detection_predictor separately. But now I can't even use them because we are trying to migrate to CPU. That's why I am interested in ONNX.
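
For reference, a minimal sketch (not part of this PR) of feeding many crops to the exported recognition model in smaller chunks with ONNX Runtime on CPU. The file name comes from the export snippet above; the chunk size and the assumption that the exported graph accepts a batch size other than the one it was exported with (i.e. a dynamic batch axis) are mine:

```python
# Minimal sketch, assuming "parseq1024.onnx" from the export snippet above
# and a dynamic batch axis (otherwise the chunk size must match the export
# batch size). Runs the raw model only, without docTR post-processing.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("parseq1024.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

# Stand-in for the real preprocessed crops of a large document
crops = np.random.rand(1000, 3, 32, 128).astype(np.float32)

chunk_size = 128  # tune for your CPU / memory budget
all_logits = []
for start in range(0, len(crops), chunk_size):
    chunk = crops[start : start + chunk_size]
    all_logits.append(session.run(None, {input_name: chunk})[0])

logits = np.concatenate(all_logits, axis=0)
print(logits.shape)
```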