Open vikasr111 opened 2 days ago
Hi @vikasr111 :wave:,
Thanks for reporting :+1:
It's already planned to retrain all detection models with our new augmentation pipeline and an extended dataset for pretraining to make them more robust.
Could you please give "db_mobilenet_v3_large" as the detection arch a try? (This model is already pretrained with our new augmentation pipeline.)
Additionally, you can tweak the bin_thresh and box_thresh values a bit (lower value -> more detections / less accurate | higher value -> possibly fewer detections / more accurate)
https://mindee.github.io/doctr/using_doctr/using_models.html#advanced-options
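from doctr.models import ocr_predictor

# `doc` is the document loaded beforehand (e.g. with doctr.io.DocumentFile)
predictor = ocr_predictor(
    det_arch="db_mobilenet_v3_large",
    reco_arch="parseq",
    pretrained=True,
    preserve_aspect_ratio=False,
    symmetric_pad=False,
)
# Adjust the post-processing thresholds of the detection model
predictor.det_predictor.model.postprocessor.bin_thresh = 0.35
predictor.det_predictor.model.postprocessor.box_thresh = 0.3
result = predictor(doc)
result.show()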
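To build some intuition for the two thresholds, here is a toy sketch in plain Python. It is not docTR's postprocessor: it assumes that bin_thresh binarizes the predicted probability map and that box_thresh is the minimum mean score a candidate region needs to be kept, and it works on a single row of made-up probabilities instead of a 2D map. The function name detect_runs and the sample values are hypothetical.

```python
def detect_runs(probs, bin_thresh, box_thresh):
    """Return (start, end) index pairs of detected "text" runs in a 1D probability row.

    A run is a maximal stretch of values >= bin_thresh (binarization step).
    A run is kept only if its mean probability is >= box_thresh (scoring step).
    """
    runs, start = [], None
    # The -inf sentinel closes a run that reaches the end of the row.
    for i, p in enumerate(list(probs) + [float("-inf")]):
        if p >= bin_thresh and start is None:
            start = i
        elif p < bin_thresh and start is not None:
            score = sum(probs[start:i]) / (i - start)
            if score >= box_thresh:
                runs.append((start, i))
            start = None
    return runs


# Made-up probabilities for illustration only
probs = [0.1, 0.4, 0.5, 0.1, 0.9, 0.8, 0.2, 0.32, 0.1]
print(detect_runs(probs, bin_thresh=0.3, box_thresh=0.1))   # permissive settings
print(detect_runs(probs, bin_thresh=0.35, box_thresh=0.3))  # slightly stricter
print(detect_runs(probs, bin_thresh=0.3, box_thresh=0.6))   # strict box score
```

In this toy, stricter values drop the weak region first and then the medium one, which mirrors the trade-off described above between the number of detections and their reliability.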
CC @odulcy-mindee A good sign that the new augmentation pipeline improves our models ^^ Nevertheless, I think we need to expand the dataset a bit.
Thanks for the reply. db_mobilenet_v3_large does work better. Any idea when the new models will be available?
I have another follow-up question. Is there an example of how I can plot the OCR line output on a canvas according to its geometry, and eventually generate text output in which the text is arranged using spaces and newlines to maintain the layout of the original document?
Here's a sample output:
========== Page 1 of 1 ==========
Page 1 of 1
PURCHASE ORDER
PURCHASE ORDER
SENSIENT 3157276
Ship To:
Vendor: Bill To: Supplier:
Company FJvbboinbio SINEDOIBENT COLORS LLC Sbsbwwb Colors LLC JBSUVWVE
US LLC 1659 SAUGET BUSINESS 2515 North Jefferson Avenue 1421 WILLIS
DEPT 771807 BLVD STE A St. Louis MO 63106 SYRACUSE NY 13204
P O BOX 77000 SAUGET TI 62206 314-658-7318
DETROIT 314-286-7172
IW Attn: Accounts Payable
48277-1807 APSTLColor @iuuivv.com
Tax Exempt
Vendor No# Order Date Ship Via Freight Terms FOB Payment Terms
No#
82-3618676
10153300 2024-11-05 020Net
Sensient/Supplier PR Extended
Item No# Product Description Tax Order Qty UM Unit Price UM Amount Date Due
19958.0000 KG 90,639.71 2025-01-24
717701 SODIUM NITRITE FCC SPEC GRAN N 2.0600 LB
SN FREE-FLOW SODIUM NITRITE FCC SPEC GRAN
FOOD GRADE 2000LB SACK
Total Amount 90,639.71
*PLEASE CONFIRM BY REPLYING WITH CORRECT PRICING & DELIVERY TO : Colors.PurchasingSTL@ Sensient.com *
IMPORTANT THIS PURCHASE ORDER NO. MUST APPEAR ON ALL INVOICES BILL OF LADING. PACKING SLIP. AND PACKAGES ALL INVOICES MUST
DUPLICATE OT ACCOUNTS PAYABLE AT THE ADDRESS LISTED. BUYER: JEFFY SULLIVAN
NO DELIVERIES ACCEPTED AFTER 3:00 PM (MON-FRI)
PHONE:
FAX:
E-MAIL jeftsulivan@example.com
SEE REVERSE SIDE FOR TERMS & CONDITIONS
All dates in YYYY-MM-DD format.
CC @odulcy-mindee @vikasr111 (@odulcy-mindee correct me if that's not realistic) I think we can start in January to retrain and test with our already updated augmentation pipeline. Retraining with an extended dataset will take a bit more time.
About the second part, have you already tried:
import matplotlib.pyplot as plt
result = predictor(doc)
synthetic_pages = result.synthesize()
plt.imshow(synthetic_pages[0]); plt.axis('off'); plt.show()
You can additionally try passing resolve_blocks=True to the ocr_predictor. It's currently disabled by default because there are almost endless possible document layouts, and the algorithm fails too often. :)
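For the plain-text variant of the question (arranging text with spaces and newlines), a minimal sketch could look like the following. It is not a docTR feature: render_layout is a hypothetical helper, and it assumes you have already collected each word's text together with its relative top-left corner (xmin, ymin in the 0..1 range), which I believe is how word geometries appear in the exported result. The grid size (cols, rows) is an arbitrary choice you would tune per document.

```python
def render_layout(words, cols=100, rows=50):
    """Place words on a character grid according to their relative position.

    words: iterable of (text, xmin, ymin), coordinates relative to the page (0..1).
    Returns one string where spaces and newlines approximate the page layout.
    """
    grid = {}  # row index -> list of (column, text)
    for text, xmin, ymin in words:
        row = min(int(ymin * rows), rows - 1)
        col = min(int(xmin * cols), cols - 1)
        grid.setdefault(row, []).append((col, text))

    lines = []
    for row in range(max(grid, default=-1) + 1):
        line = ""
        for col, text in sorted(grid.get(row, [])):
            # Pad up to the target column, but always keep at least one space
            # between neighbouring words so that they never run together.
            target = max(col, len(line) + 1) if line else col
            line = line.ljust(target) + text
        lines.append(line)
    return "\n".join(lines)


# Made-up words for illustration only
sample = [
    ("Vendor:", 0.0, 0.15),
    ("Bill", 0.5, 0.15),
    ("To:", 0.56, 0.15),
    ("Total", 0.0, 0.35),
]
print(render_layout(sample, cols=20, rows=10))
```

Words that fall into the same grid row end up on the same text line, so a coarser rows value merges slightly misaligned words while a finer one keeps them apart. Dense multi-column pages will likely need a larger cols value than the default.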
Bug description
I am trying to use DocTR on a document that has dense text arranged in two columns. I noticed that the text detection is incorrect: it identifies multiple overlapping text blocks, and as a result the text output is also incorrect.
Here's the original document:
Here's the OCR plot:
Here's the segmentation result:
How to address it?
Code snippet to reproduce the bug
Error traceback
No error but the output is incorrect
Environment
python 3.10
Deep Learning backend
Torch