Closed ViterAlex closed 2 years ago
Current Behavior:
Doesn't recognize technical drawings
Tesseract mainly supports OCR of text documents.
This is not actually an issue but a question. I'm a junior .NET developer. I'm trying to OCR technical (mechanical) drawings and am looking for a recognition tool. Is it possible to train the Tesseract engine on such specific symbols?
You can train Tesseract on any symbols. The ones in the sample have Unicode code points:
$ uni print U+2316 U+2300 U+23CA U+27C2
     cpoint  dec    utf-8     html  name
'⌀'  U+2300  8960   e2 8c 80  ⌀     DIAMETER SIGN (Other_Symbol)
'⌖'  U+2316  8982   e2 8c 96  ⌖     POSITION INDICATOR (Other_Symbol)
'⏊'  U+23CA  9162   e2 8f 8a  ⏊     DENTISTRY SYMBOL LIGHT UP AND HORIZONTAL (Other_Symbol)
'⟂'  U+27C2  10178  e2 9f 82  ⟂     PERPENDICULAR (Math_Symbol)
I guess that many more symbols are in use, and some of them are not defined in Unicode. For those, you can use your own code points in the PUA (Private Use Area).
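As a rough illustration of the point about code points, the following Python sketch looks up the four symbols from the sample with the standard `unicodedata` module and assigns Private Use Area code points to symbols that have no Unicode definition. The names `surface_finish` and `weld_fillet` and the starting code point U+E000 are hypothetical choices made for this example only; they are not part of Tesseract or of any standard.

```python
import unicodedata

# The four symbols from the sample drawing, all defined in Unicode.
SYMBOLS = ["\u2300", "\u2316", "\u23CA", "\u27C2"]


def describe(ch):
    """Return code point, decimal value, UTF-8 bytes, name and category."""
    return {
        "char": ch,
        "cpoint": f"U+{ord(ch):04X}",
        "dec": ord(ch),
        "utf8": " ".join(f"{b:02x}" for b in ch.encode("utf-8")),
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),
    }


# Hypothetical symbols that are assumed to have no Unicode code point.
# Each one gets a code point from the Private Use Area (U+E000..U+F8FF).
PUA_START = 0xE000
CUSTOM_SYMBOLS = ["surface_finish", "weld_fillet"]  # illustrative names only
PUA_MAP = {name: chr(PUA_START + i) for i, name in enumerate(CUSTOM_SYMBOLS)}

for ch in SYMBOLS:
    d = describe(ch)
    print(d["char"], d["cpoint"], d["dec"], d["utf8"], d["name"], d["category"])

for name, ch in PUA_MAP.items():
    print(name, f"U+{ord(ch):04X}", unicodedata.category(ch))
```

Whatever mapping you choose has to be used consistently in the ground-truth text for training and in whatever code consumes the OCR output, since PUA code points have no meaning outside your own project.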
You are on your own when it comes to training a model for technical drawings. Which kind of drawings? CAD/CAM for machinery, buildings, electrical, plumbing, printed circuits, cartography, meteorology, military, gardening?
The standard models support languages and writing systems. That is where the majority of Tesseract's developers and contributors have their expertise.
Tesseract has problems removing the frames of table-like objects. You must remove them yourself during preprocessing.
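One simple way to approach such preprocessing is to erase long straight runs of ink before passing the image to OCR. The sketch below is dependency-free and works on a binarized image stored as a list of rows (1 = ink, 0 = background). The function name `remove_long_lines` and the `min_len` threshold are assumptions made for this example; in practice you would more likely do the same thing with an image processing library.

```python
def remove_long_lines(img, min_len):
    """Return a copy of a binary image with long straight lines erased.

    Any horizontal or vertical run of ink pixels that is at least
    min_len pixels long is treated as part of a frame and set to 0.
    """
    h = len(img)
    w = len(img[0]) if h else 0
    erase = [[False] * w for _ in range(h)]

    # Mark long horizontal runs.
    for y in range(h):
        x = 0
        while x < w:
            if img[y][x]:
                start = x
                while x < w and img[y][x]:
                    x += 1
                if x - start >= min_len:
                    for i in range(start, x):
                        erase[y][i] = True
            else:
                x += 1

    # Mark long vertical runs.
    for x in range(w):
        y = 0
        while y < h:
            if img[y][x]:
                start = y
                while y < h and img[y][x]:
                    y += 1
                if y - start >= min_len:
                    for i in range(start, y):
                        erase[i][x] = True
            else:
                y += 1

    return [[0 if erase[y][x] else img[y][x] for x in range(w)]
            for y in range(h)]


# A tiny frame with a small glyph inside it.
page = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
]
cleaned = remove_long_lines(page, min_len=5)
for row in cleaned:
    print("".join("#" if v else "." for v in row))
```

The threshold has to be longer than the longest stroke of any character or symbol you want to keep. Text that touches a frame line, and long dimension or leader lines on a drawing, will need more careful handling than this sketch provides.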
Sorry, but it's really hard to find any information about this.
It's an FAQ. You should ask such questions in the forum.
Environment
- Tesseract Version: 4.1.1
- Commit Number:
- Platform: Windows
Current Behavior:
Doesn't recognize technical drawings
Expected Behavior:
Recognize dimensions text and symbols on drawings
Suggested Fix:
This is not actually an issue but a question. I'm a junior .NET developer. I'm trying to OCR technical (mechanical) drawings and am looking for a recognition tool. Is it possible to train the Tesseract engine on such specific symbols?
Sorry, but it's really hard to find any information about this.
Hey, did you have any luck figuring this out?