In what scenarios do you need this feature?
After configuring SiYuan's OCR, I found its recognition rate low. I later switched to software that calls the PaddleOCR API and found that it recognized both English and Chinese text better. I hope SiYuan can replace its current OCR engine with PaddleOCR.
Describe the optimal solution
PaddleOCR has better text recognition capabilities than Tesseract.
Quote:
PaddleOCR recently released v3, which significantly improves the handling of spaces in English text. I tried the English model, and it works very well.
In document scenarios, PaddleOCR can achieve 95%+ accuracy, whereas Tesseract may get confused by some rhythmic characters.
In particular, PaddleOCR's performance on some non-Latin languages exceeded my expectations. For Arabic, for example, its results are far better than those of EasyOCR and Tesseract.
Highly recommend PaddleOCR!!!
PaddleOCR is a deep-learning-based OCR system developed by Baidu and built on PaddlePaddle, Baidu's open-source deep learning framework, which is known for its speed and efficiency. PaddleOCR supports many languages, including Chinese, English, Japanese, and Korean, and can accurately recognize a variety of text styles and fonts.
Advantages:
High accuracy: PaddleOCR has achieved state-of-the-art performance on various OCR benchmarks, including the ICDAR 2015 and ICDAR 2017 competitions.
Fast and efficient: PaddleOCR is optimized for speed and can process large volumes of images in real time, making it suitable for applications that require high throughput.
Easy to use: PaddleOCR has a user-friendly interface that allows users to quickly train and deploy OCR models.
Reference:
Describe the candidate solution
pls
Other information
.