siyuan-note / siyuan

A privacy-first, self-hosted, fully open source personal knowledge management software, written in TypeScript and Golang.
https://b3log.org/siyuan
GNU Affero General Public License v3.0

OCR API: return text coordinate information #11584

Closed 2234839 closed 4 months ago

2234839 commented 5 months ago

In what scenarios do you need this feature?

Extending the existing OCR capability.

Describe the optimal solution

See the response structure in the Baidu OCR API documentation for reference.
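
As a rough illustration of the kind of structure meant here, the sketch below models an OCR result in which every recognized line carries a bounding box. The field names (`words_result`, `words`, and `location` with `left`/`top`/`width`/`height`) are assumed to match the Baidu general OCR response with location data and should be verified against the Baidu documentation. The `OcrLine` class, the `parse_ocr_response` helper, and the sample payload are hypothetical and are not part of SiYuan.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class OcrLine:
    """One recognized line of text plus its bounding box in image pixels."""
    text: str
    left: int
    top: int
    width: int
    height: int


def parse_ocr_response(raw: str) -> List[OcrLine]:
    """Parse a Baidu-style JSON response (assumed shape) into OcrLine items."""
    data = json.loads(raw)
    lines = []
    for item in data.get("words_result", []):
        loc = item["location"]
        lines.append(
            OcrLine(item["words"], loc["left"], loc["top"], loc["width"], loc["height"])
        )
    return lines


# Made-up sample payload, for illustration only.
SAMPLE = json.dumps({
    "words_result_num": 1,
    "words_result": [
        {"words": "hello", "location": {"left": 10, "top": 20, "width": 80, "height": 24}}
    ],
})

if __name__ == "__main__":
    for line in parse_ocr_response(SAMPLE):
        print(line)
```

With coordinates in this form, a caller could map each text line back to a region of the source image, for example to highlight a search hit.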

Describe the candidate solution

No response

Other information

No response

2234839 commented 5 months ago

[Image]

@88250 If position information can be returned and this feature is built into SiYuan, I think it could also be a major selling point.

Achuan-2 commented 4 months ago

I support this!

Achuan-2 commented 4 months ago

I'd also suggest switching the OCR engine to PaddleOCR; see: https://github.com/siyuan-note/siyuan/issues/10232

TCOTC commented 4 months ago

88250 commented 4 months ago

I looked into Tesseract OCR, and it seems to support this. If you have time, please help with a PR. Thanks.
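
For reference, the Tesseract command-line tool can emit word-level boxes in TSV form (for example `tesseract image.png stdout tsv`), with columns that include `left`, `top`, `width`, `height`, `conf`, and `text`. The sketch below parses such output from a string. The sample rows are made up and the `parse_tsv` helper is hypothetical; how SiYuan actually invokes Tesseract is not shown here.

```python
import csv
import io

# Made-up sample in the column layout of Tesseract's TSV output.
SAMPLE_TSV = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
    "left\ttop\twidth\theight\tconf\ttext\n"
    "1\t1\t0\t0\t0\t0\t0\t0\t640\t480\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t36\t92\t60\t24\t96.5\tHello\n"
    "5\t1\t1\t1\t1\t2\t102\t92\t70\t24\t95.1\tworld\n"
)


def parse_tsv(tsv: str):
    """Return word dicts (text, bounding box, confidence) for word-level rows."""
    reader = csv.DictReader(io.StringIO(tsv), delimiter="\t", quoting=csv.QUOTE_NONE)
    words = []
    for row in reader:
        # Level 5 rows are individual words; skip structural rows and empty text.
        if row["level"] != "5" or not (row["text"] or "").strip():
            continue
        words.append({
            "text": row["text"],
            "left": int(row["left"]),
            "top": int(row["top"]),
            "width": int(row["width"]),
            "height": int(row["height"]),
            "conf": float(row["conf"]),
        })
    return words


if __name__ == "__main__":
    for word in parse_tsv(SAMPLE_TSV):
        print(word)
```

Word boxes like these could be merged per line to approximate the line-level coordinates that the Baidu-style response provides.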

2234839 commented 4 months ago

Will the OCR engine be switched or not?

88250 commented 4 months ago

No, it won't be switched. Switching engines would currently also require dealing with cross-platform issues.

Achuan-2 commented 4 months ago

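I'd suggest simply switching engines; PaddleOCR recognition is both fast and accurate. I recently ran OCR on several scanned PDFs with https://github.com/hiroi-sora/Umi-OCR and was very satisfied with the results. Under the hood it calls the offline PaddleOCR build at https://github.com/hiroi-sora/PaddleOCR-json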

88250 commented 4 months ago

@Achuan-2 There are no prebuilt macOS/Linux packages, so I'm afraid we can't switch.

Achuan-2 commented 4 months ago

Someone is already working on it; I'm not sure whether this one is usable: https://github.com/Gavin1937/PaddleOCR-json/releases/tag/v1.4.0 Discussion: https://github.com/hiroi-sora/PaddleOCR-json/issues/47

2234839 commented 4 months ago

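Actually, I think supporting PaddleOCR would be fine too, but not by embedding it; it should only be used if it is available through the environment variables.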
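
A minimal sketch of the "use it only if the environment provides it" idea: look for an external PaddleOCR executable on the `PATH` and pick it only when it is found, otherwise stay with the built-in engine. Reading "environment variables" as `PATH` is an assumption, as are the executable name `PaddleOCR-json` and the engine labels. SiYuan's kernel is written in Go, so this Python version only illustrates the lookup logic.

```python
import shutil
from typing import Optional


def pick_ocr_engine(candidate: str = "PaddleOCR-json", path: Optional[str] = None) -> str:
    """Return 'paddle' if the candidate executable is found, else 'tesseract'."""
    # shutil.which searches the given path string, or the PATH environment
    # variable when path is None, and returns None if nothing is found.
    found = shutil.which(candidate, path=path)
    return "paddle" if found else "tesseract"


if __name__ == "__main__":
    print(pick_ocr_engine())
```

This keeps the default install unchanged across platforms while letting users who have installed PaddleOCR themselves opt in.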