openpaperwork / paperwork

Personal document manager (Linux/Windows) -- Moved to Gnome's Gitlab
https://gitlab.gnome.org/World/OpenPaperwork/paperwork

tagged pdf #244

Open briner opened 10 years ago

briner commented 10 years ago

After watching the "tagged pdf" video, it seems to me that this option should be investigated. More info:

Keep up the good work!

jflesch commented 7 years ago

Note that the policy in Paperwork is to never modify imported PDFs (doing so could break signatures or other odd, not-always-supported extensions). However, I guess it could make sense to include the tags in exported PDFs. (Exports are currently done with cairo.PDFSurface.)

jflesch commented 7 years ago

OK, I think I misunderstood this ticket. The video you linked talks about semantic tagging (like HTML), not tags/labels as I thought.

Tagging based on OCR output is a much harder problem.