openpaperwork / paperwork

Personal document manager (Linux/Windows) -- Moved to GNOME's GitLab
https://gitlab.gnome.org/World/OpenPaperwork/paperwork

Paperwork limitation: Big documents (> 100 pages) #782

kafran opened 6 years ago

kafran commented 6 years ago

Guys, what are the limitations of Paperwork? Right now I have 4 documents of approximately 200 pages each, and Paperwork is unusably slow. I just can't use the software. Each time I add one of these documents, my computer is brought to its knees. I have an Intel i5 with 8 GB of RAM; if I add a 200-page scanned document to Paperwork, RAM usage climbs to 7.33 GB and swap usage to more than 8 GB. After all the documents are processed, it's impossible to search and use Paperwork: searches and page loads take far too long. Maybe it's because I'm running it through Flatpak?

I really liked the idea behind Paperwork. I usually scan documents to .tiff, and I liked the idea of keeping them organized as images with OCR and converting them to PDF with configurable quality as needed. Does anybody know of another app that could help me with this task until Paperwork solves its optimization problems? I don't care about all the fancy animations, etc. I just need software to tag and manage scanned documents and export them to PDF when needed.

tiramiseb commented 6 years ago

Hello,

I currently have 1486 documents, the largest with 111 pages, but most with only 1 or 2 pages... While not hyper-quick, Paperwork is fast enough to be usable...

tYYGH commented 6 years ago

I can confirm that Paperwork is not optimized for big documents. My stats are:

- 3 documents with more than 100 pages (max 158 pages)
- 63 documents with 16–99 pages (evenly distributed across the range)
- 102 documents with 7–15 pages (evenly distributed across the range)
- 118 documents with 5–6 pages
- 2437 documents with 1–4 pages (count decreasing as page count grows; about half of these have only 1 page)

With these stats, Paperwork remains usable. But whenever I open one of the 3 big documents, I get a temporary freeze…

tiramiseb commented 6 years ago

I think it is clear that Paperwork is not meant for big documents...

jflesch commented 6 years ago

Full disclosure: The biggest document I've used to test Paperwork is about ~100 pages. And it is a test document, not one that I really use day-to-day :/

jflesch commented 6 years ago

@kafran : By the way, did you import those big documents as PDF, or did you scan them ?

Regarding your question about other applications, unfortunately, I don't know of any that is open source and does exactly what Paperwork does (I wouldn't be working on it otherwise ;).

However, you may want to have a look at some web applications doing similar things. For instance:

kafran commented 6 years ago

@jflesch thank you for Paperwork, it's a great piece of software. I posted this in the hope that someone could help me get things running more smoothly, or point me to another solution with faster search and page viewing. I often need to retrieve information from the documents I'm scanning.

I scanned all the documents I'm putting into Paperwork myself, with a Kodak ScanMate i1150. First I scan using this script: https://gist.github.com/kafran/46b1d798cef7b3aa48e9a138f99902cf (the scanner I'm using can detect and exclude blank pages). Then I import the result into Paperwork with the "Import image folder" option.

I don't know how Paperwork could become more resource-efficient. If I didn't have 8 GB of RAM and a 16 GB swap partition, it would be impossible for me to use Paperwork at all.

jflesch commented 6 years ago

Once I'm done with libinsane, I'll work on rewriting / rearranging Paperwork. The main goal will be to make the code more modular, but my hope is that it will also help with testing and with isolating and fixing issues (bugs, but optimization issues as well).

While I can't do much for you right now regarding Paperwork, I would appreciate it if you could submit a test scan report for the scanner database on openpaper.work : https://openpaper.work/en/scanner_db/#contribute . I have no Kodak scanner currently in the database. I would be curious to see what other options it can provide.

kafran commented 6 years ago

Sure. I would be glad to do that.