neuml / txtmarker

Highlight text in documents
Apache License 2.0

Highlighter too slow to highlight sentences in pdf #9

Open muazhari opened 2 years ago

muazhari commented 2 years ago

I want to highlight some sentences in a PDF, but the process is too slow. Annotating one sentence takes 0.04 seconds on average. The problem is that my use case requires annotating thousands of sentences. For example, 1000 sentences take 1000 * 0.04 = 40 seconds, which is too slow. How can I speed up the annotation process?

davidmezzetti commented 2 years ago

Have you considered multiprocessing to achieve the desired performance? For example, if you have 4 cores with hyper-threading, you can create 8 processes to do the highlighting.

Processing will scale close to linearly. In the example above, processing time drops to 40/8 = 5 seconds. With 8 cores (16 processes), it's 2.5 seconds, and so on.
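
A minimal sketch of that idea, using only the Python standard library. It splits the sentences into one chunk per process and computes the idealized estimate from the numbers above. `annotate_chunk` is a hypothetical placeholder for the real highlighting call, not part of txtmarker, and all other names here are assumptions for illustration. How the per-process results get merged back into a single PDF is not covered in this thread and is left out.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def split_into_chunks(items, n):
    """Split items into n nearly equal contiguous chunks (some may be empty)."""
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        # The first `extra` chunks receive one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def estimated_seconds(sentences, seconds_per_sentence, workers):
    """Idealized estimate that assumes perfectly linear scaling."""
    return sentences * seconds_per_sentence / workers


def annotate_chunk(chunk):
    """Hypothetical placeholder: swap in the real highlighting call here.

    It only counts the sentences so that the sketch runs on its own.
    """
    return len(chunk)


def annotate_parallel(sentences, workers=None):
    """Process one chunk per worker and return the number of sentences handled."""
    workers = workers or os.cpu_count() or 1
    chunks = split_into_chunks(sentences, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(annotate_chunk, chunks))


if __name__ == "__main__":
    sentences = [f"sentence {i}" for i in range(1000)]
    # Estimate for 1000 sentences at 0.04 s each, spread over 8 processes.
    print(estimated_seconds(len(sentences), 0.04, 8))
    print(annotate_parallel(sentences, workers=2))
```

The estimate is an upper bound on the benefit: process startup and passing data between processes add overhead, so real timings will be somewhat worse than the linear figure.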

muazhari commented 2 years ago

Okay, thanks for the idea. I think you should add it as a configurable feature, or add some other optimization, in the next version; it would be good to have. But are there any alternative optimization methods, like using a GPU or something else that is easy? I currently only have access to 2 logical CPU cores in total.