scambier / obsidian-text-extractor

A (companion) plugin to facilitate the extraction of text from images (OCR) and PDFs.
GNU General Public License v3.0
334 stars 16 forks

[BUG] Text Extractor causes a black screen on Windows #19

Closed agkozak closed 1 year ago

agkozak commented 1 year ago

Problem description:

When I enable Text Extractor in Obsidian on Windows 11, the screen goes black about 20 seconds after I start Obsidian:

image

The program then stays unresponsive indefinitely. Pressing Ctrl+Shift+I does nothing. The only fix is to open Obsidian, quickly disable Text Extractor, and restart Obsidian.

I have never been able to get Text Extractor to work, but clearly it must work for everyone else. What diagnostic technique is most likely to reveal my problem? Thanks!

Your environment:

scambier commented 1 year ago

That's probably a PDF (or an image, but that's less likely) that hangs the process. You could try removing your PDFs from your vault, check if it works, and adding them back one by one until it freezes. It obviously shouldn't happen, so if you find the "bad" file and if you can share it with me, I'll gladly take it for my tests :)

At startup, the console log should also display a "Text Extractor - Number of workers: x". Could you tell me how many workers you have?

agkozak commented 1 year ago

> At startup, the console log should also display a "Text Extractor - Number of workers: x". Could you tell me how many workers you have?

11 workers. With the console open, when the screen goes dark, I get the error message "DevTools was disconnected from the page. Once page is reloaded, DevTools will automatically reconnect."

In a little while, I'll try removing my PDFs one by one and see if I can find the culprit.

agkozak commented 1 year ago

Interesting. I tried renaming each .pdf to .pdf.tmp, but that didn't resolve the problem. Finally I moved a directory with 19 JPGs (that have a lot of text in them) out of the vault. The problem went away entirely. Now that I've moved them back, I'm not seeing the problem recur. I'll leave Text Extractor on today, and I'll let you know what happens.

scambier commented 1 year ago

I've had this behavior happen in the past without being able to pinpoint the exact cause, but I think it boiled down to "too many workers". Text Extractor spawns a number of workers roughly equal to 70% of your CPU cores (so 11 workers for 16 cores). Maybe 11 workers under a heavy load is just too much; I'll cap this value at 8 in upcoming releases.
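For illustration, the sizing rule described here (about 70% of the cores, with an upper cap) can be sketched as follows. This is a minimal sketch, not the plugin's actual code: the function name `workerCount`, its parameters, and the choice to round down are all assumptions.

```typescript
// Hypothetical helper (not taken from Text Extractor's source):
// derive a worker count from the number of logical CPU cores.
// Rounding down and the floor of 1 worker are assumptions.
function workerCount(cores: number, ratio = 0.7, cap = 8): number {
  const scaled = Math.floor(cores * ratio); // 16 cores -> 11
  return Math.max(1, Math.min(scaled, cap)); // never below 1, never above the cap
}

// Uncapped, 16 cores give 11 workers, matching the number reported above.
console.log(workerCount(16, 0.7, Infinity));
// With the proposed cap of 8, the same machine gets 8 workers.
console.log(workerCount(16));
```

In an Obsidian (Electron) environment the core count would typically come from `navigator.hardwareConcurrency`, although whether the plugin reads it that way is not stated in this thread.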

Thanks for your feedback!

agkozak commented 1 year ago

Thanks again.

agkozak commented 1 year ago

When I installed the update v0.4.2, my screen went back to turning black. I emptied the directory full of text-laden images that seem to be the source of the problem and then put them back slowly, one by one. That fixed the problem -- probably temporarily.

I'll be happy to supply you with the images in question, if you think it might help you to troubleshoot the problem.

scambier commented 1 year ago

Yes please, you can send me a zip with the images and I'll take a look :) Thank you

agkozak commented 1 year ago

> Yes please, you can send me a zip with the images and I'll take a look :) Thank you

https://annelenner.com/Terry_Brown.zip

(80 MB)

scambier commented 1 year ago

Good news (I guess), it crashes on my machine too :)

agkozak commented 1 year ago

> Good news (I guess), it crashes on my machine too :)

Nice pictures though, huh? :)

Again, thanks for looking into this matter. It will be interesting to see which of the photos is the offender.

scambier commented 1 year ago

Ok, so the issue was indeed "too many workers" + "too much RAM used per worker". I got it running with 4 workers, but it literally ate all my 24GB of RAM. There's no real offender among your images; it's just Text Extractor biting off more than it could chew.

I released an update that limits OCR to a maximum of 2 workers; it should solve the issue :)
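As a rough illustration of why a small fixed limit bounds memory use, here is a minimal sketch of running jobs with at most 2 in flight at a time. It is not Text Extractor's implementation: `runLimited` is a hypothetical helper, and the jobs are simulated with timers rather than real OCR calls.

```typescript
// Hypothetical helper: run async tasks with at most `limit` in flight at once.
// Results are returned in the same order as the input tasks.
async function runLimited<T>(
  tasks: Array<() => Promise<T>>,
  limit: number
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0; // index of the next task to start

  // Each runner pulls tasks from the shared queue until none are left.
  async function runner(): Promise<void> {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, tasks.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}

// Simulated jobs standing in for OCR work (an assumption for this sketch).
const demoJobs = ["a.jpg", "b.jpg", "c.jpg"].map(
  (name) => async () => {
    await new Promise<void>((resolve) => setTimeout(resolve, 10));
    return `text from ${name}`;
  }
);

runLimited(demoJobs, 2).then((texts) => console.log(texts.length));
```

With this shape, peak memory scales with the limit rather than with the number of images in the vault, which is the effect the fix relies on.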

agkozak commented 1 year ago

That seems to work quite nicely. Thanks!