Closed agkozak closed 1 year ago
That's probably a PDF (or an image, but that's less likely) that hangs the process. You could try removing your PDFs from your vault, checking whether it works, and then adding them back one by one until it freezes. It obviously shouldn't happen, so if you find the "bad" file and can share it with me, I'll gladly take it for my tests :)
At startup, the console log should also display a "Text Extractor - Number of workers: x". Could you tell me how many workers you have?
11 workers. With the console open, when the screen goes dark, I get the error message "DevTools was disconnected from the page. Once page is reloaded, DevTools will automatically reconnect."
In a little while, I'll try removing my PDFs one by one and see if I can find the culprit.
Interesting. I tried renaming each .pdf to .pdf.tmp, but that didn't resolve the problem. Finally I moved a directory with 19 JPGs (that have a lot of text in them) out of the vault. The problem went away entirely. Now that I've moved them back, I'm not seeing the problem recur. I'll leave Text Extractor on today, and I'll let you know what happens.
I've had this behavior happen in the past without being able to pinpoint the exact cause, but I think it boiled down to "too many workers". Text Extractor spawns a number of workers roughly equal to 70% of your CPU cores (so 11 workers for 16 cores). Maybe 11 workers under a heavy load is just too much. I'll cap this value at 8 for the next releases.
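For anyone following along, here is a rough sketch of the sizing rule described above: about 70% of the cores, with the proposed cap of 8. This is not the plugin's actual code. The function name `workerCount`, its parameters, and the choice to round down are assumptions made for illustration.

```typescript
// Hypothetical helper, not taken from Text Extractor's source.
// Rounding down is an assumption; the thread only says "roughly 70%".
function workerCount(cores: number, fraction: number = 0.7, cap: number = 8): number {
  const wanted = Math.floor(cores * fraction);
  // Never go below one worker, never above the cap.
  return Math.max(1, Math.min(cap, wanted));
}

// 16 cores without a cap gives the 11 workers reported in this thread.
const uncapped = workerCount(16, 0.7, Infinity);
// 16 cores with the proposed cap of 8.
const capped = workerCount(16);
console.log(uncapped, capped);
```

In an Electron app the core count would typically come from something like `navigator.hardwareConcurrency`, but how the plugin reads it is not stated here.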
Thanks for your feedback!
Thanks again.
When I installed the update v0.4.2, my screen went back to turning black. I emptied the directory full of text-laden images that seem to be the source of the problem and then put them back slowly, one by one. That fixed the problem -- probably temporarily.
I'll be happy to supply you with the images in question, if you think it might help you to troubleshoot the problem.
Yes please, you can send me a zip with the images and I'll take a look :) Thank you
https://annelenner.com/Terry_Brown.zip (80 MB)
Good news (I guess), it crashes on my machine too :)
Nice pictures though, huh? :)
Again, thanks for looking into this matter. It will be interesting to see which of the photos is the offender.
OK, so the issue was indeed "too many workers" + "too much RAM used per worker". I got it running with 4 workers, but it literally ate all my 24 GB of RAM. There's no real offender among your images; it's just Text Extractor biting off more than it could chew.
I released an update that limits OCR to a maximum of 2 workers. It should solve the issue :)
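To illustrate what a limit of 2 concurrent OCR jobs means in practice, here is a minimal sketch of a bounded-concurrency runner. It is only an illustration of the general technique and says nothing about how the plugin implements its limit. The name `runWithLimit` and the stand-in task are hypothetical.

```typescript
// Hypothetical sketch: run async tasks over items with at most `limit` in flight.
async function runWithLimit<T, R>(
  items: T[],
  limit: number,
  task: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array<R>(items.length);
  let next = 0; // index of the next item nobody has claimed yet

  // Each lane pulls the next unclaimed item until none remain.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index]);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}

// Usage: a stand-in "OCR" task (hypothetical) over three file names, 2 at a time.
runWithLimit(["a.jpg", "b.jpg", "c.jpg"], 2, async (name) => name.toUpperCase())
  .then((texts) => console.log(texts));
```

With a limit like this, peak memory scales with the number of lanes rather than with the number of images in the vault.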
That seems to work quite nicely. Thanks!
Problem description:
When I enable Text Extractor in Obsidian on Windows 11, the screen goes black about 20 seconds after I start Obsidian.
The program then remains unresponsive indefinitely. Pressing Ctrl+Shift+I does nothing. The only fix is to open Obsidian, quickly disable Text Extractor, and restart Obsidian.
I have never been able to get Text Extractor to work, but clearly it must work for everyone else. What diagnostic technique is most likely to reveal my problem? Thanks!
Your environment: