Open ccchan234 opened 1 year ago
I have to say TE is very accurate for me, with screenshots taken of pastest MCQ questions.
thanks
I also find it a bit confusing how to use this plugin. I was expecting some command to scan all the images and generate a cache from them, or, as this issue states, a whole folder. Is this even possible?
Text Extractor was first and foremost built as a sort of "plugin's plugin". The idea was to provide a few basic helper functions for developers to build or expand their own plugin on top of it. Though to my knowledge, it's not used by anything other than Omnisearch.
I was expecting some command to scan all the images and generate cache from them
What is your use case?
My use case is to make all the text in my images searchable with Omnisearch. I want to run the extraction on all of them so I can leverage the cache on mobile.
On Thu, 4 Jan 2024, 13:06, Simon Cambier wrote:
Text Extractor was first and foremost built as a sort of "plugin's plugin". The idea was to provide a few basic helper functions for developers to build or expand their own plugin on top of it. Though to my knowledge, it's not used by anything other than Omnisearch.
I was expecting some command to scan all the images and generate cache from them
What is your use case?
OK, so you just need to enable image and PDF indexing in the Omnisearch settings on a desktop PC. Omnisearch will ask Text Extractor to get the text for all those files, and that will generate the cache 👍
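For anyone who wants to picture that flow, it can be sketched as a loop that asks an extractor for the text of every supported file that is not cached yet. This is only an illustration under stated assumptions: the `Extractor` interface, `pickUncached`, `warmCache`, and `makeStubExtractor` are hypothetical names made up for this example, not Text Extractor's or Omnisearch's actual API, and the in-memory stub stands in for real OCR.

```typescript
// Hypothetical extractor shape for illustration; NOT the real plugin API.
interface Extractor {
  canExtract(path: string): boolean;
  isCached(path: string): boolean;
  extract(path: string): Promise<string>;
}

// Pure helper: keep only supported files that have no cached text yet.
function pickUncached(paths: string[], ex: Extractor): string[] {
  return paths.filter((p) => ex.canExtract(p) && !ex.isCached(p));
}

// Extract the remaining files one at a time; a failure on one file
// is recorded and does not stop the rest of the run.
async function warmCache(
  paths: string[],
  ex: Extractor,
): Promise<{ done: string[]; failed: string[] }> {
  const done: string[] = [];
  const failed: string[] = [];
  for (const p of pickUncached(paths, ex)) {
    try {
      await ex.extract(p);
      done.push(p);
    } catch (err) {
      failed.push(p);
    }
  }
  return { done, failed };
}

// In-memory stub so the sketch runs without Obsidian (assumption: images
// and PDFs are the supported types; extraction simply fills the cache).
function makeStubExtractor(alreadyCached: string[]): Extractor {
  const cache = new Set<string>(alreadyCached);
  return {
    canExtract: (p) => /\.(png|jpe?g|pdf)$/i.test(p),
    isCached: (p) => cache.has(p),
    extract: async (p) => {
      cache.add(p);
      return `text of ${p}`;
    },
  };
}
```

With the stub, `warmCache(["a.png", "b.jpg", "note.md"], makeStubExtractor(["a.png"]))` would only touch `b.jpg`, since `a.png` is already cached and `note.md` is not a supported type.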
Ok, thanks. I think I have that enabled, but I will double check
On Thu, 4 Jan 2024, 18:25, Simon Cambier wrote:
OK, so you just need to enable image and PDF indexing in the Omnisearch settings. Omnisearch will ask Text Extractor to get the text for all those files, and that will generate the cache 👍
OK, so you just need to enable image and PDF indexing in the Omnisearch settings on a desktop PC. Omnisearch will ask Text Extractor to get the text for all those files, and that will generate the cache 👍
I'm not sure if I have missed anything but I can't seem to get this to work with images either. PDF content seems to have been indexed, but with images I have to manually right-click and extract text to clipboard for each image to show up in search.
I had a look at the logs and there were a lot of messages like this:
Text Extractor - OCR Worker timeout
_imagename eval @ plugin:text-extractor:5068
I'm on an ARM macOS laptop; perhaps there's some conflict stemming from that?
Perhaps a workaround could be a button in the settings to ignore timeouts and have it index all the images automatically? Even if it takes hours, as long as there's a way to keep an eye on the progress, I wouldn't mind.
@paulpall
Perhaps a workaround could be a button in the settings to ignore timeouts and have it index all the images automatically? Even if it takes hours
That's what is happening already, when Omnisearch uses Text Extractor, as long as this is enabled.
But if you have many images that cause a timeout (maybe they're particularly large or too complex for the OCR library), the worker is effectively blocked for 120 seconds on a single image, and then blocked again on the next image, etc.
Eventually it will go through all of them though, as each image is only processed once, even when it times out.
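To make the cost of repeated timeouts concrete, here is a rough sketch. The 120-second figure comes from the comment above; everything else (the function names, the promise-based timeout wrapper, the `null` marker for a timed-out image) is an assumption for illustration and is not taken from Text Extractor's source.

```typescript
const OCR_TIMEOUT_MS = 120000; // 120 s per image, as quoted above

// Worst-case minutes spent waiting if every one of these images times out.
function worstCaseMinutes(
  timedOutImages: number,
  timeoutMs: number = OCR_TIMEOUT_MS,
): number {
  return (timedOutImages * timeoutMs) / 60000;
}

// Reject if `task` has not settled within `ms` milliseconds.
function withTimeout<T>(task: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("OCR Worker timeout")), ms);
    task.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Process images one at a time. Each image is attempted exactly once:
// a timeout is recorded as null and the loop moves on to the next image.
async function processOnce(
  images: string[],
  ocr: (image: string) => Promise<string>,
  timeoutMs: number = OCR_TIMEOUT_MS,
): Promise<Map<string, string | null>> {
  const results = new Map<string, string | null>();
  for (const image of images) {
    try {
      results.set(image, await withTimeout(ocr(image), timeoutMs));
    } catch (err) {
      results.set(image, null); // attempted, not retried
    }
  }
  return results;
}
```

Under these assumptions, 30 problematic images would block the queue for about an hour in total (`worstCaseMinutes(30)` is 60), which matches the "it may take hours, but it finishes" behaviour described above for larger collections.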
Is your feature request related to a problem? Please describe.
I have tons of files, and right now TE has to be run on them one file at a time.
Describe the solution you'd like
Select several files, right-click, and choose "extract to separate files", so that each file's text is extracted to its own file. (Some people may also want to extract everything into one single file; in that case, please add each filename to the combined document. Thanks.)
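As a rough illustration of the two output modes requested here, the sketch below turns a list of already-extracted texts into either one note per source file or one combined note with each filename as a heading. All names (`Extracted`, `toSeparateNotes`, `toSingleNote`) and the `.md` naming scheme are hypothetical; nothing here reflects how the plugin does or would implement it.

```typescript
// Hypothetical result of extracting one file.
interface Extracted {
  sourcePath: string;
  text: string;
}

// Mode 1: one note per source file. The ".md" suffix is appended to the
// full source path so that "q1.png" and "q1.pdf" do not collide.
function toSeparateNotes(items: Extracted[]): { path: string; content: string }[] {
  return items.map((item) => ({
    path: `${item.sourcePath}.md`,
    content: item.text.trim(),
  }));
}

// Mode 2: a single combined note, with each source filename as a heading
// so the origin of every passage stays visible.
function toSingleNote(items: Extracted[]): string {
  return items
    .map((item) => `## ${item.sourcePath}\n\n${item.text.trim()}\n`)
    .join("\n");
}
```

Keeping both functions pure (text in, text out) would leave the actual file writing to whatever command or context-menu entry calls them.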
Describe alternatives you've considered
The same feature, in the form of a command.
Additional context