scambier / obsidian-text-extractor

A (companion) plugin to facilitate the extraction of text from images (OCR) and PDFs.
GNU General Public License v3.0

[Feature request] Enhanced text extraction for network images #55

Status: Closed (closed by moulai 8 months ago)

moulai commented 8 months ago

Is your feature request related to a problem? Please describe.

The plugin currently extracts text only from files whose names end in ".png", ".jpg", and similar extensions. This is a problem for network images, because an image link does not necessarily end in ".png" or ".jpg". I believe some users prefer to upload images to an image hosting service and then insert the links into their Markdown documents.

It would be beneficial if the plugin could enhance its functionality to extract text from network images as well, by detecting ".png/.jpg" within the entire image link.

Describe the solution you'd like

I kindly suggest that the plugin be updated to extract text from network images by detecting ".png/.jpg" within the entire image link, rather than solely at the end of the link.
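To make the difference between the two checks concrete, here is a minimal sketch. The helper names `endsWithImageExtension` and `containsImageExtension`, the extension list, and the example.com URL are all hypothetical and are not taken from the plugin's code; the "current" check only mirrors the behavior as described in this issue.

```typescript
// Hypothetical helpers for illustration only; not part of obsidian-text-extractor.
const IMAGE_EXTENSIONS: string[] = ["png", "jpg", "jpeg"];

// Behavior as described in this issue: the link must END with an image extension.
function endsWithImageExtension(link: string): boolean {
  const lower = link.toLowerCase();
  return IMAGE_EXTENSIONS.some((ext) => lower.endsWith("." + ext));
}

// Requested behavior: the extension may appear ANYWHERE in the link.
// This is a naive substring test, so it can also match unrelated text
// that happens to contain ".png" or ".jpg".
function containsImageExtension(link: string): boolean {
  const lower = link.toLowerCase();
  return IMAGE_EXTENSIONS.some((ext) => lower.includes("." + ext));
}

// Assumed example of a hosted image link with a query string.
const hosted = "https://example.com/photo.png?size=large";
console.log(endsWithImageExtension(hosted)); // false
console.log(containsImageExtension(hosted)); // true
```

Either check inspects only the text of the link, so neither can confirm that the link actually serves an image.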

Describe alternatives you've considered

One alternative is to download the image manually and then use the plugin to extract text from the downloaded file. However, this is cumbersome and less efficient than extracting text directly from the network image link.

Additional context

An example of a network image link is: https://g1proxy.wimg.site/sOyTQBWLoZ_0WlQLXX7i2fynQZzBlcMPD_BLDiB09XsE/https://mmbiz.qpic.cn/mmbiz_png/GptLuMzjaIeOOSHapMMrWVadjGwic7icBjicpLVdVZfp1ZR9iaJiajia8ar62VyZ8JSCmIYeKSBryHRuUO0OuGZTlHIQ/640?wx_fmt=png&from=appmsg&tp=wxpic&wxfrom=5&wx_lazy=1&wx_co=1

scambier commented 8 months ago

> by detecting ".png/.jpg" within the entire image link.

Your example link doesn't even have .png or .jpg in it. The only way to know if a link points to an image file is to download and read it, which is a security risk.

Sorry, won't fix: https://github.com/scambier/obsidian-text-extractor/issues/27#issuecomment-1533508793