internetarchive / Zeno

State-of-the-art web crawler 🔱
GNU Affero General Public License v3.0
70 stars, 8 forks

Extract URLs from images #86

Open CorentinB opened 1 month ago

CorentinB commented 1 month ago

It would be interesting to try running OCR on images (as an option) to extract URLs from watermarks and such.
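
For illustration only, here is a minimal sketch of the step that would come after OCR: pulling URL candidates out of the recognized text. It assumes some OCR engine has already produced a plain string; the engine is not shown. The `extract_urls` name and the regex are hypothetical, not part of Zeno, and the sketch is in Python for brevity.

```python
import re

# Hypothetical pattern: http(s) URLs and bare "www." hosts. Deliberately simple;
# real OCR output is noisy and would likely need fuzzier matching.
URL_RE = re.compile(r"(?:https?://|www\.)[^\s<>\"']+", re.IGNORECASE)


def extract_urls(ocr_text: str) -> list[str]:
    """Return de-duplicated URL candidates found in OCR text, in order of appearance."""
    seen = set()
    urls = []
    for match in URL_RE.findall(ocr_text):
        # Drop trailing punctuation that usually belongs to the sentence, not the URL.
        candidate = match.rstrip(".,;:!?)]}")
        # Watermarks often omit the scheme; assume http:// for bare "www." hosts.
        if candidate.lower().startswith("www."):
            candidate = "http://" + candidate
        if candidate not in seen:
            seen.add(candidate)
            urls.append(candidate)
    return urls
```

Calling `extract_urls("Photo by www.example.com/gallery.")` would yield a single candidate with the trailing period removed and a scheme prepended. Whether such candidates are accurate enough to queue would depend on the quality of the OCR output.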

yzqzss commented 1 month ago

OCR might be slow and inaccurate, but how about extracting URLs from QR codes in images?
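
One thing to keep in mind with QR codes is that the payload can be arbitrary text (Wi-Fi credentials, contact cards, plain strings), so a decoded payload would still need filtering before being queued. A rough sketch of that filter follows, assuming a QR decoder (not shown) has already returned the payload as a string. `qr_payload_to_url` is a hypothetical helper, not part of Zeno.

```python
from urllib.parse import urlsplit


def qr_payload_to_url(payload: str):
    """Return the payload if it looks like a crawlable http(s) URL, else None."""
    text = payload.strip()
    try:
        parts = urlsplit(text)
    except ValueError:
        # urlsplit raises on some malformed inputs, e.g. unbalanced IPv6 brackets.
        return None
    # Keep only absolute http/https URLs that have a host component.
    if parts.scheme in ("http", "https") and parts.netloc:
        return text
    return None
```

With this filter, a payload such as `WIFI:T:WPA;S:net;P:pw;;` is rejected, while `https://example.com/page` passes through unchanged.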

CorentinB commented 1 month ago

> OCR might be slow and inaccurate, but how about extracting URLs from QR codes in images?

Very good idea. (Not a priority though; maybe it should be a separate issue?)