hoarder-app / hoarder

A self-hostable bookmark-everything app (links, notes and images) with AI-based automatic tagging and full text search
https://hoarder.app
GNU Affero General Public License v3.0
6.48k stars, 235 forks

can not fetch the picture: #593

Closed (leftchest closed this issue 3 weeks ago)

leftchest commented 3 weeks ago

Describe the Bug

Hoarder cannot fetch the picture from the following pages:

1、https://linux.do/t/topic/101732

2、https://mp.weixin.qq.com/s?__biz=Mzk0MzYyMzExMQ%3D%3D&ascene=3&chksm=c25936b85404103d5d572eb3d3cd5ffb181f18cc418ef3cc6bb18ccc96514e0978e1e5ee94a3&clicktime=1730084477&countrycode=EE&devicetype=android-31&enterid=1730084477&exportkey=n_ChQIAhIQLKTcD7%2FreQqJrj0dvVLXFBLxAQIE97dBBAEAAAAAANNBK5KxDGAAAAAOpnltbLcz9gKNyK89dVj0wVLFxs5eWGDi4sJnwm5c0RdqnLfJDCU16QjL%2BYmonML99lRqdFMx%2FmqUSoDqUc4tGipPURo7XZNTk%2Bo%2FHooSvPHW%2FMB0UmLxYgZMe0vgvTK8tC6d7%2FJ3QxgpX7vb6%2BVYHjKE4RFt85Jv%2Bd1Ki%2FSYRxuFVoNCUs8mIkZlhUh9cxO5XZtTpHDvz67MvyzF4kzs0fDswXdWa0EHhvX7wYDzJU%2BaTa1QpPpPg8fs%2B9c5e2Jj8hxn%2FRGTX18g8BQ7I%2BLfGdOxcn5e%2BxNwO7U%3D&fasttmpl_flag=0&fasttmpl_fullversion=7442750-zh_CN-zip&fasttmpl_type=0&idx=1&lang=zh_CN&mid=2247484663&nettype=WIFI&pass_ticket=o6j9dJe02FB7QCLqGVRJdUEOkI0iFNcqpKAEyicLOGUAGzp2QKhBcneiCaP4gjzO&realreporttime=1730084477950&scene=126&session_us=gh_449a85299e74&sessionid=1730082275&sn=20947a8c81cac1fce388b562bd79a8b0&subscene=10000&version=28003339&wx_header=3

Thanks, looking forward to a fix.

Steps to Reproduce

Bookmark either of the pages listed above. The picture is not fetched.

Expected Behaviour

The picture should be fetched along with the page.

Screenshots or Additional Context

No response

Device Details

linux

Exact Hoarder Version

0.18

kamtschatka commented 3 weeks ago

https://linux.do/t/topic/101732 is behind Cloudflare protection, which is specifically designed to prevent crawlers from fetching the content. We will not be able to crawl that image without adding special handling and getting into a back-and-forth with Cloudflare to circumvent the protection, and we are not going to do that. There are projects out there that try, but they are not very successful either.
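For anyone who wants to check whether a page is hitting this kind of protection, here is a minimal sketch of how a crawler could recognise a Cloudflare challenge response instead of treating it as real content. It is not Hoarder's code: the helper name `looksLikeCloudflareChallenge` and the `HeaderMap` type are made up for illustration. The `cf-mitigated: challenge` header is what Cloudflare documents for challenge pages; the status-code fallback is only a heuristic assumption.

```typescript
// Hypothetical helper, not part of Hoarder.
// Plain header map, e.g. built from a fetch Response's headers.
type HeaderMap = { [name: string]: string | undefined };

function looksLikeCloudflareChallenge(status: number, headers: HeaderMap): boolean {
  // HTTP header names are case-insensitive, so normalise names and values first.
  const lower: { [name: string]: string } = {};
  for (const name of Object.keys(headers)) {
    const value = headers[name];
    if (value !== undefined) {
      lower[name.toLowerCase()] = value.toLowerCase();
    }
  }

  // Documented signal: Cloudflare marks challenge pages with this header.
  if (lower["cf-mitigated"] === "challenge") {
    return true;
  }

  // Fallback heuristic (assumption): a 403 or 503 served by Cloudflare
  // is most likely a block or challenge page rather than the article.
  const servedByCloudflare = lower["server"] === "cloudflare";
  return servedByCloudflare && (status === 403 || status === 503);
}
```

To use it, pass the response status and the response headers converted to a plain object. A `true` result only says the crawler was blocked; it does nothing to get around the protection.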

Your second link works fine for me (screenshot attached).

Please provide the logs from when you crawl this page, so we can see what is happening on your side.