bonswouar closed this 11 months ago
@unixfox Thanks for the link, it's interesting! My issue doesn't seem to be exactly the same, though: I can nslookup and ping archive.is successfully (apparently using Hetzner's DNS server). But they probably just have different types of IP restrictions.
Sadly, archive.is is blocked by a CAPTCHA, and I don't have a clue how we can avoid it.
@bonswouar I'm sorry, but we do not have a solution for this issue at hand. In #2645 I will drop the XPath configuration for this engine (it makes no sense to keep a configuration that no longer works).
The merge of #2645 will close this issue. If this search engine is very important to you, you would have to open an engine request; maybe someone can implement a Python module that is able to bypass the CAPTCHA problem (if there is a way to bypass it). If you already have suggestions, any support is welcome.
I am sorry that we cannot do more at present.
@return42 No worries, I totally understand! Unfortunately, after the few tests I did on my side, it seems you're right: there is no easy way to bypass this CAPTCHA (it seems to depend too much on the IP).
But who knows, maybe at some point they'll revert some of those restrictions when they see how problematic they can be (I actually can't use the website at all from my personal browser and IP).
Version of SearXNG, commit number if you are using on master branch and stipulate if you forked SearXNG
Repository: https://github.com/searxng/searxng
Branch: master
Version: 2023.8.8+bcaaae699
How did you install SearXNG? searxng-docker
What happened? Tried to use the archive.is engine, but it always times out
How To Reproduce Search for "!ai samsung.com"
Expected behavior Shouldn't time out
Additional context My SearXNG instance is on a dedicated server. But I noticed I also struggle to navigate to archive.is directly: with my home connection (using a shared Starlink IP) it seems to loop infinitely on the CAPTCHA page. There is no problem using the cell network, though. So I guess they might have weird IP restrictions or something?
Technical report
Error
(None, None, 'archive.is')
searx/search/processors/online.py:118
_send_http_request
response = req(params['url'], **request_args)
Error
(None, None, None)
searx/search/processors/online.py:118
_send_http_request
response = req(params['url'], **request_args)