yacy / yacy_search_server

Distributed Peer-to-Peer Web Search Engine and Intranet Search Appliance
http://yacy.net

I can't crawl "http:" sites #324

Closed yunseong0986 closed 4 years ago

yunseong0986 commented 4 years ago

Hello, I am trying to crawl sites without an SSL certificate, using version 1.922. Am I missing some configuration?

I tested with the following sites:

http://www.bnm.unam.mx/
http://www.bne.es/es/Inicio/index.html
https://www.eluniversal.com.mx/
https://www.ipn.mx/

[Screenshot: WhatsApp Image 2020-01-15 at 9 55 32 AM]

ipforums commented 4 years ago

If you have both IPv4 and IPv6 enabled, turn off IPv6, restart YaCy, and see if things change. Test one URL at a time.

yunseong0986 commented 4 years ago

I disabled IPv6 in the Windows Control Panel and nothing changed. I still get the same error: "scraper cannot load URL: Client can't execute: Address family not supported by protocol family: connect duration=0 for url http://www.bnm.unam.mx/"

ipsoftdev commented 4 years ago

Very strange. I assume you are able to open these failing URLs in a browser. The screenshot you initially posted is cut off, but it looks like the connection might be timing out for some reason. Try turning off your antivirus or firewall if you have one installed. Also try passing the IPv4 preference as a command-line argument, as the link below suggests. Is YaCy accessing the WAN through a proxy, by any chance? I ran into connection errors (not like yours) due to an improperly configured proxy server.

https://stackoverflow.com/questions/16373906/address-family-not-supported-by-protocol-family-socketexception-on-a-specific
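
For anyone who wants to test the IPv4 preference outside YaCy first, here is a minimal sketch of a standalone check. The class name ConnectCheck and its helper methods are hypothetical and not part of YaCy. The only real JVM setting involved is the java.net.preferIPv4Stack system property. The sketch resolves a host, labels each address as IPv4 or IPv6, and attempts a plain TCP connection so that errors such as "Address family not supported by protocol family" can be reproduced in isolation.

```java
import java.io.IOException;
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic helper; not part of YaCy.
class ConnectCheck {

    // Label an address by its address family.
    static String family(InetAddress address) {
        return (address instanceof Inet4Address) ? "IPv4" : "IPv6";
    }

    // Attempt a plain TCP connection and describe the outcome as text.
    static String tryConnect(InetAddress address, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(address, port), timeoutMillis);
            return "connected";
        } catch (IOException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults are taken from the failing URL in this thread.
        String host = args.length > 0 ? args[0] : "www.bnm.unam.mx";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 80;

        // Prints "null" unless the property was passed with -D on the command line.
        System.out.println("java.net.preferIPv4Stack="
                + System.getProperty("java.net.preferIPv4Stack"));

        // Try every address the resolver returns for the host.
        for (InetAddress address : InetAddress.getAllByName(host)) {
            System.out.println(family(address) + " " + address.getHostAddress()
                    + " -> " + tryConnect(address, port, 5000));
        }
    }
}
```

Assuming Java 11 or newer, it can be run directly from source, once as "java ConnectCheck.java www.bnm.unam.mx" and once as "java -Djava.net.preferIPv4Stack=true ConnectCheck.java www.bnm.unam.mx", to compare the two results. If both runs fail in the same way, the cause is more likely a firewall, antivirus, or proxy than the IPv4/IPv6 preference. How the flag is passed to YaCy itself depends on its start script, which is not shown in this thread.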

yunseong0986 commented 4 years ago

Thank you, ipsoftdev. The problem was solved when I turned off the Windows firewall.