yacy / yacy_search_server

Distributed Peer-to-Peer Web Search Engine and Intranet Search Appliance
http://yacy.net
Other

yacy debian https no proxy crawl triggered + existing CA access denied on keystore #126

Open vcarluer opened 7 years ago

vcarluer commented 7 years ago

Hello,

I had an issue while trying to set up my PKCS#12 certificate.

I followed this guide: http://www.yacy-websuche.de/wiki/index.php/En:HOWTO_make_YaCy_allow_SSL_connections (section "Using a CA Cert or other authority cert").

To make it work, I had to remove the values of keyStore and keyStorePassword, because the logs showed an access-denied error on defaults/freeworldKeystore.
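The thread doesn't say what caused the access-denied error. One way to narrow it down is to check, outside YaCy, whether the keystore file can be read and opened with the given password. The class below is a minimal, hypothetical diagnostic (the name `KeystoreCheck` is made up for illustration); it uses only standard JDK calls and assumes the certificate is a PKCS#12 file. A file-permission problem surfaces as `AccessDeniedException`, a missing file as `NoSuchFileException`, and a wrong password typically as an `IOException` from `KeyStore.load`.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.KeyStore;
import java.util.Collections;

// Hypothetical diagnostic helper, not part of YaCy.
public final class KeystoreCheck {
    private KeystoreCheck() {}

    /** Returns "OK: <n> entries" or "ERROR: <exception class>: <message>". */
    public static String check(Path file, char[] password) {
        // Files.newInputStream throws AccessDeniedException if the current
        // user cannot read the file, NoSuchFileException if it is missing.
        try (InputStream in = Files.newInputStream(file)) {
            KeyStore ks = KeyStore.getInstance("PKCS12");
            // Throws IOException on a wrong password or a corrupt file.
            ks.load(in, password);
            return "OK: " + Collections.list(ks.aliases()).size() + " entries";
        } catch (Exception e) {
            return "ERROR: " + e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }
}
```

Running it as the same OS user that runs YaCy (e.g. `KeystoreCheck.check(Paths.get("/path/to/cert.p12"), "secret".toCharArray())`, with a hypothetical path and password) tells apart a permissions problem from a password or format problem.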

HTTPS was set to port 8091. Now I can access https://MYDOMAIN:8091 and the connection is secure.

But I still cannot use port 8091 as the HTTPS proxy in my browser: Firefox says the connection is not secure. Only my HTTP proxy on port 8090 works, but it does not trigger crawling of HTTPS sites, which makes it fairly useless because most sites are on HTTPS.

If I configure the HTTPS proxy on port 443, it disables itself in the web GUI (a bug?). I can activate it again, but then Firefox shows an SSL_ERROR_RX_RECORD_TOO_LONG error.

I don't know whether all of this is related, but these are all the things I tried to make it work. Any help is welcome, because automatic crawling through the HTTPS proxy is a great feature, I think.

Thank you.

luccioman commented 6 years ago

Hello @vcarluer, sorry for the delayed answer. Which YaCy proxy mode are you trying to use?

In transparent proxy mode, I am not sure how we could properly trigger indexing of https resources: when SSL/TLS connections are opened through a transparent proxy, the proxy itself only "sees" a "tunnel" and has no access to the plain-text data exchanged in it.
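To illustrate the "tunnel" point: when a browser uses an HTTP proxy for https URLs, it typically sends a `CONNECT host:port HTTP/1.1` request and then runs TLS through the resulting tunnel. The sketch below is a hypothetical parser (`ConnectTarget` is a made-up name, not YaCy code) showing that the only routing information such a proxy learns is the host and port; the path, query and page content travel encrypted afterwards, so there is nothing readable to index.

```java
import java.util.Optional;

// Hypothetical illustration, not YaCy's proxy implementation.
public final class ConnectTarget {
    public final String host;
    public final int port;

    private ConnectTarget(String host, int port) {
        this.host = host;
        this.port = port;
    }

    /**
     * Parses a request line such as "CONNECT example.com:443 HTTP/1.1".
     * Returns empty for anything that is not a well-formed CONNECT line.
     */
    public static Optional<ConnectTarget> parse(String requestLine) {
        String[] parts = requestLine.trim().split(" ");
        if (parts.length != 3 || !parts[0].equals("CONNECT") || !parts[2].startsWith("HTTP/")) {
            return Optional.empty();
        }
        String authority = parts[1];
        int colon = authority.lastIndexOf(':');
        if (colon <= 0 || colon == authority.length() - 1) {
            return Optional.empty();
        }
        try {
            int port = Integer.parseInt(authority.substring(colon + 1));
            if (port < 1 || port > 65535) {
                return Optional.empty();
            }
            // Host and port are all the proxy learns; no URL path is visible.
            return Optional.of(new ConnectTarget(authority.substring(0, colon), port));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }
}
```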

With the URL proxy mode, https resources can indeed trigger indexing even if you browse your peer through http, but many other factors can prevent a proxied resource from being indexed (presence of cookies, cache rules...).
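The difference in URL proxy mode is that the peer fetches the target page itself and passes it on to the browser over plain http, so it holds the decrypted content and can index it. The sketch below only builds such a link. The `/proxy.html` path and the `url` parameter are assumptions about how the peer exposes its URL proxy; check them against your own peer before relying on them. `UrlProxyLink` is a hypothetical name, and the `URLEncoder.encode(String, Charset)` overload requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; "/proxy.html?url=" is an ASSUMED endpoint layout.
public final class UrlProxyLink {
    private UrlProxyLink() {}

    public static String build(String peerBase, String target) {
        // Drop a trailing slash so we don't produce "//proxy.html".
        String base = peerBase.endsWith("/")
                ? peerBase.substring(0, peerBase.length() - 1)
                : peerBase;
        // Percent-encode the whole target URL so its own "?" and "&" survive.
        return base + "/proxy.html?url=" + URLEncoder.encode(target, StandardCharsets.UTF_8);
    }
}
```

For example, `UrlProxyLink.build("http://localhost:8090/", "https://example.com/a?b=c")` yields `http://localhost:8090/proxy.html?url=https%3A%2F%2Fexample.com%2Fa%3Fb%3Dc`.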

JeremyRand commented 6 years ago

YaCyIndexerGreasemonkey is another approach to solving this. The only issue is that Greasemonkey isn't well supported by browsers anymore. I intend to port it to a WebExtension (it's mostly ready), but I haven't gotten around to releasing it yet.