yacy / yacy_search_server

Distributed Peer-to-Peer Web Search Engine and Intranet Search Appliance
http://yacy.net

HTTP/S Proxy support with Yacy Server #36

Open sudheesh001 opened 8 years ago

sudheesh001 commented 8 years ago
http_proxy = 
https_proxy =

These environment variables may already be set on the system. There should be a way to tell YaCy to reach the internet from behind a proxy, or to respect the http_proxy settings.

maxolasersquad commented 6 years ago

This would be useful for me as well. I am running yacy behind an nginx proxy and cannot fully contribute as a senior node.

maxolasersquad commented 6 years ago

Hmmm, when I first set up my server the above was true. A few hours later it was showing itself to be running in senior mode, so I guess it's working fine behind the nginx proxy.

luccioman commented 6 years ago

@maxolasersquad correct me if I am wrong, but as I understand it, @sudheesh001 was talking about proxying outgoing http connections from YaCy (acting as a client, when crawling for example), whereas you are describing your YaCy peer as running behind nginx acting as a reverse proxy/gateway (receiving incoming connections first and transmitting them to your Yacy peer). This would not be exactly the same configuration issues...
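To make the distinction concrete, here is a minimal Python sketch of the first case: a client sending its outgoing HTTP requests through a forward proxy. It only illustrates the concept and is not how YaCy itself is implemented; the function name `opener_via_proxy` and the proxy URL are placeholders.

```python
import urllib.request


def opener_via_proxy(proxy_url):
    """Build an opener whose outgoing http/https requests go through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    # Passing our own ProxyHandler replaces the default one that build_opener adds.
    return urllib.request.build_opener(handler)


# Hypothetical proxy address; requests made with this opener would be
# sent to the proxy, which then contacts the target site on our behalf.
opener = opener_via_proxy("http://proxy.example:3128")
```

A reverse proxy such as nginx sits on the other side: it accepts incoming connections and passes them on to the application, so nothing in the client code changes.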

maxolasersquad commented 6 years ago

I think you are correct. When my server was running in junior mode I assumed it was because of the nginx proxy and went searching the issues. I originally thought this was it. Since my server began running in senior mode after a while, it's clear that yacy had no issue running behind an nginx server.

sudheesh001 commented 6 years ago

Absolutely @luccioman . In this issue, I was basically referring to the ability for the yacy clients to communicate when behind a proxy by enabling the http_proxy and https_proxy environment variables.

Thank you @maxolasersquad for taking the time to deploy the server behind an nginx proxy. Please feel free to share the nginx configuration as an update to the documentation for running Yacy peers, for others in the community who might want to set up Yacy behind an Nginx proxy. 👍

ehehdada commented 5 years ago

Hello!

I have also installed the YaCy 1.92 docker version, listening on port 8090 behind my nginx (public https port 443; http port 80 redirects to https), and I am looking for the right way to configure YaCy. In Use Case & Accounts -> Basic Configuration -> point 4, whatever I enter as the port is reverted to 8090 after clicking the Set Configuration button. The SSL certificate is on the nginx side, so nginx and YaCy are connected over plain http.

If I change the ports at System Administration -> Advanced Settings then, after a restart, I am unable to access YaCy and I even have to remove the volume and deploy it again (yes, I updated the nginx conf to point to port 80 in YaCy).

Enabling Transparent Proxy Access Settings seems to have no effect.

Are there any guidelines, since the wiki is still offline? I am getting: Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 823296 bytes) in /usr/www/users/hostkb/domain/wiki.yacy/includes/db/DatabaseMysqli.php on line 43.

Thank you in advance!

luccioman commented 5 years ago

@ehehdada the fact that you cannot change the listening port to anything other than 8090 is indeed a limitation specific to the YaCy Docker image, which is configured to expose only ports 8090 and 8443. It's true that this should be better documented. For now, if you want to expose your YaCy Docker container on other ports, you have to use the relevant Docker port-binding parameters, for example -p 80:8090 -p 443:8443 instead of -p 8090:8090 -p 8443:8443. Then, if you want your server to be properly seen by other YaCy peers, you may also need to adjust the staticIP field in the /Settings_p.html?page=ServerAccess admin page.
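Docker's -p option takes the form HOST_PORT:CONTAINER_PORT, so the container keeps listening on 8090 and 8443 while the host publishes whichever ports you choose. As a small sketch, the Python helper below (the name `publish_flags` is made up for illustration) builds those flags from a host-to-container mapping:

```python
def publish_flags(mappings):
    """Turn {host_port: container_port} into a list of docker -p arguments."""
    flags = []
    for host_port, container_port in mappings.items():
        # Docker reads each value as HOST_PORT:CONTAINER_PORT.
        flags += ["-p", f"{host_port}:{container_port}"]
    return flags


# Publish the container's fixed 8090/8443 ports on the host's 80/443.
print(" ".join(publish_flags({80: 8090, 443: 8443})))
```

The printed flags would be placed in the docker run command in front of the image name.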

normen commented 3 years ago

I am trying to expose the yacy server through an nginx proxy. Basically my whole web server's front end is nginx, and it routes the requests for individual (sub)domains to different docker containers. This works fine for nextcloud and others, but while yacy works, it complains about not being able to connect to other yacy servers. When I set the actual "world" port numbers 80/443, yacy won't start.

normen commented 3 years ago

No dice, I had to make a hard route from the internet to the yacy docker container. If anyone can tell me how to configure yacy to work behind a nginx proxy I'm all ears.

virtadpt commented 3 years ago

> No dice, I had to make a hard route from the internet to the yacy docker container. If anyone can tell me how to configure yacy to work behind a nginx proxy I'm all ears.

Wish I could, I had to do the same thing.

Orbiter commented 3 years ago

It should work this way:

However, this does not re-route the port number.

Right now the demo peer at https://yacy.searchlab.eu/ appears as senior but is doing it differently

The same should be possible for YaCy in docker containers, just expose their port to the docker host and then set the staticIP inside the container to the host IP address.

normen commented 3 years ago

Static IP is set. As said, when routing the ports to any other outside ports then yacy can't federate, when I expose the default ports to the world it works... Gonna re-check my firewalls etc.

lfuelling commented 2 years ago

Most other web applications I've administered offered a setting like "public URL" where I could set the full public URL (protocol, host and port), whereas YaCy only offers the staticIP setting.

It's really weird that YaCy only lets you change the domain part and expects the port to always be the same one it listens on.

luntik2012 commented 2 years ago

solved that:

    # Redirect plain http to https.
    server {
        listen 80;
        return 301 https://$host$request_uri;
    }

    # Public https entry point; TLS is terminated here.
    server {

        listen 443 ssl;
        server_name my.domain.name;

        ssl_certificate /etc/letsencrypt/live/my.domain.name/fullchain.pem; # managed by Certbot
        ssl_certificate_key /etc/letsencrypt/live/my.domain.name/privkey.pem; # managed by Certbot

        ssl_session_cache  builtin:1000  shared:SSL:10m;
        ssl_protocols  TLSv1 TLSv1.1 TLSv1.2;
        ssl_ciphers HIGH:!aNULL:!eNULL:!EXPORT:!CAMELLIA:!DES:!MD5:!PSK:!RC4;
        ssl_prefer_server_ciphers on;

        access_log            /var/log/nginx/yacy.access.log;

        location / {
            proxy_set_header        Host $host;
            proxy_set_header        X-Real-IP $remote_addr;
            proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header        X-Forwarded-Proto $scheme;

            # Fix the "It appears that your reverse proxy set up is broken" error.
            proxy_pass          http://10.8.0.5:8090;
            proxy_read_timeout  90;

            proxy_redirect      http://10.8.0.5:8090 https://my.domain.name;
        }
    }

    # Plain http on 8090, also forwarded to the yacy host.
    server {

        listen 8090;
        server_name my.domain.name;

        access_log            /var/log/nginx/yacy.access.log;

        location / {
            proxy_set_header        Host $host;
            proxy_set_header        X-Real-IP $remote_addr;
            proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header        X-Forwarded-Proto $scheme;

            # Fix the "It appears that your reverse proxy set up is broken" error.
            proxy_pass          http://10.8.0.5:8090;
            proxy_read_timeout  90;
        }
    }

yacy:

System Administration->Server access settings:

Use case & Accounts:

UPD (2023-03-09): this still works, but it takes 10-15 minutes for the yacy web interface to show the senior status.
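Since senior status depends on other peers being able to reach the peer from outside, one way to sanity-check a setup like this is to test from an external machine whether the public ports accept TCP connections. The following is a minimal sketch; `port_is_reachable` is a hypothetical helper, and an open port only shows that something is listening there, not that YaCy will be recognised as senior.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Run from a host outside your own network, a call such as port_is_reachable("my.domain.name", 443) shows whether the nginx front end can be reached; repeating it for 8090 checks the second server block.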

Doomsdayrs commented 9 months ago

From what I see, #440 does nothing for the issue, specifically in a container.

I place in port 443 (https only), but it keeps expecting from port 8090.

okybaca commented 9 months ago

you cannot bind to 443 port under unix/linux, unless you're root. that's just a remark, not sure, if it's related to your problem.

nginx config was discussed in the forum as well

Doomsdayrs commented 9 months ago

> you cannot bind to 443 port under unix/linux, unless you're root. that's just a remark, not sure, if it's related to your problem.

Nope, YaCy was configured to use port 8090. The public port was 443 under server access settings.

> nginx config was discussed in the forum as well

They didn't "solve" the issue. They simply exposed port 8090 to the outside.

lfuelling commented 9 months ago

> I place in port 443 (https only), but it keeps expecting from port 8090.

The port setting just changes the port in the config, you have to manually bind a proxy (like nginx) to that port and forward/pass traffic to 8090.

> From what I see, https://github.com/yacy/yacy_search_server/pull/440 does nothing for the issue, specifically in a container.

What it does is allow you to have a proxy (like nginx) handling the public port with TLS and then forward to 8090 (the port yacy is listening on) internally. This is regardless of the deployment method, you can also do this using containers.

As can be seen in the help text:

> The publicPort can help that your peer can be reached by other peers in case that your peer is behind a reverse proxy. If the port used to access YaCy is the same port the application is listening on, you don't need to set anything here, please leave it blank.

So if you want to have NGINX handling your TLS stuff and users expect to type "https://somedoiman.com" into the browser and see yacy, you'd have to make NGINX do a proxy_pass to port 8090, and then set the public port to 443 so the users won't be redirected to 8090.
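The rule described in the help text can be modelled in a few lines of Python. This is only a sketch of the behaviour as described in this thread, not YaCy's actual code; `advertised_endpoint` and its parameters are names made up for the illustration.

```python
from typing import Optional


def advertised_endpoint(static_ip: str, listen_port: int,
                        public_port: Optional[int] = None) -> str:
    """Return the host:port that other peers would be told to connect to."""
    # A blank public port means "same as the port the application listens on".
    port = public_port if public_port else listen_port
    return f"{static_ip}:{port}"


# Reverse proxy handles TLS on 443 and forwards to YaCy on 8090.
print(advertised_endpoint("my.domain.name", 8090, public_port=443))
# No reverse proxy: the listening port is also the public one.
print(advertised_endpoint("my.domain.name", 8090))
```

In this model the listening port never changes; only the port announced to the outside does, which is why the proxy still has to forward traffic to 8090.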

Doomsdayrs commented 9 months ago

> > I place in port 443 (https only), but it keeps expecting from port 8090.
>
> The port setting just changes the port in the config, you have to manually bind a proxy (like nginx) to that port and forward/pass traffic to 8090.
>
> > From what I see, #440 does nothing for the issue, specifically in a container.
>
> What it does is allow you to have a proxy (like nginx) handling the public port with TLS and then forward to 8090 (the port yacy is listening on) internally. This is regardless of the deployment method, you can also do this using containers.
>
> As can be seen in the help text:
>
> > The publicPort can help that your peer can be reached by other peers in case that your peer is behind a reverse proxy. If the port used to access YaCy is the same port the application is listening on, you don't need to set anything here, please leave it blank.
>
> So if you want to have NGINX handling your TLS stuff and users expect to type "https://somedoiman.com" into the browser and see yacy, you'd have to make NGINX do a proxy_pass to port 8090, and then set the public port to 443 so the users won't be redirected to 8090.

I am aware of this, and this is what I've done.

Apache Proxy directs port 443 -> local host:23243

23243 is the port exposed by the container, internally that maps to 8090.

And it didn't work.

Avamander commented 8 months ago

I get the same impression that it's not possible to properly define an external accessible IP and port for all four variations - IPv4, IPv6, HTTP and HTTPS.

hurricanefrog commented 6 months ago

Can someone give me a hint? I probably missed it in the discussion. How can I crawl with YaCy when I am behind a proxy? The environment variables http_proxy and https_proxy are set, but YaCy doesn't seem to use them. I am running YaCy in a Docker container: if I run podman exec -ti yacy_search_server /bin/sh and do a curl http://google.com there, curl can successfully fetch the page, but YaCy just reports "Connection Refused" when starting a crawl.

I can't find the proxy setting (Connect through proxy, I don't want to use the YaCy proxy itself) in the admin settings anywhere.

hurricanefrog commented 6 months ago

I can't find the proxy setting (Connect through proxy, I don't want to use the YaCy proxy itself) in the admin settings anywhere.

Thanks to https://wiki.yacy.net/index.php/De:YaCy-Tor#Konfigurationsdateien_.C3.A4ndern I found the respective settings (remoteProxyUse, remoteProxyHost, remoteProxyPort). It's really curious that there's no "obvious" way to set them via the config UI.

okybaca commented 6 months ago

Hi, hurricanefrog, and thanks for the feedback!

I'll include that in the FAQ. Do you think the text:


How can I crawl with YaCy when I am behind a proxy?

Set up proxy settings in configuration file DATA/SETTINGS/yacy.conf

    # route outgoing connections through a remote proxy
    remoteProxyUse=true
    # hostname or address of proxy
    remoteProxyHost=localhost
    # proxy port
    remoteProxyPort=8118

would be sufficient?
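Because YaCy does not pick up http_proxy on its own, the three settings have to be derived from it by hand. As a rough sketch, the Python script below does that translation; `yacy_proxy_settings` is a hypothetical helper, and the fallback ports (80 and 443) are an assumption for URLs that carry no explicit port.

```python
import os
from urllib.parse import urlsplit


def yacy_proxy_settings(proxy_url):
    """Translate a proxy URL such as http://proxy.example:3128 into yacy.conf lines."""
    parts = urlsplit(proxy_url)
    if not parts.hostname:
        raise ValueError(f"no host in proxy URL: {proxy_url!r}")
    # Assumed defaults when the URL carries no explicit port.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return [
        "remoteProxyUse=true",
        f"remoteProxyHost={parts.hostname}",
        f"remoteProxyPort={port}",
    ]


if __name__ == "__main__":
    # Lowercase is the conventional spelling; some systems only set uppercase.
    url = os.environ.get("http_proxy") or os.environ.get("HTTP_PROXY")
    if url:
        print("\n".join(yacy_proxy_settings(url)))
    else:
        print("http_proxy is not set")
```

The printed lines can then be merged into DATA/SETTINGS/yacy.conf, replacing any existing remoteProxy* entries.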

hurricanefrog commented 6 months ago

@okybaca I actually found it's also adjustable at http://<host>:<port>/Settings_p.html?page=proxy. But yeah, that would certainly help :)

okybaca commented 6 months ago

great, thanks! i'll include the UI settings too.

okybaca commented 6 months ago

added to faq in https://github.com/yacy/yacy_net_homepage/pull/30

seang96 commented 1 month ago

I have been looking at this for a while. It looks like this issue got mixed up with another proxy issue, so to bring it back on topic: publicPort doesn't seem to be working properly. I am getting 0 peers connecting to mine, even though it is properly exposed on 443 and browsable using a domain and https.

Public Port: 443, Server Port: 8090, Server SSL Port: 8443

It looks like there is documentation for bindPort, but it's unused in #242. The only way I can see this working is to set the port to 80 and the SSL port to 443, set the certs in auth (which wouldn't allow auto-renewing Let's Encrypt certificates), and then have nginx connect to yacy over https.