hep-gc / shoal

A squid cache publishing and advertising tool designed to work in fast changing environments
Apache License 2.0

Shoal-Agent:: unable to find public IP and wrong interface #182

Open adriansev opened 8 months ago

adriansev commented 8 months ago

Hi! I have a frontier installation and I get these errors:

2023-10-17 15:07:18,872 - ERROR - [shoal-agent:261] - Shoal-Agent was unable to find a public IP or external IP for this squid. Please set an external IP in shoal-agent config file.

2023-10-17 17:55:25,947 - ERROR - [shoal-agent:120] - Path '/sys/class/net/enp1s0f0:0/statistics/tx_bytes' does not exist. Please change NIC to monitor in configuration file.

How can I set the external IP and give a list of physical interfaces? Thank you!

colsond commented 8 months ago

Hi there, may I ask what operating system you are running on? I've not seen an error like this before, and it would be good for us to try to reproduce it locally so we can make the agent more resilient. Does your machine have multiple network interfaces? Are you running IPv4 or IPv6?

Shoal will attempt to use the python package netifaces to figure out the public and private IPs of the machine so it's surprising it can't find a public or private IP.
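To illustrate the kind of public/private classification involved, here is a minimal sketch that uses only Python's standard `ipaddress` module. It is not shoal's actual detection code: the helper name `split_public_private` and the sample addresses are assumptions for illustration (8.8.8.8 simply stands in for a public IPv4 address).

```python
import ipaddress

def split_public_private(addresses):
    """Sort address strings into globally routable and non-routable lists."""
    public, private = [], []
    for text in addresses:
        # Drop an interface scope such as "%eth0" that link-local IPv6 may carry.
        addr = ipaddress.ip_address(text.split("%", 1)[0])
        (public if addr.is_global else private).append(str(addr))
    return public, private

# Hypothetical sample: one public IPv4, one RFC 1918 IPv4, one link-local IPv6.
public, private = split_public_private(["8.8.8.8", "172.20.0.17", "fe80::1%eth0"])
print("public:", public)
print("private:", private)
```

On a host that carries both public and private addresses on the same interface, a check like this shows which candidates a detection routine would have to choose between.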

adriansev commented 8 months ago

Hi! This is on CentOS 7, with 2 interfaces, one for the public VLAN and the other for the private VLAN. The one in the public VLAN has both IPv4 and IPv6 addresses, and its IPv4 addresses are from both public and private ranges. But while trying it out I saw that python36-netifaces was not installed on the system. Was it supposed to be (or required to be) installed?

MarcusEbert commented 8 months ago

Hi, yes netifaces needs to be installed. Not sure why it didn't fail with the import if it wasn't installed.

DrDaveD commented 8 months ago

Adrian is talking about the shoal included in the frontier-squid package. Right, Adrian? There shoal-agent is packaged using pyinstaller, so all the required packages are included with it at compile time. They do not need to be installed separately on the system.

The frontier-squid startup code looks up the external ip from a server (which just finds it out from the TCP socket) and then attempts to set external_ip in shoal_agent.conf, but it does that by editing that setting from the shoal_agent.conf generated by the shoal-agent install and I just discovered that external_ip is no longer present in that file. Also, although I see the shoal-agent python script referring to config.external_ip, it looks like config.py no longer reads that value from the config file but only tries to calculate it. It's nice that it tries to guess, but since the algorithm is apparently not foolproof it seems to me that it was a bad idea to remove the option of setting that value.

adriansev commented 8 months ago

> Adrian is talking about the shoal included in the frontier-squid package. Right, Adrian? There shoal-agent is packaged using pyinstaller, so all the required packages are included with it at compile time. They do not need to be installed separately on the system.

Yeah, exactly @DrDaveD !

> The frontier-squid startup code looks up the external ip from a server (which just finds it out from the TCP socket) and then attempts to set external_ip in shoal_agent.conf, but it does that by editing that setting from the shoal_agent.conf generated by the shoal-agent install and I just discovered that external_ip is no longer present in that file. Also, although I see the shoal-agent python script referring to config.external_ip, it looks like config.py no longer reads that value from the config file but only tries to calculate it. It's nice that it tries to guess, but since the algorithm is apparently not foolproof it seems to me that it was a bad idea to remove the option of setting that value.

For reference, I get this with curl:

curl -s -m 5 http://wlcg-wpad.cern.ch/fsad.conf
external_ip=2001:b30:4210:1::17
amqp_server_url=shoal.heprc.uvic.ca

but no output with:

curl -vv -4 -s -m 5 http://wlcg-wpad.cern.ch/fsad.conf
* About to connect() to wlcg-wpad.cern.ch port 80 (#0)
*   Trying 128.142.161.84...
* Connected to wlcg-wpad.cern.ch (128.142.161.84) port 80 (#0)
> GET /fsad.conf HTTP/1.1
> User-Agent: curl/7.29.0
> Host: wlcg-wpad.cern.ch
> Accept: */*
> 
* Operation timed out after 5000 milliseconds with 0 out of -1 bytes received
* Closing connection 0

Also, I would like my frontiers to identify themselves primarily by IPv4. I will try adding "#external_ip=" to the template conf and get back here with feedback.

adriansev commented 8 months ago

@DrDaveD So, with external_ip in the template conf, I now have the external_ip=<host ipv6 ip> declaration in my running conf. Is there a way to make it use the IPv4 address? Or just to let me customize the configuration?

MarcusEbert commented 8 months ago

> The frontier-squid startup code looks up the external ip from a server (which just finds it out from the TCP socket) and then attempts to set external_ip in shoal_agent.conf, but it does that by editing that setting from the shoal_agent.conf generated by the shoal-agent install and I just discovered that external_ip is no longer present in that file. Also, although I see the shoal-agent python script referring to config.external_ip, it looks like config.py no longer reads that value from the config file but only tries to calculate it. It's nice that it tries to guess, but since the algorithm is apparently not foolproof it seems to me that it was a bad idea to remove the option of setting that value.

If there is an external_ip defined in the conf file then it should be used instead of the code trying to find out which one to use; same for the interface. However, by default there is no entry in the conf file; but it can be added if needed.
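The precedence described here (an explicit conf entry wins, otherwise fall back to detection) could be sketched roughly as follows. This is an illustration only, not shoal's config.py: the function `resolve_setting`, the `[general]` section name, the sample address, and the `detect` stand-in are all assumptions.

```python
import configparser

def resolve_setting(conf_text, option, detect, section="general"):
    """Prefer an explicit value from the config text, else call detect()."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # fallback="" covers both a missing section and a missing option.
    value = parser.get(section, option, fallback="").strip()
    return value if value else detect()

def detect():
    return "auto-detected"  # stand-in for the netifaces-based guess

# Hypothetical config contents; the section name "general" is an assumption.
with_override = "[general]\nexternal_ip = 192.0.2.10\n"
without_override = "[general]\n#external_ip =\n"

print(resolve_setting(with_override, "external_ip", detect))
print(resolve_setting(without_override, "external_ip", detect))
```

A commented-out entry is treated as absent by `configparser`, so the second call falls through to detection, which matches the default behaviour described above.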

MarcusEbert commented 8 months ago

Could you please let us know the steps to reproduce the issue using the frontier-squid install?

adriansev commented 8 months ago

So, after adding the #external_ip= entry to shoal_agent.conf.frontierdefault, it is properly populated in shoal_agent.conf with the external IPv6 address (unfortunately, since I would like to use mainly IPv4). So I no longer have the "unable to find a public IP" error. My configuration is CentOS 7 with 2 physical interfaces and no NetworkManager (only ifcfg- scripts), with this structure:

4: enp1s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
    link/ether 3c:ec:ef:f8:8d:1c brd ff:ff:ff:ff:ff:ff
    inet MY_PUBLIC_IPV4/24 brd 85.120.46.255 scope global enp1s0f0
       valid_lft forever preferred_lft forever
    inet 172.20.0.17/24 brd 172.20.0.255 scope global enp1s0f0:0
       valid_lft forever preferred_lft forever
    inet6 MY_PUBLIC_IPV6/48 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::3eec:efff:fef8:8d1c/64 scope link 
       valid_lft forever preferred_lft forever
5: enp1s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
    link/ether 3c:ec:ef:f8:8d:1d brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.1/22 brd 172.18.3.255 scope global enp1s0f1
       valid_lft forever preferred_lft forever
    inet6 fe80::3eec:efff:fef8:8d1d/64 scope link 
       valid_lft forever preferred_lft forever

where enp1s0f0 is in public vlan (=1) and enp1s0f1 is in the worker nodes vlan

my only remaining error in log is:

cat shoal_agent.log 
2023-10-27 08:43:37,189 - ERROR - [shoal-agent:120] - Path '/sys/class/net/enp1s0f0:0/statistics/tx_bytes' does not exist. Please change NIC to monitor in configuration file.

But I have no idea why it is there: while I do set the above 172.20.0.17 on enp1s0f0 through ifcfg-enp1s0f0:0, I do not see such an interface.

So, in conclusion, I would say this is solved, with the solution being to add the #external_ip= entry to shoal_agent.conf.frontierdefault. Thanks a lot!
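Why the placeholder line matters can be shown with a small sketch of a template-filling step. This is not the frontier-squid startup code (which is not shown in this thread); `fill_placeholder` and the sample template strings are hypothetical, and only the `#external_ip=` placeholder and the IPv6 value come from the discussion above.

```python
import re

def fill_placeholder(template_text, key, value):
    """Replace the first '#key=' placeholder line with 'key=value'.

    Returns the new text and whether a placeholder was found.
    """
    pattern = re.compile(rf"^#\s*{re.escape(key)}\s*=.*$", re.MULTILINE)
    # A callable replacement avoids any backslash interpretation in value.
    new_text, count = pattern.subn(lambda _m: f"{key}={value}", template_text, count=1)
    return new_text, count > 0

# Hypothetical templates: one with the placeholder, one without it.
with_placeholder = "#external_ip=\n"
without_placeholder = "amqp_server_url=shoal.heprc.uvic.ca\n"

print(fill_placeholder(with_placeholder, "external_ip", "2001:b30:4210:1::17"))
print(fill_placeholder(without_placeholder, "external_ip", "2001:b30:4210:1::17"))
```

When the template has no placeholder there is nothing to edit, so the value is silently dropped, which is consistent with external_ip ending up unset until the entry was added back.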

MarcusEbert commented 8 months ago

> where enp1s0f0 is in public vlan (=1) and enp1s0f1 is in the worker nodes vlan
>
> my only remaining error in log is:
>
> cat shoal_agent.log
> 2023-10-27 08:43:37,189 - ERROR - [shoal-agent:120] - Path '/sys/class/net/enp1s0f0:0/statistics/tx_bytes' does not exist. Please change NIC to monitor in configuration file.

Is the interface specified anywhere in a conf/default file? The interface should probably be 'enp1s0f0' instead of 'enp1s0f0:0'.
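For context: a name like 'enp1s0f0:0' is an alias label attached to an address on enp1s0f0, not a separate kernel device, so it has no directory under /sys/class/net. A minimal sketch of mapping such a label back to its parent device before building the statistics path follows; the helper names are hypothetical and this is not the shoal-agent code.

```python
def base_interface(name):
    """Return the parent device of an alias label such as 'enp1s0f0:0'."""
    return name.split(":", 1)[0]

def tx_bytes_path(name, sysfs_root="/sys/class/net"):
    """Build the sysfs tx_bytes counter path for the underlying device."""
    return f"{sysfs_root}/{base_interface(name)}/statistics/tx_bytes"

print(tx_bytes_path("enp1s0f0:0"))
```

Names without a colon pass through unchanged, so the same helper can be applied to every interface name.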

MarcusEbert commented 8 months ago

@adriansev

> So, after adding the #external_ip= entry to shoal_agent.conf.frontierdefault, it is properly populated in shoal_agent.conf with the external IPv6 address (unfortunately, since I would like to use mainly IPv4)

Could you please try to move '/etc/shoal/shoal_agent.conf' to a different directory and then execute the shoal-agent install script: '/usr/local/bin/shoal-agent-installation.sh -b' to see if that creates the right config file?

adriansev commented 8 months ago

Hi! So, the network configuration files look like this:

/etc/sysconfig/network-scripts/ifcfg-enp1s0f0
/etc/sysconfig/network-scripts/ifcfg-enp1s0f0:0
/etc/sysconfig/network-scripts/ifcfg-enp1s0f1
/etc/sysconfig/network-scripts/ifcfg-lo

And yes, there are reasons for using ifcfg-enp1s0f0:0 instead of adding IPs to ifcfg-enp1s0f0.

Also, shoal is part of the frontier-squid package and there is no shoal-agent-installation.sh. @DrDaveD, can you tell us more about how shoal is integrated with frontier-squid?

DrDaveD commented 8 months ago

> So, with external_ip in the template conf, I now have the external_ip=<host ipv6 ip> declaration in my running conf. Is there a way to make it use the IPv4 address? Or just to let me customize the configuration?

If you change the default settings for outgoing connections to prefer ipv4, that should make it see the ipv4 address instead of ipv6. This can be done through settings in /etc/gai.conf.
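On glibc systems the line usually uncommented for this purpose in /etc/gai.conf is `precedence ::ffff:0:0/96  100`. One way to check the effect is to look at the order in which `getaddrinfo` returns address families. The sketch below is only an illustration: the helper names are hypothetical, the sample entries are synthetic documentation addresses, and it assumes the client doing the lookup uses the system resolver ordering.

```python
import socket

def family_order(addrinfos):
    """Return address-family names in the order getaddrinfo listed them."""
    names = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    order = []
    for family, _type, _proto, _canon, _sockaddr in addrinfos:
        name = names.get(family, str(family))
        if name not in order:
            order.append(name)
    return order

def resolved_family_order(host, port=80):
    """Resolve host with the system resolver and report the family order."""
    return family_order(socket.getaddrinfo(host, port, type=socket.SOCK_STREAM))

# Synthetic getaddrinfo-style entries (no network access needed).
sample = [
    (socket.AF_INET6, socket.SOCK_STREAM, 6, "", ("2001:db8::1", 80, 0, 0)),
    (socket.AF_INET, socket.SOCK_STREAM, 6, "", ("192.0.2.1", 80)),
]
print(family_order(sample))
```

Calling `resolved_family_order("wlcg-wpad.cern.ch")` before and after editing /etc/gai.conf would show whether IPv4 now comes first for that host.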

> If there is an external_ip defined in the conf file then it should be used instead of the code trying to find out which one to use; same for the interface. However, by default there is no entry in the conf file; but it can be added if needed.

Ok, so then I must have misunderstood the code, I'm glad that it still works. So we will fix this in the next frontier-squid rpm release.

> @DrDaveD, can you tell us more about how shoal is integrated with frontier-squid?

It is built in the frontier-squid rpm spec file using pyinstaller, and shoal_agent.conf is generated at start time. The shoal-agent code and all its dependencies wrapped by pyinstaller are installed at /usr/libexec/squid/shoal-agent. It does not pull in non-python scripts like shoal-agent-installation.sh.

MarcusEbert commented 8 months ago

> Ok, so then I must have misunderstood the code, I'm glad that it still works. So we will fix this in the next frontier-squid rpm release.

We can also just add a commented-out entry for external_ip. I think it would be good to have all available config options back in the conf file, at least for reference.

> @DrDaveD, can you tell us more about how shoal is integrated with frontier-squid?

> It is built in the frontier-squid rpm spec file using pyinstaller, and shoal_agent.conf is generated at start time. The shoal-agent code and all its dependencies wrapped by pyinstaller are installed at /usr/libexec/squid/shoal-agent. It does not pull in non-python scripts like shoal-agent-installation.sh.

Okay, I didn't know how that works. The shoal-agent-installation.sh script was introduced a while ago to create a working conf file. I would like to try to reproduce the issue to see where we could make changes so that it works the way the frontier-squid package expects. What would I need to do to reproduce the issue? Is it just installing and starting frontier-squid?

DrDaveD commented 8 months ago

> We can also just add a commented-out entry for external_ip. I think it would be good to have all available config options back in the conf file, at least for reference.

That would be good. If you have a new tagged release soon then we'll use that instead of fixing it directly in frontier-squid.

> Okay, I didn't know how that works. The shoal-agent-installation.sh script was introduced a while ago to create a working conf file. I would like to try to reproduce the issue to see where we could make changes so that it works the way the frontier-squid package expects. What would I need to do to reproduce the issue? Is it just installing and starting frontier-squid?

If you did that and enabled the auto discovery as described in the frontier-squid installation instructions then that would be enough to see that external_ip is not set in /etc/squid/shoal_agent.conf. You would have to have a networking setup like Adrian's to see the problem in the description of this issue.

MarcusEbert commented 8 months ago

Thanks @DrDaveD, we'll try that and see if we can have a new version out by the end of next week.