hep-gc / shoal

A squid cache publishing and advertising tool designed to work in fast changing environments
Apache License 2.0
4 stars 8 forks source link

Shoal configures CVMFS with a malformed proxy list #137

Closed rptaylor closed 6 years ago

rptaylor commented 6 years ago

The shoal command to configure the proxy list should be executed like this:

cvmfs_talk proxy set "http://kraken01.westgrid.ca:3128;http://atlascaq3.triumf.ca:3128;http://atlascaq4.triumf.ca:3128"

When I test this on a current CernVM instance I confirm that 'sudo cvmfs_talk proxy info' shows that the correct configuration has been applied.

On the same VM, running 'shoal-client' produces these log messages which appear okay:

Oct 24 21:26:22 host-10-39-17-214 shoal-client: Setting "http://kraken01.westgrid.ca:3128;http://atlascaq3.triumf.ca:3128;http://atlascaq4.triumf.ca:3128;http://192.168.0.32:3128;http://206.167.181.94:3128;http://kraken01.westgrid.ca:3128;http://cernvm-webfs.atlas-canada.ca:3128;DIRECT" as proxy
Oct 24 21:29:51 host-10-39-17-214 shoal-client: CVMFS proxies set to "http://kraken01.westgrid.ca:3128;http://atlascaq3.triumf.ca:3128;http://atlascaq4.triumf.ca:3128;http://192.168.0.32:3128;http://206.167.181.94:3128;http://kraken01.westgrid.ca:3128;http://cernvm-webfs.atlas-canada.ca:3128;DIRECT"

However, querying CVMFS shows that shoal-client has misconfigured the proxy URLs. Note the first URL is invalid due to "http://" appearing twice, and a rogue quotation mark appearing.

atlas.cern.ch:
Load-balance groups:
[0] http://"http://kraken01.westgrid.ca:3128 (:unresolved:, -6m)
[1] http://142.90.110.68:3128 (atlascaq3.triumf.ca, +2h)
[2] http://142.90.90.61:3128 (atlascaq4.triumf.ca, +2h)
[3] http://192.168.0.32:3128 (192.168.0.32, +23h)
[4] http://206.167.181.94:3128 (206.167.181.94, +23h)
[5] http://206.12.48.249:3128 (kraken01.westgrid.ca, +21m)
[6] http://142.90.110.68:3128 (atlascaq3.triumf.ca, +2h)
[7] http://DIRECT" (:unresolved:, -6m)
Active proxy: [1] http://142.90.110.68:3128

This is with the recent shoal-client-0.6.4-1.el6.noarch

rptaylor commented 6 years ago

Here are the client errors that result. Note that the quotation mark at the end of DIRECT is also problematic. This does not happen when executing cvmfs_talk proxy set on the command line with quotation marks, so maybe the problem is related to python popen?

Oct 24 16:54:47 host-10-39-17-214 cvmfs2: (atlas.cern.ch) failed to resolve IP addresses for DIRECT" (4 - unknown host name)
Oct 24 16:54:50 host-10-39-17-214 cvmfs2: (atlas-condb.cern.ch) failed to resolve IP addresses for "http (4 - unknown host name)
Oct 24 16:54:50 host-10-39-17-214 cvmfs2: (atlas-condb.cern.ch) failed to resolve IP addresses for DIRECT" (4 - unknown host name)
Oct 24 16:54:53 host-10-39-17-214 cvmfs2: (atlas-nightlies.cern.ch) failed to resolve IP addresses for "http (4 - unknown host name)
Oct 24 16:54:53 host-10-39-17-214 cvmfs2: (atlas-nightlies.cern.ch) failed to resolve IP addresses for DIRECT" (4 - unknown host name)
Oct 24 16:54:57 host-10-39-17-214 cvmfs2: (grid.cern.ch) failed to resolve IP addresses for "http (4 - unknown host name)
Oct 24 16:54:57 host-10-39-17-214 cvmfs2: (grid.cern.ch) failed to resolve IP addresses for DIRECT" (4 - unknown host name)
Oct 24 16:55:00 host-10-39-17-214 cvmfs2: (sft.cern.ch) failed to resolve IP addresses for "http (4 - unknown host name)
Oct 24 16:55:00 host-10-39-17-214 cvmfs2: (sft.cern.ch) failed to resolve IP addresses for DIRECT" (4 - unknown host name)
Oct 24 16:55:02 host-10-39-17-214 cvmfs2: (atlas.cern.ch) switching proxy from http://"http://kraken01.westgrid.ca:3128 to http://142.90.110.68:3128
Oct 24 16:56:25 host-10-39-17-214 cvmfs2: (grid.cern.ch) failed to resolve IP addresses for "http (4 - unknown host name)
Oct 24 16:56:25 host-10-39-17-214 cvmfs2: (grid.cern.ch) switching proxy from http://"http://kraken01.westgrid.ca:3128 to http://142.90.110.68:3128
rptaylor commented 6 years ago

This affects the regular CVMFS client repositories, but not the CernVM OS client in the microkernel image (sudo cvmfs_talk -p /mnt/.rw/cache/cernvm-prod.cern.ch/cvmfs_io.cernvm-prod.cern.ch proxy info).

colsond commented 6 years ago

This code hasn't been changed since you implemented it but I suspect the problem is when the formatted string is created: cvmfs_http_proxy = "\"" + env_proxy + default_http_proxy + "\"" then this string is used in the cvmfs talk call: p = Popen(["cvmfs_talk", "proxy", "set", cvmfs_http_proxy], stdout=PIPE, stderr=PIPE)

It looks like quotes were added to the string to make it match the format of the others but isn't necessary. My guess is that when it looks at the first entry the string starts with " instead of http:// so it prepends it.

I'll cut a new version with the quotes removed and send it your way.

rptaylor commented 6 years ago

I've never written any shoal code but it may well be that there wasn't enough testing and the bug has been present for awhile.

It is really weird that the string printed in the log is correct.

colsond commented 6 years ago

Yes you have :P

But regardless I think the string that is being printed in the log is in fact the same one used. As you can see above your log statements have the string wrapped in quotes which is something a normal log wouldn't do. Since the quotes are actually part of the proxy string I'm fairly certain that is the origin of our problem.

rptaylor commented 6 years ago

Ha, fair enough, that was a long time ago. But I believe I tested those changes; I recall this related issue: https://sft.its.cern.ch/jira/browse/CVM-987 I think we would have noticed this bug if it was present then. Also, the problem started happening on the 18th when the client was updated with the new Frontier functionality. Anyway it doesn't matter where the bug came from, just need to find it. Unfortunately it seems that removing the quotes didn't solve the issue. It seems like the string http:// is being prepended to the actual cvmfs_http_proxy variable.

colsond commented 6 years ago

Fixed as of commit 6f964a52754e60aaa1013dc0be3fc8054ba81f76.