uyuni-project / uyuni

Source code for Uyuni
https://www.uyuni-project.org/
GNU General Public License v2.0

proxy metadata caching uses wrong "max" time #8189

Open jmozd opened 10 months ago

jmozd commented 10 months ago

Problem description

When updating clients after generating new channel metadata on the Uyuni Server, the clients are still presented with the previous metadata for a long time.

In the Uyuni Proxy's /etc/squid/squid.conf there is already a comment regarding this symptom, but the following "refresh_pattern" configuration statements don't match the comment:

```
# cache repodata only few minutes and then query parent whether it is fresh
refresh_pattern /XMLRPC/GET-REQ/.*/repodata/.*$ 0 1% 1440 reload-into-ims refresh-ims
refresh_pattern /ks/.*/repodata/.*$ 0 1% 1440 reload-into-ims refresh-ims
# salt minions get the repodata via a different URL
refresh_pattern /rhn/manager/download/.*/repodata/.*$ 0 1% 1440 reload-into-ims refresh-ims
# bootstrap repos needs to be handled as well
refresh_pattern /pub/repositories/.*/repodata/.*$ 0 1% 1440 reload-into-ims refresh-ims
refresh_pattern /pub/repositories/.*/venv-enabled-.*.txt$ 0 1% 1440 reload-into-ims refresh-ims
[---]
# rest of tftp are config files prone to change frequently
refresh_pattern /tftp/.*$ 0 1% 1440 reload-into-ims refresh-ims
refresh_pattern         .               0       100%    525600
```

The Squid documentation is ambiguous on the units for "max": older documentation contained the information that "min and max are specified in MINUTES". The current docs (http://www.squid-cache.org/Doc/config/refresh_pattern/) say:

```
usage: refresh_pattern [-i] regex min percent max [options]

    By default, regular expressions are CASE-SENSITIVE.  To make
    them case-insensitive, use the -i option.

    'Min' is the time (in minutes) an object without an explicit
    expiry time should be considered fresh.
[...]
    'Max' is an upper limit on how long objects without an explicit
    expiry time will be considered fresh. The value is also used
    to form Cache-Control: max-age header for a request sent from
    Squid to origin/parent.
```

So setting the max time to 1440 minutes asks for a 24-hour max time.
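For illustration, and per the quoted docs, the max value also feeds the Cache-Control header Squid sends toward its parent; since max-age is given in seconds, a max of 1440 minutes presumably translates to:

```
1440 min * 60 s/min = 86400 s   ->   Cache-Control: max-age=86400   (24 hours)
```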

While the max age is always a trade-off between recurring downloads and the expected lifetime of the upstream data, channel metadata in particular is expected to become available quickly after refreshing channels on the Uyuni Manager.

I propose changing these max values to 5 minutes, as sketched below.
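For concreteness, a sketch of the proposed change (same patterns as above, only the max column reduced from 1440 to 5; this is the proposal, not a tested configuration):

```
# cache repodata only few minutes and then query parent whether it is fresh
refresh_pattern /XMLRPC/GET-REQ/.*/repodata/.*$ 0 1% 5 reload-into-ims refresh-ims
refresh_pattern /ks/.*/repodata/.*$ 0 1% 5 reload-into-ims refresh-ims
refresh_pattern /rhn/manager/download/.*/repodata/.*$ 0 1% 5 reload-into-ims refresh-ims
refresh_pattern /pub/repositories/.*/repodata/.*$ 0 1% 5 reload-into-ims refresh-ims
refresh_pattern /pub/repositories/.*/venv-enabled-.*.txt$ 0 1% 5 reload-into-ims refresh-ims
refresh_pattern /tftp/.*$ 0 1% 5 reload-into-ims refresh-ims
```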

Steps to reproduce

1. 2. 3. ...

Uyuni version

Uyuni-2023.12

Uyuni proxy version (if used)

Uyuni-2023.12

Useful logs

No response

Additional information

No response

rjmateus commented 10 months ago

Hey,

I tested it locally with a fresh deployment of the RPM-based proxy and was unable to reproduce it. As soon as the new metadata was generated (repo metadata), the files were invalidated on the proxy and the new packages were available to the minions.

Could you provide more insight into your test scenario? Did you use multiple proxies in cascade?

jmozd commented 10 months ago

Hi Ricardo,

our test scenario indeed consists of cascaded proxies:

The test client was Cobbler-installed, and it was noticed that packages were missing because a child channel was empty (due to an admin error). On the SUSE Manager Server, the empty channel was then populated with packages and the metadata rebuild completed.

The test client did not receive the new metadata, even after an explicit "zypper ref --force". Checking the Squid log on the containerized proxy, no requests showed up, so we concluded that the downstream proxy's Squid answered the client request with its cached data.

After reducing the max TTL entries in squid.conf (on both the downstream proxy and the containerized proxy) and restarting each Squid, the next "zypper ref -f" on the clients led to the expected queries (monitored on the containerized proxy's Squid only) and the channel content was available to the client.
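In case it helps reproduce this, the cache behavior can be read off Squid's access log on each hop (a sketch; it assumes the default native log format, and the log path may differ in the containerized proxy):

```
# TCP_HIT / TCP_MEM_HIT = answered from cache without asking upstream,
# TCP_REFRESH_* = revalidated against the parent, TCP_MISS = fetched fresh.
# In Squid's default native log format, field 4 is the result code and
# field 7 is the requested URL.
grep repodata /var/log/squid/access.log | awk '{print $4, $7}'
```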

I haven't checked, but could the behavioral mismatch be caused by some missing headers (refresh timeout) when requesting the data from Squid? That would explain why our downstream Squid didn't feel compelled to even ask for an update, while in your test case the (only) Squid involved had the original Uyuni Server response headers at hand and therefore re-requested the metadata from the Server.
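One way to test that hypothesis would be to compare the caching-related response headers on each hop; a sketch, with <host> and <channel> as placeholders for the actual names:

```
# Fetch only the headers of repomd.xml from the server, then from each
# proxy in the chain, and compare what each hop returns.
curl -sI https://<host>/rhn/manager/download/<channel>/repodata/repomd.xml \
  | grep -iE 'cache-control|expires|last-modified|age'
```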

rjmateus commented 10 months ago

Thank you for the explanation. Would it be possible to run the test with just one proxy between the minion and the server? You can easily change the proxy in the web UI, on the minion detail page under "Connection" and then the "Change" link (it is not super visible).

If the problem doesn't show up with one proxy layer, then the problem is in the data the proxy is providing. Changing the default cache value can hurt large-scale deployments, and I would prefer to solve the root cause of the file not being detected as changed.

jmozd commented 10 months ago

We'll come up with a test, but I'll have to check with the team regarding the timeline. Hopefully I'll have results by tomorrow EBD.

jmozd commented 10 months ago

Interim feedback: the problem seems to be more complex than initially thought. We could not reproduce it with simple channel operations (create an empty child channel, add it to a client and refresh to prove it's empty, update the channel, then refresh and check the contents from the client's point of view - this works as intended!) and will have to come up with a test mimicking the original behavior (installation of a system via Cobbler, including accessing the empty channel, onboarding with the same channel, then filling the channel and checking access from the client). That may take a few more days.

rjmateus commented 10 months ago

Thank you for the feedback. Did you run this test with a chained proxy, or with a single proxy between minion and server? With this feedback it looks like the issue is not directly in the proxy caching time, but something more fishy.

jmozd commented 10 months ago

The tests were done with a proxy chain and we "followed the white rabbit", meaning we could track the update requests from the client to its immediate proxy (classic), to the central proxy (containerized), to the central Manager Server. We used the original max time settings in both proxies.

I do not yet understand why the original tests did not show the same (correct) behavior, but as these were started via a fresh Cobbler install (which also accesses the "child channel" in question, though via the Cobbler URL), I asked the team to set up a matching autoinstallation chain to reproduce the original installation attempt (with an empty child channel), then fill the channel with content later and try to access it from the client. I'll get back with the results, but as this test needs more setup than the simple "extra test channel" tests from yesterday, and we need a free slot in the test environment, it may take a couple of days to relay the findings.

rjmateus commented 10 months ago

Thank you for the update @jmozd. I'm looking forward to the results :+1: