berendt opened this issue 1 month ago
Can you send me detailed instructions, please: which version do I need to pick for the manager etc., and maybe the "release upgrade notes"?
Once I have the info, I'll get it rolling.
> can you send me detailed instructions please which version i need to pick for the manager etc. and maybe "release upgrade notes" ?

I'll prepare it and put it in here. It will be ready in the middle of the week.
@maliblatt We'll use this issue for the 2024.1 deployment & upgrade test results.
Set the following parameters in `environments/manager/configuration.yml` and run `make sync` afterwards. Commit and push all changes, then pull the updated configuration repository on the test cluster. Update the manager with `osism update manager` as usual. Now you can deploy or upgrade the OpenStack 2024.1 services with `osism apply -a upgrade X`.

```yaml
ceph_version: quincy
manager_version: latest
openstack_version: 2024.1
```
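The steps above can be sketched as a command sequence. This is a rough sketch, not official documentation: `X` stands for each OpenStack service to upgrade, and the `/opt/configuration` checkout path on the cluster is taken from later in this thread.

```shell
# On the machine holding the configuration repository checkout
$EDITOR environments/manager/configuration.yml   # set ceph_version, manager_version, openstack_version
make sync
git commit -am "Switch to OpenStack 2024.1" && git push

# On the test cluster
git -C /opt/configuration pull    # pull the updated configuration repository
osism update manager              # update the manager as usual
osism apply -a upgrade X          # repeat for each OpenStack service X
```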
Are there any particular changelogs for `latest` that I need to take into consideration when upgrading from 7.1.0 to latest?

By the way, I will not test Ceph; as you know, all of our (stackxperts) environments use external clusters. I don't think there was a lot of change with Ceph, or was there?
> any particular changelogs for latest that i need to take into consideration upgrading from 7.1.0 to latest ?

So far I have only seen one secret that you have to add when using Skyline (`prometheus_skyline_password`).
> ceph btw i will not test, all of our (stackxperts) environments as known use external clusters. I dont think there was alot of change with ceph or ?

That's fine.
I have updated an old test environment to 2024.1 without any issues, as it seems. I will do some more testing next week. There are indeed only very few kolla-ansible upgrade notes that I had to take into account. One thing I have to take a closer look at is designate-sink:

> The configuration variable `designate_enable_notifications_sink` has been changed to `no`. It configures notifications for designate in neutron and nova and controls deployment of designate-sink, which is now optional. Operators who want to keep the previous behavior should set this to `true`.

I hope next week I can give some more info about my testing.
@maliblatt I think it makes sense to set `designate_enable_notifications_sink` to `true` in our defaults to keep the old behavior.
So I installed a completely fresh 2024.1 and also upgraded a 7.1.0 environment. Everything except Horizon is at least healthy (no functional tests at the moment), with a few caveats:

For Horizon I can't figure out why it doesn't work; it complains about memcache, I guess:

```
2024-07-20 13:04:18.001064 /var/lib/kolla/venv/lib/python3.10/site-packages/django/conf/__init__.py:267: RemovedInDjango50Warning: The USE_L10N setting is deprecated. Starting with Django 5.0, localized formatting of data will always be enabled. For example Django will display numbers and dates using the format of the current locale.
2024-07-20 13:04:18.001107 warnings.warn(USE_L10N_DEPRECATED_MSG, RemovedInDjango50Warning)
2024-07-20 13:04:18.122116 /var/lib/kolla/venv/lib/python3.10/site-packages/debreach/__init__.py:6: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
2024-07-20 13:04:18.122145 version_info = version.StrictVersion(__version__).version
2024-07-20 13:04:18.529227 Internal Server Error: /
2024-07-20 13:04:18.529258 Traceback (most recent call last):
2024-07-20 13:04:18.529261 File "/var/lib/kolla/venv/lib/python3.10/site-packages/django/core/handlers/exception.py", line 55, in inner
2024-07-20 13:04:18.529263 response = get_response(request)
2024-07-20 13:04:18.529264 File "/var/lib/kolla/venv/lib/python3.10/site-packages/horizon/middleware/simultaneous_sessions.py", line 30, in __call__
2024-07-20 13:04:18.529266 self._process_request(request)
2024-07-20 13:04:18.529267 File "/var/lib/kolla/venv/lib/python3.10/site-packages/horizon/middleware/simultaneous_sessions.py", line 37, in _process_request
2024-07-20 13:04:18.529269 cache_value = cache.get(cache_key)
2024-07-20 13:04:18.529270 File "/var/lib/kolla/venv/lib/python3.10/site-packages/django/core/cache/backends/memcached.py", line 75, in get
2024-07-20 13:04:18.529271 return self._cache.get(key, default)
2024-07-20 13:04:18.529273 File "/var/lib/kolla/venv/lib/python3.10/site-packages/pymemcache/client/hash.py", line 347, in get
2024-07-20 13:04:18.529275 return self._run_cmd("get", key, default, default=default, **kwargs)
2024-07-20 13:04:18.529276 File "/var/lib/kolla/venv/lib/python3.10/site-packages/pymemcache/client/hash.py", line 322, in _run_cmd
2024-07-20 13:04:18.529277 return self._safely_run_func(client, func, default_val, *args, **kwargs)
2024-07-20 13:04:18.529279 File "/var/lib/kolla/venv/lib/python3.10/site-packages/pymemcache/client/hash.py", line 211, in _safely_run_func
2024-07-20 13:04:18.529280 result = func(*args, **kwargs)
2024-07-20 13:04:18.529282 File "/var/lib/kolla/venv/lib/python3.10/site-packages/pymemcache/client/base.py", line 687, in get
2024-07-20 13:04:18.529283 return self._fetch_cmd(b"get", [key], False, key_prefix=self.key_prefix).get(
2024-07-20 13:04:18.529284 File "/var/lib/kolla/venv/lib/python3.10/site-packages/pymemcache/client/base.py", line 1133, in _fetch_cmd
2024-07-20 13:04:18.529286 self._connect()
2024-07-20 13:04:18.529287 File "/var/lib/kolla/venv/lib/python3.10/site-packages/pymemcache/client/base.py", line 424, in _connect
2024-07-20 13:04:18.529289 sock.connect(sockaddr)
2024-07-20 13:04:18.529290 ConnectionRefusedError: [Errno 111] Connection refused
```
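The traceback bottoms out in a plain refused TCP connection from pymemcache. A minimal, self-contained sketch reproduces the check outside of Django (assuming the default `127.0.0.1:11211` location; `memcached_reachable` is a hypothetical helper, not part of Horizon):

```python
import socket

def memcached_reachable(host="127.0.0.1", port=11211, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # ConnectionRefusedError, timeouts, unreachable host, ...
        return False

if not memcached_reachable():
    print("memcached not reachable on 127.0.0.1:11211 -- same failure as in the traceback")
```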
Also, it seems the reconfigure or deploy action does not copy /opt/configuration/environments/kolla/files/overlays/horizon/custom_local_settings properly to the hosts. No matter what I change in there, it never ends up on the hosts/containers. But this is expected behavior, as this was changed to a new format in kolla recently (needs to be reflected in the release notes later).

I tried manually changing the CACHES location, which also had no effect. Memcache is running fine. I would love to know where it tries to connect...

Did your Horizon work, @maliblatt?

It looks to me like it expects memcached to listen on localhost. If I run netcat listening on 127.0.0.1, as the pymemcache base client in the Horizon traceback suggests, I get requests when I try to load Horizon:
```
root@ctrl01:/etc/kolla/horizon# nc -l 127.0.0.1 11211
get :1:user_pk_None_restrict
```
If I redirect 127.0.0.1:11211 to the appropriate Docker container, the horizon-error log just dumps source code for some reason, without any real error.
UPDATE: adding the following to `_9999-custom-settings.py` solves the Horizon issue (`MEMCACHEIP` is a placeholder for the memcached address):

```python
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.PyMemcacheCache',
        'LOCATION': 'MEMCACHEIP:11211',
    },
}
```
Outside of the above I ran most tests. Magnum with Cilium works now (despite still having issues with the Vexxhost CAPI driver, but that's not an OSISM issue). Skyline has a bug that prevents the creation of Magnum clusters: it passes the flavor ID instead of the flavor name, which Magnum expects.

Snapshots, backups and backup restore work again; live migration, load balancers and the Image Manager all seem to be in working order.
I'm going to test VPNaaS with OVN, but it looks like I'll have to play around with the neutron-ovn-vpn-agent manually; I'm reusing the metadata container for testing for now:

```
| 8f06be74-f672-58cc-9f5f-84fdfbd7db10 | VPN Agent | hv01 | nova | :-) | UP | neutron-ovn-vpn-agent |
```

If I can get this to work, we need to build a proper neutron-ovn-vpn-agent container. Or is there one already?
@flyersa I have the same situation with Horizon. Last week I only did the update itself and have not tested the functionality yet, but I can confirm the same problem with Horizon connecting to memcache:
```
2024-07-22 06:59:44.426125 Internal Server Error: /
2024-07-22 06:59:44.426149 Traceback (most recent call last):
2024-07-22 06:59:44.426156 File "/var/lib/kolla/venv/lib/python3.10/site-packages/django/core/handlers/exception.py", line 55, in inner
2024-07-22 06:59:44.426162 response = get_response(request)
2024-07-22 06:59:44.426168 File "/var/lib/kolla/venv/lib/python3.10/site-packages/horizon/middleware/simultaneous_sessions.py", line 30, in __call__
2024-07-22 06:59:44.426174 self._process_request(request)
2024-07-22 06:59:44.426180 File "/var/lib/kolla/venv/lib/python3.10/site-packages/horizon/middleware/simultaneous_sessions.py", line 37, in _process_request
2024-07-22 06:59:44.426186 cache_value = cache.get(cache_key)
2024-07-22 06:59:44.426191 File "/var/lib/kolla/venv/lib/python3.10/site-packages/django/core/cache/backends/memcached.py", line 75, in get
2024-07-22 06:59:44.426197 return self._cache.get(key, default)
2024-07-22 06:59:44.426203 File "/var/lib/kolla/venv/lib/python3.10/site-packages/pymemcache/client/hash.py", line 347, in get
2024-07-22 06:59:44.426208 return self._run_cmd("get", key, default, default=default, **kwargs)
2024-07-22 06:59:44.426214 File "/var/lib/kolla/venv/lib/python3.10/site-packages/pymemcache/client/hash.py", line 322, in _run_cmd
2024-07-22 06:59:44.426220 return self._safely_run_func(client, func, default_val, *args, **kwargs)
2024-07-22 06:59:44.426225 File "/var/lib/kolla/venv/lib/python3.10/site-packages/pymemcache/client/hash.py", line 211, in _safely_run_func
2024-07-22 06:59:44.426231 result = func(*args, **kwargs)
2024-07-22 06:59:44.426236 File "/var/lib/kolla/venv/lib/python3.10/site-packages/pymemcache/client/base.py", line 687, in get
2024-07-22 06:59:44.426242 return self._fetch_cmd(b"get", [key], False, key_prefix=self.key_prefix).get(
2024-07-22 06:59:44.426248 File "/var/lib/kolla/venv/lib/python3.10/site-packages/pymemcache/client/base.py", line 1133, in _fetch_cmd
2024-07-22 06:59:44.426253 self._connect()
2024-07-22 06:59:44.426259 File "/var/lib/kolla/venv/lib/python3.10/site-packages/pymemcache/client/base.py", line 424, in _connect
2024-07-22 06:59:44.426264 sock.connect(sockaddr)
2024-07-22 06:59:44.426270 ConnectionRefusedError: [Errno 111] Connection refused
```
> | 8f06be74-f672-58cc-9f5f-84fdfbd7db10 | VPN Agent | hv01 | nova | :-) | UP | neutron-ovn-vpn-agent |
>
> If i can get this to work we need to build a proper neutron-ovn-vpn-agent container, or is there one already ?

Take a look at https://review.opendev.org/c/openstack/kolla/+/924302 ... it seems that a Dockerfile is already on the way :-)
> | 8f06be74-f672-58cc-9f5f-84fdfbd7db10 | VPN Agent | hv01 | nova | :-) | UP | neutron-ovn-vpn-agent |
>
> If i can get this to work we need to build a proper neutron-ovn-vpn-agent container, or is there one already ?
>
> Take a look into https://review.opendev.org/c/openstack/kolla/+/924302 ... It seems that a Dockerfile is already on the way :-)
Yes, but the whole kolla-ansible part is still missing before that is merged. IMO this is not realistic as a backport for 2024.1. Not directly, at least.
> - `designate_enable_notifications_sink: true` as default would be nice

PR pending. I'm not quite sure whether we really want this as the default; with the Neutron DNS integration, you don't really need it any more.

> - gnocchi has no available images?

Now online.
> Yes, but the whole kolla-ansible part is still missing before that is merged. IMO this is not realistic as a backport for 2024.1. Not directly, at least.

I don't think we need to do that and wait until it's completed in kolla. But I still want to test the functionality at the moment. It's not a big problem; I can run it for testing in the metadata agent, as it has all the parts installed anyway. So there's nothing you need to do here at the moment.
> adding this to _9999-custom-settings.py solves the horizon issue:
>
> `CACHES = { 'default': { 'BACKEND': 'django.core.cache.backends.memcached.PyMemcacheCache', 'LOCATION': 'MEMCACHEIP:11211', }, }`

Memcache is only enabled when `horizon_backend_database` is `False`. We set `horizon_backend_database` to `True` by default. It should not try to reach Memcached at all.
> Memcache is only enabled when horizon_backend_database is False. We set horizon_backend_database to True by default. It should not try to reach Memcached at all.

Well, I don't have this variable set anywhere and it still has this issue.
> Yes, but the whole kolla-ansible part is still missing before that is merged. IMO this is not realistic as a backport for 2024.1. Not directly, at least.

The missing kolla-ansible part: https://review.opendev.org/c/openstack/kolla-ansible/+/924575
This is the default value in Horizon:

```python
SESSION_ENGINE = 'django.contrib.sessions.backends.cache'
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.PyMemcacheCache',
        'LOCATION': '127.0.0.1:11211',
    },
}
```

`SESSION_ENGINE` is overwritten by `SESSION_ENGINE = 'django.contrib.sessions.backends.db'` by default. I think `CACHES` always has to be configured with working Memcached servers, not only when `SESSION_ENGINE = 'django.contrib.sessions.backends.cache'` is used.

Edit: Checked 2023.2. It's the same there. Maybe something changed in Horizon and Django. It works when always setting `CACHES`.
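For context on why the `_9999-` prefix matters: the custom settings snippets are executed in lexical order, so a later file simply reassigns `CACHES` and wins. A rough, hypothetical illustration of that override order using plain `exec` (not the actual Horizon startup code; `192.0.2.10` is a placeholder address):

```python
# Two hypothetical settings snippets; the defaults come first,
# the _9999-custom-settings.py override comes last in lexical order.
default_snippet = """
CACHES = {'default': {'BACKEND': 'django.core.cache.backends.memcached.PyMemcacheCache',
                      'LOCATION': '127.0.0.1:11211'}}
"""
override_snippet = """
CACHES = {'default': {'BACKEND': 'django.core.cache.backends.memcached.PyMemcacheCache',
                      'LOCATION': '192.0.2.10:11211'}}
"""

settings = {}
for snippet in (default_snippet, override_snippet):  # executed in order
    exec(snippet, settings)

# The last assignment wins, just like a later settings snippet.
print(settings["CACHES"]["default"]["LOCATION"])  # -> 192.0.2.10:11211
```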
So you'll add the ovn-vpn-agent change as a backport now, as I see the merge. If the image is available, let me know and I'll test it ;)
> so you add the ovn-vpn-agent change now as backport as i see the merge. If the image is available let me know then i test ;)

Image + inventory group are available. I have not yet added the kolla-ansible part as a backport.
Yeah, with manual changes I didn't have much luck, and I also replied on the opendev review. The guy said it works with the kolla part; I can't see how yet. It's not a major change to backport the ansible part, or is it?
I can confirm that with the backport, OVN VPNaaS works.

The only downside is still that, to this day, they only support those crappy old outdated PFS groups :/

Anything else you want me to test?
> only downside is stil that they til today only support this crappy old outdated PFS groups :/

Can https://review.opendev.org/c/openstack/neutron-vpnaas/+/898830 help here?
> > only downside is stil that they til today only support this crappy old outdated PFS groups :/
>
> Can https://review.opendev.org/c/openstack/neutron-vpnaas/+/898830 help here?

Yes, I've been following this for a while. I see it got some traction again this month. If they ever decide to add it, that will most likely solve it. I think they didn't add it because of some API changes that weren't backwards compatible, or something like that.
I also want to give some feedback on VPNaaS: it works for me too. I could establish an IPsec connection, created via Horizon, to a remote IPsec device. With my old test environment I cannot give any info about network throughput. I think as soon as we have the first tagged pre-release and deploy it on our plusserver dev environment, we can also give some details about performance etc.

Besides that, everything seems to be running smoothly :-)
I noticed an issue with Magnum which may come from the default images but breaks it: apparently both magnum-capi-helm and magnum-cluster-api are installed, but it should only be magnum-cluster-api. These are two different implementations of the Magnum CAPI driver. magnum-capi-helm is the one from StackHPC, which isn't working very well and also requires additional work on the CAPI k8s cluster before use. The other one is from Vexxhost, which works without any modifications.

Having both installed at the same time triggers funny behavior, where it sometimes uses one or the other. magnum-capi-helm, for example, requires kube_version to be set on images, while magnum-cluster-api does not. So sometimes it triggers errors, sometimes not. Also, delete will sometimes work and sometimes not. I think we should only include magnum-cluster-api and skip the helm one.

The Vexxhost driver also has a much better technical approach than the StackHPC one, which has a lot of its own dependencies on StackHPC repositories on GitHub, which is not really something we should have. With only the Vexxhost driver and using Cilium, this works pretty much flawlessly, including rolling upgrades, autoscaling and co.
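The "sometimes one driver, sometimes the other" symptom can at least be detected up front. A small sketch (the helper name and the check are my own, not part of Magnum or OSISM) that flags an image shipping both driver packages:

```python
# Hypothetical helper: flag a conflicting Magnum CAPI driver installation.
CAPI_DRIVERS = {"magnum-capi-helm", "magnum-cluster-api"}

def capi_driver_conflict(installed_names):
    """Return the conflicting driver packages if more than one is installed."""
    found = sorted(CAPI_DRIVERS & {name.lower() for name in installed_names})
    return found if len(found) > 1 else []

# Example: an image that ships both drivers (the situation described above).
print(capi_driver_conflict({"magnum-cluster-api", "magnum-capi-helm", "pbr"}))
# -> ['magnum-capi-helm', 'magnum-cluster-api']
```

In practice one could feed it the package names from `pip3 list` inside the Magnum container image.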
magnum-with-capi.pdf

Thank you @berendt.
There is still one issue in Magnum with SQLAlchemy, which is being addressed right now; maybe a fix will be available before the release later:

https://bugs.launchpad.net/magnum/+bug/2067345
Also, can we maybe add a default override if magnum_enabled is true? In order to work properly for normal users outside of the admin role, Nova needs a special policy to allow members to create zero-disk flavors (the nature of some SCS flavors). The following needs to be set in Nova's policy.yaml, otherwise normal member users cannot spawn CAPI instances:

```yaml
os_compute_api:servers:create:zero_disk_flavor: "role:admin or role:member"
```

I also added some tiny documentation on how to make CAPI work with Magnum at the moment to this reply. Maybe it helps someone, even if it doesn't really belong in this topic.
For the Horizon issues (related to Magnum) we will try to create a patch for OpenStack.

By the way, I also tested your Horizon CACHES changes; the Horizon deployment now works as expected.
By the way, there are again newer CAPI drivers from Vexxhost available. The way to go should be to include the latest available version when building the images.

We have now rolled out Magnum for multiple customers with manual fixes on 2023.2 and it works very well.
> btw there are again newer capi drivers from vexxhost available. Way to go should be to include the "latest" available when building the images.
>
> We have Magnum rolled out now on multiple customers with manual fixes on 2023.2 and it works very good.

We install the latest available magnum-cluster-api package from PyPI in the latest 2023.2/2024.1 Magnum container images. Those images are rebuilt every night at the moment. I think the problem is that no newer release is available on PyPI: https://pypi.org/project/magnum-cluster-api/. The latest release there is from 19 July 2024. I would prefer not to install/use the main branch from https://github.com/vexxhost/magnum-cluster-api.
```
{% set magnum_base_additional_pip_packages = [ 'magnum-cluster-api' ] %}
```

```
dragon@testbed-manager:~$ docker run --rm -it quay.io/osism/magnum-api:2023.2 pip3 list | grep magnum-cluster-api
magnum-cluster-api 0.21.2
dragon@testbed-manager:~$ docker run --rm -it quay.io/osism/magnum-api:2024.1 pip3 list | grep magnum-cluster-api
magnum-cluster-api 0.21.2
```
So the only showstopper is the SQLAlchemy problem they introduced with 2024.1; I can't find any other bug report than the Launchpad one. But it's unusable in 2024.1 in its current state.