juju / python-libjuju

Python library for the Juju API
Apache License 2.0
60 stars 100 forks source link

upgrade_charm does not work for ceph-mon #955

Closed gabrielcocenza closed 1 year ago

gabrielcocenza commented 1 year ago

Description

I've used upgrade_charm for several charms and this error happened just with ceph-mon:

Traceback (most recent call last):
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "<console>", line 1, in <module>
  File "/home/ubuntu/charmed-openstack-upgrader-1/.venv/lib/python3.10/site-packages/juju/application.py", line 759, in refresh
    response = await resources_facade.AddPendingResources(
  File "/home/ubuntu/charmed-openstack-upgrader-1/.venv/lib/python3.10/site-packages/juju/client/facade.py", line 486, in wrapper
    reply = await f(*args, **kwargs)
  File "/home/ubuntu/charmed-openstack-upgrader-1/.venv/lib/python3.10/site-packages/juju/client/overrides.py", line 96, in AddPendingResources
    reply = await self.rpc(msg)
  File "/home/ubuntu/charmed-openstack-upgrader-1/.venv/lib/python3.10/site-packages/juju/client/facade.py", line 659, in rpc
    result = await self.connection.rpc(msg, encoder=TypeEncoder)
  File "/home/ubuntu/charmed-openstack-upgrader-1/.venv/lib/python3.10/site-packages/juju/client/connection.py", line 652, in rpc
    raise errors.JujuError(result['response']['error']['message'])
juju.errors.JujuError: while adding pending resource info for "alert-rules": bad resource metadata: bad info: bad metadata: resource missing filename

Urgency

Blocker for our release

Python-libjuju version

2.9.44.1

Juju version

tried on 2.9.43 and 2.9.45

Reproduce / Test

# Deploy a bundle with ceph-mon channel octopus/stable on focal then use python-libjuju < 3.0:

python3 -m asyncio

from juju.model import Model
model = Model()
await model.connect_current()
ceph_mon = model.applications["ceph-mon"]
await app.upgrade_charm(channel="octopus/stable")
cderici commented 1 year ago

This looks like a duplicate of #883.

@hmlanigan do I remember correctly that you mentioned this is most likely a data problem?

gabrielcocenza commented 1 year ago

I've tried on keystone and the problem does not show like on ceph-mon

cderici commented 1 year ago

I've tried on keystone and the problem does not show like on ceph-mon

This contradicts with @agileshaw's comment: https://github.com/juju/python-libjuju/issues/883#issuecomment-1730441820, despite the controller versions match. I'm at the airport right now with limited time, I'll try to reproduce these scenarios as soon as I can :+1:

gabrielcocenza commented 1 year ago

This same error was found on glance-simplestreams-sync too

cderici commented 1 year ago

960 closes this.

gabrielcocenza commented 1 year ago

@cderici I tried running the 2.9 branch on glance-simplestreams-sync and I'm receiving the following error:

File "/snap/charmed-openstack-upgrader/x1/lib/python3.10/site-packages/juju/application.py", line 735, in refresh
    for res, filename_or_rev in arg_resources.items():
AttributeError: 'NoneType' object has no attribute 'items'
cderici commented 1 year ago

@gabrielcocenza sorry to hear that, sounds like I introduced a bug there by mistake. I'll certainly give it a closer look at it tomorrow and patch it quickly for you πŸ‘

cderici commented 1 year ago

I tried running the 2.9 branch on glance-simplestreams-sync and I'm receiving the following error:

@gabrielcocenza I tried to reproduce but failed. I deployed glance-simplestreams-sync on microk8s at channel='yoga/stable', revision=91 and upgraded to channel="xena/stable", revision=97 and it seems to be working (it also doesn't make too much sense to me how that arg_resources could be None, but apparently it is in your case).

Can you give me the channels/revisions you're upgrading from and to so I can track it down in your exact scenario? Thanks!

gabrielcocenza commented 1 year ago

Hello @cderici. On my environment I've deployed glance-simplestreams-sync on victoria/stable at a mass cloud

cderici commented 1 year ago

Hello @cderici. On my environment I've deployed glance-simplestreams-sync on victoria/stable at a mass cloud

Ok victoria/stable revision 96, and which channel/revision you're trying to upgrade to when you get the error above?

gabrielcocenza commented 1 year ago

which channel/revision you're trying to upgrade to when you get the error above?

I was trying to refresh to the same channel victoria/stable as a way of guarantee that it was at the latest version of this track. Once it failed I refresh using the juju cli and then refresh to wallaby/stable

Interesting is that I've deployed glance-simplestreams-sync on a lxd cloud and the bug didn't show up. I'll let you know if I encounter this error again going from wallaby to xena

cderici commented 1 year ago

@gabrielcocenza yeah just tried it on microk8s (deployed victoria/stable and refreshed on the same channel), seems to be working:

$ python -m asyncio
asyncio REPL 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] on linux
Use "await" directly instead of "asyncio.run()".
Type "help", "copyright", "credits" or "license" for more information.
>>> import asyncio
>>> from juju import model;m=model.Model();await m.connect()
>>> await m.applications['glance-simplestreams-sync'].refresh(channel='victoria/stable', revision=96)
>>> await m.applications['glance-simplestreams-sync'].refresh(channel='victoria/stable')
>>>

I also tried going from wallaby/stable (rev 94) to xena/stable (rev 97) and it seems to be working as well.

I'll let you know if I encounter this error again going from wallaby to xena

Let me know if you either bump into an error or confirm the bug is fixed so we can close this πŸ‘ Thanks!

gabrielcocenza commented 1 year ago

@cderici the CI got the same issue as I described :grimacing:

cderici commented 1 year ago

@cderici the CI got the same issue as I described 😬

@gabrielcocenza no worries, I can fix this quickly even without reproducing. But I wanna make sure we understand why's this happening so we won't accidentally introduce more elaborate issues there.

From the CI output I see the main runner is the test_upgrade_charm in the charmed-openstack-upgrader, but I can't tell what exact charm/channels/revisions are used, can you either find that out by pdb or point me to the code that's providing the parameters for the test case that's running (and failing)? Thanks!

gabrielcocenza commented 1 year ago

@cderici I discovered the problem. We were following the documentation which uses resources=None but on branch 2.9 uses an empty dict. When we override with None this problem shows up.

Are you going to switch to default value as None or should we use empty dict?

https://github.com/juju/python-libjuju/blob/39279699b7fcaaaf30e892c003c1e9d7c3c0aab2/juju/application.py#L623

cderici commented 1 year ago

@gabrielcocenza oh I'm sorry, yeah I changed the default value on that to resources={} because the resources=None doesn't make too much sense. The documentation for that will be changed when we actually make a release with that change, until then it's just a change in the edge. I may

Yeah I'd suggest to use resources={} if you need to pass it during a call to refresh.

Though I'm glad to hear it sounds like the actual fix is working for you, isn't it? Let me know if so so we can close this issue πŸ‘

gabrielcocenza commented 1 year ago

Hi @cderici the bug seems to be fixed, so I'm closing it.

Thanks