neoave / mrack

Multicloud use-case based multihost async provisioner for CIs and testing during development
Apache License 2.0
11 stars 14 forks source link

fix: Replace coroutines with tasks to avoid RuntimeError #253

Closed dav-pascual closed 1 year ago

dav-pascual commented 1 year ago

Fixing code block in _opentack_gather_responses method to avoid RuntimeError when a coroutine is reused in the loop.

Including a unit test for this method as part of the commit.

The issue was that a coroutine (aka the async function calls parameters) cannot be reused with await or gather, hence when a ServerError exception was raised, it caused a RunTime error. A workarround for this is awaiting using tasks instead of couroutines directly, and recreate the tasks if necessary to call the coroutines again.

pvoborni commented 1 year ago

I think this patch doesn't solve the issue. The bug seems to be introduced in https://github.com/neoave/mrack/commit/8ef831297a99c47eb270a6ea4f5e33ed20c8caee The regression is that on re-try the functions are not called again.

The whole idea is that on server error we call openstack again to get the data to try to avoid the ServerError.

dav-pascual commented 1 year ago

I think this patch doesn't solve the issue. The bug seems to be introduced in 8ef8312 The regression is that on re-try the functions are not called again.

The whole idea is that on server error we call openstack again to get the data to try to avoid the ServerError.

@pvoborni I will review this further with @Tiboris but, what was preventing to re-call the functions was the incorrect use of asyncio.gather which cannot be reused with coroutines (async function calls), hence provoking the RunTime error after the first failed iteration of the while loop (in _openstack_gather_responses) and preventing the retries.

To put it more simple, when we get a ServerError, we cannot call again asyncio.gather directly with the parameter functions, because that will cause a RunTime error. Instead we use asyncio tasks which will allow us to call the OpenStack functions (or whatever we receive as parameter) correctly, allowing us to retry.

I think this is what you were referring to.

pvoborni commented 1 year ago

I don't think I was referring to that.

This method is called in e.g.:

_, _, self.limits, _, _ = await self._openstack_gather_responses(
            self._load_flavors(),
            self._load_images(image_names),
            self.nova.limits.show(),
            self._load_networks(),
            self._load_ip_availabilities(),
        )

On server error we need to run again the methods. Resp. it would be good to re-run only the failed one., e.g. self._load_images(image_names). Wrapping a coroutine with task will not accomplish the re-run of the function call which created the coroutine. We probably need to pass the functions and their params to _openstack_gather_responses so that inside it can be re-run again.

We can also say that the created test is not testing the scenario correctly. There is no re-run of initial function. Proper test would have e.g. 2 calls of coro2 inside the _openstack_gather_responses where one would end with ServerError and second would succeed thus the function would return result[2] with proper result.