pulp / pulpcore

Pulp 3 pulpcore package https://pypi.org/project/pulpcore/
GNU General Public License v2.0
304 stars 116 forks source link

Refactor Task's created_resources field to use an ArrayField or HStore #3771

Open gerrod3 opened 1 year ago

gerrod3 commented 1 year ago

From task query optimization PR: https://github.com/pulp/pulpcore/pull/3712:

created_resources currently uses a GenericReleationModel to store references to items created during a task. This can cause significant performance issues when quering tasks especially for relations that require Master-Detail related lookups. Also, when the associated resource gets deleted the task is left with a reference to null when serialized.

The logic around created_resources could be simplified if switched to a ArrayField or HStore, there would be no need for complicated lookup logic when quering tasks. Things to note for the conversion:

dralley commented 1 year ago

Separate from he performance issues with using generic relations for CreatedResources, they also seem to be frequently invalidated when resources are cleaned up, but the records themselves are not removed. The result is that created_resources for historical tasks is often populated with one or many null values, which is not ideal.

After #3712 this behavior is also inconsistent depending on which resource is involved, which makes it particularly not ideal. Invalid hrefs are probably ultimately "better"

mdellweg commented 1 year ago

And another thought: Tasks usually do something that has a result, which may or may not be represented in created_resources. At least once a task creates two, the task reader has do do extra work to extract the relevant information. I'd prefer, if a task could "just" or additionally return a result (JSONField).

OTOH, we need to keep in mind that created_resources are currently used to cleanup resources in case of a task failure. Maybe we should keep a separate record for that. It could still be the GenericRelation. If we do not expose it in the api the querying is non-problematic, but the task can be much more intentional about what to clean up on failure. And it should be cleared after a successfull run too.

ggainey commented 1 year ago

If we add a new JSONField, and start populating it, then code that relies on the current implementation can change-over-time. If we were to just change the existing implementation, it would potentially break all the things. Let's keep that in mind as we design the right fix for the problem.

dralley commented 1 year ago

I'm not sure there's much if any code that relies on the current implementation. There's code that creates CreatedResources of course, but it would probably be possible to write a shim for that before we transition.

mdellweg commented 1 year ago

There is certain amount of zero downtime upgrade thinking to be done here.