Open gerrod3 opened 1 year ago
Separate from he performance issues with using generic relations for CreatedResources, they also seem to be frequently invalidated when resources are cleaned up, but the records themselves are not removed. The result is that created_resources
for historical tasks is often populated with one or many null values, which is not ideal.
After #3712 this behavior is also inconsistent depending on which resource is involved, which makes it particularly not ideal. Invalid hrefs are probably ultimately "better"
And another thought: Tasks usually do something that has a result, which may or may not be represented in created_resources
. At least once a task creates two, the task reader has do do extra work to extract the relevant information. I'd prefer, if a task could "just" or additionally return a result (JSONField).
OTOH, we need to keep in mind that created_resources
are currently used to cleanup resources in case of a task failure. Maybe we should keep a separate record for that. It could still be the GenericRelation. If we do not expose it in the api the querying is non-problematic, but the task can be much more intentional about what to clean up on failure. And it should be cleared after a successfull run too.
If we add a new JSONField, and start populating it, then code that relies on the current implementation can change-over-time. If we were to just change the existing implementation, it would potentially break all the things. Let's keep that in mind as we design the right fix for the problem.
I'm not sure there's much if any code that relies on the current implementation. There's code that creates CreatedResources
of course, but it would probably be possible to write a shim for that before we transition.
There is certain amount of zero downtime upgrade thinking to be done here.
From task query optimization PR: https://github.com/pulp/pulpcore/pull/3712:
created_resources currently uses a GenericReleationModel to store references to items created during a task. This can cause significant performance issues when quering tasks especially for relations that require Master-Detail related lookups. Also, when the associated resource gets deleted the task is left with a reference to null when serialized.
The logic around created_resources could be simplified if switched to a ArrayField or HStore, there would be no need for complicated lookup logic when quering tasks. Things to note for the conversion:
CreatedResource(resource).save()
,