General: Unify profile filter criteria in code

BigRoy commented 2 years ago

Description

Profile filtering is done in a myriad of locations in the code and in some areas I think can be unified to avoid logic differences and errors.

As an example:

In subset_name_profiles the last part of the family name is used.
In the current integrator (before refactor) the full family name is used - same as in Collect Ftrack Family
The Collect Ftrack Family plug-in uses api.Session["AVALON_APP"] for the hosts filter criteria whereas the Integrator uses context.data["host_name"]
The filter criteria also came up on this PR just now which recommends using context.data["hostName"] which however CAN be different from api.Session["AVALON_APP"] which as listed above is potentially also wrongly used in some areas of the code?
The Collect Ftrack Family plug-in filter criteria doesn't expose the task_type entry in the filter criteria even though the Admin settings does expose it. This seems like a bug which I think can be traced back to not having a clear definition and entry point for these filter criteria.

I'm aware the filter_profiles is also used in other ways with other filter criteria. However it seems like a bunch of areas use the same filter criteria but seemingly with little consistency? At this point in time I'm mostly targeting those filtering criteria using

Even better if we could also unify this in Admin Settings with its own Profile Filter Schema for the settings - so that if we change the schema and the function all areas support the extra filtering features, like how it appears "Task Type" was added at a later state but not correctly implemented in some areas?

Describe the solution you'd like

Preferably expose a get_profile_filter_criteria method that clearly describes what it expects as input values. Or find another way to ease profile filtering in the many different locations in the code base.

In the Integrator Refactor I tried to use the simplest data source that is accessible that had the 'right' formatting compared to what the old integrator was doing and turn that into get_profile_filter_criteria and made it use only the parsed anatomyData. Not saying that's the good way and I'd definitely love to include that in this Issue/discussion.

Additionally I think we can get by by having a single collect_profile_filter_criteria Collector so that other plug-ins could even use the cached instance.data["profile_filter_criteria"] or alike. Not sure if better but at least it'd mean consistent across the publishing workflow.

Nonetheless I feel the urge to just have an easily accessible and consistent way to create the filter criteria would be beneficial.

[cuID:OP-2989]

kalisp commented 2 years ago

'instance' passed into get_profile_filter_criteria is to have more options where to get information for filling the criteria? (You could have there only instance.data["anatomyData"] as an argument, but I could see advantage to not having change method signature in the future, just I don't particularly like inserting everywhere instance - for one it easier to prepare anatomyData content for testing than instance etc.)

BigRoy commented 2 years ago

I don't particularly like inserting everywhere instance

Totally agree. Best argument signature would allow us to easily test and recreate even outside Pyblish plugins, etc. with the smallest yet easiest data set possible. Whether that's anatomy data or lower level like asset entity, task name and primary family name (or families).

Even anatomy data preparation appears to be quite massive in CollectContextAnatomyData and CollectInstanceAnatomyData so recreating anatomy data might also be non trivial outside those plugins.

iLLiCiTiT commented 2 years ago

Profile filtering is done in a myriad of locations in the code and in some areas I think can be unified to avoid logic differences and errors.

I would say that it must be unified how the data are stored on instance and force to do it that way first. For example each instance must know it's asset and task, and empty task on instance should be considered as it does not belong under any task and must not look in context for task. Until all collectors are changed that instance.data contains it's context asset and task it's pointless to make unified way how the data are used. The same should be done for context, right now we're accessing AVALON_ASSET and AVALON_TASK during publishing but it should look into context.data for those values.

In subset_name_profiles the last part of the family name is used.

This is caused by limitations of creator system when farm instance has family farm.render instead of render. Without being able to be 100% sure that the family is without "additional" parts it must be done that way or change all places where it could happen as first step and then change it there.

The Collect Ftrack Family plug-in filter criteria doesn't expose the task_type entry in the filter criteria even though the Admin settings does expose it.

True that is probably a bug

The filter criteria also came up on https://github.com/pypeclub/OpenPype/pull/2969#discussion_r838257439 just now which recommends using context.data["hostName"] which however CAN be different from api.Session["AVALON_APP"] which as listed above is potentially also wrongly used in some areas of the code?

That is because PR adding the plugin was created at the same time when context.data["hostName"] was added. Should be changed.

In the Integrator Refactor I tried to use the simplest data source that is accessible that had the 'right' formatting compared to what the old integrator was doing and turn that into get_profile_filter_criteria and made it use only the parsed anatomyData. Not saying that's the good way and I'd definitely love to include that in this Issue/discussion.

I'm afraid I don't agree, at least not at this stage. Each plugin should know what he needs, right now it's not unified. We should first make sure that data on instances are ready for this before adding function that would do that and even with that plugin should have ability to decide what data are used and how. Also there are cases when these data change during publishing and it's not possible to do it only during collection because they may change, that probably should not happen but that's reason why I told many times that we have to add new integrator so we know what has to be changed and where without changing current integrator, because that just cannot be tested in one PR.

BigRoy commented 2 years ago

This is caused by limitations of creator system when farm instance has family farm.render instead of render. Without being able to be 100% sure that the family is without "additional" parts it must be done that way or change all places where it could happen as first step and then change it there.

I understand there must be a reason for splitting - I'm just pointing out that it'd be best to take a generic approach and always do it the same unless we explicitly don't want that. So if always splitting to the last part is the most sensible. Sure.

I'm afraid I don't agree, at least not at this stage. Each plugin should know what he needs, right now it's not unified. We should first make sure that data on instances are ready for this before adding function that would do that and even with that plugin should have ability to decide what data are used and how. Also there are cases when these data change during publishing and it's not possible to do it only during collection because they may change, that probably should not happen.

All I'm saying is we would need a trivial entry point. Like @kalisp points out we'd even rather not rely directly on the instance data since it's so non-transparent. If the same profile criteria are used in multiple locations where they should behave the same I'd prefer finding a way where we can make sure they stay consistent across.

but that's reason why I told many times that we have to add new integrator so we know what has to be changed and where without changing current integrator, because that just cannot be tested in one PR.

Best to move this point to the Integrator PR if relevant to the implementation.

iLLiCiTiT commented 2 years ago

So if always splitting to the last part is the most sensible. Sure.

It's not sensible at all :D But we don't have a better way...

All I'm saying is we would need a trivial entry point.

Something like this?

def prepare_publish_filter_data(template_data, family, host_name, subset_name):
    task_info = template_data.get("task") or {}
    return {
        "families": family,
        "hosts": host_name,
        "subsets": subset_name,
        "task_names": task_info.get("name"),
        "task_types": task_info.get("type")
    }

BigRoy commented 2 years ago

Something like this?

Exactly like that - as long as we don't have to do any complex massaging to the input data. E.g. if I have to go through huge loops to get template_data or family as input correctly for this function to work I'm still worried there'll be logic errors outside of it.

Do note however: I believe the filter data is also used to define the subset names so I don't think subset can be an input for these. :) In the example links in the first comment in this PR I link to code that solely returns:

   filtering_criteria = {
        "families": family,
        "hosts": host_name,
        "tasks": task_name,
        "task_types": task_type
    }

BigRoy commented 1 month ago

Not sure how much is still problematic in AYON - but will transfer it to discuss.

iLLiCiTiT commented 1 month ago

My personal opinion is that DRY can hurt sometimes, especially when we did split addons. I would say it is much safer and better to create the data at the place where is used instead of trying to abstract it.

ynput / ayon-core

General: Unify profile filter criteria in code #827