pyblish / pyblish-base

Pyblish base library - see https://github.com/pyblish/pyblish for details.

Custom logic for filtering discovered plugins #343

Closed · PaulSchweizer closed this 5 years ago

PaulSchweizer commented 5 years ago

Disclaimer

I only started looking into Pyblish two days ago and I'm really liking it! This however means I might not be aware of existing solutions or concepts that already handle this request. If they exist, I'd be happy about any hint in the right direction.

Goal

Provide a way to use arbitrary, custom logic to influence/filter the list of discovered plugins.

Motivation

When working on multiple projects at once, each with its own requirements for what to publish and how, the existing way of guiding the flow of publishing through plugin paths, hosts, families, etc. is not enough, and I think not really applicable either. In studios, this problem is usually handled through a system of configuration files that can be specified per project, shot, etc.

This feature request would provide an entry point for such a configuration workflow, thus allowing studios to influence the publishing process in whatever way they want.

Suggested implementation

I tested the following one-line approach successfully, but maybe there are implications that I am not aware of.

One-line change to https://github.com/pyblish/pyblish-base/blob/master/pyblish/plugin.py#L1330, to just emit a new "pluginsDiscovered" signal:


def discover(type=None, regex=None, paths=None):

    [...]

    plugins = list(plugins.values())
    sort(plugins)  # In-place

    lib.emit("pluginsDiscovered", plugins=plugins)  # New signal 

    return plugins

Then a user could just modify the list of plugins in place, like this:


def filter_plugins(plugins):

    config = ["MyValidator", "MyExtractor"]

    # Iterate in reverse so removing the current item doesn't skip elements
    for plugin in reversed(plugins):
        if plugin.__name__ not in config:
            plugins.remove(plugin)

api.register_callback("pluginsDiscovered", filter_plugins)

I hope that makes sense.

mottosso commented 5 years ago

Hi @PaulSchweizer, welcome to Pyblish and thanks for the kind words!

That's a neat feature; I think typically what folks do is leverage PYBLISHPLUGINPATH or api.register_plugin_path, which works a lot like PYTHONPATH. That way you can customise your path up-front and guide discovery.

For example, for per-project or per-shot plug-ins, you could say:

$ export PYBLISHPLUGINPATH=/projects/alita/shots/1000/pyblish_plugins
$ maya

And from there, Pyblish would go and look for plug-ins in this directory, which is relative to a given project. You can do this either before or during a run of a DCC.

To customise discovery at run-time, you can either edit that same path via e.g. os.environ["PYBLISHPLUGINPATH"], or use pyblish.api.register_plugin_path.
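For example, a minimal sketch of doing this at run-time (the project path is just illustrative):

import os
import pyblish.api

project_plugins = "/projects/alita/shots/1000/pyblish_plugins"

# Register the path directly with the API..
pyblish.api.register_plugin_path(project_plugins)

# ..or extend the environment, like PYTHONPATH (os.pathsep-separated)
os.environ["PYBLISHPLUGINPATH"] = project_plugins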

You can find information about it here, and on the forum.

Does that work, do you think?

tokejepsen commented 5 years ago

Maybe https://api.pyblish.com/pages/targets.html could be used as well?
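For reference, a minimal sketch of how that could look, assuming pyblish.api.register_target and the Plugin.targets attribute:

import pyblish.api

class ExtractAlembic(pyblish.api.InstancePlugin):
    order = pyblish.api.ExtractorOrder
    targets = ["per_project"]  # Only runs when this target is active

# Activate the target for the current session
pyblish.api.register_target("per_project")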

PaulSchweizer commented 5 years ago

Thanks for the super quick answers and the explanations!

Both approaches, PYBLISHPLUGINPATH and targets, do not solve the problem for me, however.

Both are basically additive, but I also want to be able to exclude/disable plugins based on arbitrary, unpredictable and ad-hoc choices made by supervisors, producers, or the needs of a project.

Simple examples: one person might decide that for their project they only want to export Alembics, while another project might require only FBX exports. Additionally, people will want to further configure how some plugins behave, especially the Extractors. One project might want to export Alembic caches with a step size of 0.5 while another might want 0.25.

Situations like these were pretty common in my experience at least, especially if the company is handling multiple types of jobs at the same time and/or has a lot of projects running in parallel.

As said, most pipelines already offer some sort of config files as a solution for these problems. My idea mentioned above would leave all that configuration work in the hands of the studio wanting to adopt Pyblish and would not require any change to the way Pyblish works.

I hope that clarifies what I mean a bit better.

tokejepsen commented 5 years ago

One person might decide that for their project they only want to export Alembics, while another project might require only FBX exports.

There has been a workflow of exporting all the file formats, so you can decide later down the line whether to use Alembics or FBX.

Additionally, people will want to further configure how some plugins behave, especially the Extractors. One project might want to export Alembic caches with a step size of 0.5 while another might want 0.25.

To have options for extractors, a common approach has been to expose settings in the scene. For example, in Maya you could have selection sets where the user specifies which meshes to export and what the export settings will be. This is the approach of https://getavalon.github.io/2.0/, which takes it further by providing a creator tool for setting up these selection sets.

mottosso commented 5 years ago

One person might decide that for their project they only want to export Alembics, while another project might require only FBX exports.

Hm, this sounds like a good fit for the per-project plug-ins mentioned above, I think; you'd have the fbx-exporter present in one project and not the other.

An alternative could be to read from the environment at publish time.

import os

class FbxExtractor(...):
  # Any non-empty value of FBX_ENABLED enables the plug-in
  active = bool(os.getenv("FBX_ENABLED"))

Unlike normal Python modules, Pyblish plug-ins are "reloaded" each time they are used, so this would re-read from the environment every time a publish happens.

Alternatively, you could wrap this up in a configuration system of your own design to manage things more delicately.

import my_pipeline

class AlembicExtractor(...):
  active = my_pipeline.is_enabled("alembic")
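
Here my_pipeline is hypothetical; a minimal sketch of what is_enabled could look like, reading a per-project config file (the environment variable and schema are assumptions):

import json
import os

def is_enabled(feature):
    # Path to the active project's config; PROJECT_CONFIG is illustrative
    config_path = os.environ.get("PROJECT_CONFIG")
    if not config_path:
        return True  # No config found; default to enabled
    with open(config_path) as f:
        config = json.load(f)
    return config.get("extractors", {}).get(feature, True)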

Another alternative is to leverage the collection pass and families for this (like Toke mentioned above).

class FbxExporter(...):
  families = ["fbx"]

class AlembicExporter(...):
  families = ["abc"]

class MyCollector(...):
  def process(self, context):
    if my_pipeline.is_enabled("alembic"):
      context.create_instance("myInstance", family="alembic")
    else:
      context.create_instance("myInstance", family="fbx")

However, Pyblish is all about "data-driven" pipelines, in that the data can help determine what to do next.

from maya import cmds

class MyCollector(...):
  def process(self, context):
    for node in cmds.ls(type="objectSet"):
      # Only consider sets flagged as Pyblish instances
      if not cmds.objExists(node + ".pyblishInstance"):
        continue
      instance = context.create_instance(node)
      instance.data["families"] = [cmds.getAttr(node + ".family")]

In this example, data from the artist's scene determines whether an instance should be of an abc or fbx family. You could then take this further and add a bool attribute the user could use to adjust this interactively from their scene.

instance = cmds.createNode("objectSet")
cmds.addAttr(instance, ln="useAlembic", at="bool")  # "bool" is an attributeType, not a dataType

That you could then read in a similar fashion from the collector.
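
A sketch of reading that attribute back from the collector, following the names in the snippets above:

use_alembic = cmds.getAttr(node + ".useAlembic")
instance.data["families"] = ["abc" if use_alembic else "fbx"]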

mottosso commented 5 years ago

And yes, exactly. The concept extends to end-user interfaces like the one in Avalon, here: https://getavalon.github.io/2.0/tools/#creator

PaulSchweizer commented 5 years ago

Thanks again for the explanations and recommendations, much appreciated! All these approaches are 100% valid and have their use cases; it entirely depends on how the individual pipelines are built, maintained and used. The recommended workflows do have some problems though, which is why I am suggesting this enhancement. And just to be clear, I am not advocating for the config approach; everyone should build a pipeline the way they see fit, and everyone has different requirements. I would just like to be able to use the "filtering" approach with Pyblish, due to the specific requirements I currently have. I think it'd be a small enough, non-intrusive change, leaving Pyblish exactly the way it is while enabling this concept for people who want to use it.

Some thoughts on your recommendations:

There has been a workflow of exporting all the file formats, so you can decide later down the line whether to use Alembics or FBX.

This would be a waste of time, disk space (and farm resources), and could easily be avoided by explicitly specifying what to export.

To have options for extractors, a common approach has been to expose settings in the scene.

While this works, it needs further tooling and special treatment for each application (a Maya approach won't work in Houdini). The mentioned tool Avalon looks great, but an established pipeline might not want to adopt it and instead use its own, existing approaches.
Also, what happens when the default options change midway through the project? We'd need to update existing scenes, or resolve the scene settings against the global settings every time we publish, which brings us back to the initial request of providing the option for arbitrary filtering and initialization.

I think; you'd have the fbx-exporter present in one project and not the other.

That would mean having to either copy or symlink plugins to project locations and then draw the plugins only from those locations, which would also need further tooling to maintain the copies/links. I would use a studio location plus additional project/shot/asset locations for the Pyblish plugins. These additional locations would be meant for artists/show TDs to easily write their own specific plugins without having to go through the usual pipeline version-control procedure, which can overwhelm them, require further access, require supervision, clutter the studio repo with code that is only valid for a specific show, etc. I don't see the plugin path as a valid option for filtering which plugins to use, but more as a big pool to draw from.

An alternative could be to read from the environment at publish time.

This places reading the config data into the hands of the individual plugins, which is what I'd like to avoid. It means either duplicating code or introducing my own subclass to handle the config system; neither approach is ideal. This would not be necessary if the control over which plugins are taken into account, and what their initial settings are, were offloaded to a simple "filtering process" that runs every time the plugins are gathered. The existing concepts like family and target are great and would still be used in conjunction with this filtering process; the filtering would just allow for arbitrary decisions that cannot, or should not, be represented in the inherent logic of the publish system.

Another benefit is that pipeline can easily offload certain decisions to the project supervisors, just letting them edit the config file to their liking without having to manage any code. Having an independent filtering process would also make it more feasible to reuse existing plugins, since their behavior would be controlled from outside and they would not need to be adapted to fit a specific pipeline (not always applicable, of course).

However, Pyblish is all about "data-driven" pipelines, in that the data can help determine what to do next.

This makes sense, but it also means having to embed a specific workflow into the system itself. In the past, I encountered situations where that approach was just not enough to satisfy the arbitrary, unpredictable and ad-hoc requirements imposed on these kinds of systems by the productions.

Again, this approach is a perfectly valid one and we'd be making use of it; we'd just also like to utilize the filtering approach, for the reasons given above.

mottosso commented 5 years ago

Ok, I can see what you mean.

The reason I'm hesitant isn't so much the implementation (on the contrary, it's quite elegant, nice work) but whether it would split usage and documentation into two different ways of achieving the same goal. In this case, you'd be using a data-driven framework in an imperative way; telling Pyblish what to do, rather than having the data do it. It's backwards, but I can also see how it's more familiar.

So with that in mind, I'd be happy with the feature, but would treat it as a gateway to a data-driven publishing pipeline with regard to documentation and guidance.

Here's what I would ask for its implementation.

  1. Instead of lib.emit, use api.register_discovery_filter (sketched below). Registering is a common pattern in Pyblish, especially for something with side-effects like this one.
  2. Add a few lines to the discover docs with motivation and an example. Editing this page updates the website here.
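
A sketch of what that registration plumbing could look like inside pyblish.plugin, mirroring the existing register_* helpers (names are illustrative):

_registered_plugin_filters = list()

def register_discovery_filter(callback):
    """Register a callable that may modify the discovered plug-in list in-place"""
    _registered_plugin_filters.append(callback)

def deregister_discovery_filter(callback):
    _registered_plugin_filters.remove(callback)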

Let me know your thoughts.

mottosso commented 5 years ago

@tokejepsen Can you spot any other issues with this? @BigRoy Could I have you take on this too?

tokejepsen commented 5 years ago

Can't see any issues. Good to go :)

mkolar commented 5 years ago

We've actually started looking at possible implementations of this exact feature in our Avalon config. So for us and pype.club, I can definitely say we'd make huge use of this.

antirotor commented 5 years ago

I am already working on it - testing right now. I've implemented it the api.register_discovery_filter way, with one slight modification that I think is useful: the filter callback will return a tuple:

def my_plugin_filter(plugin):
    filtered = False
    if plugin.__name__ == 'SomePluginNameIDontWant':
        filtered = True

    if plugin.__name__ == 'SomePluginINeedToChange':
        plugin.optional = True

    return plugin, filtered

That should remove SomePluginNameIDontWant from registered plugins and make SomePluginINeedToChange optional.

PaulSchweizer commented 5 years ago

treat it as a gateway to a data-driven publishing pipeline with regard to documentation and guidance.

Thanks for the clarification @mottosso, I fully agree.

And great that this is already in development, @antirotor - that looks exactly like what I'd need. Let me know if you need any help.

mottosso commented 5 years ago

with one slight modification

That's a good idea, but a little too specific. There's no reason Plugin.optional should get special treatment; might as well allow for any attribute to be edited this way, like in @PaulSchweizer's original example.

import random

def my_discovery_filter(plugins):
  for plugin in plugins:
    plugin.optional = random.choice([False, True])
    plugin.__name__ += "Filtered"
    plugin.active = False
    plugin.families += ["filterFamily"]

api.register_discovery_filter(my_discovery_filter)

Since it's running in-place, to remove plug-ins you would explicitly need to .remove them.

plugins.pop()

# Or..
plugins.remove(plugins[0])

# Or..
for p in list(plugins):
  if p.__name__ == "Bad":
    plugins.remove(p)

What do you think?

antirotor commented 5 years ago

There's no reason Plugin.optional should get special treatment

I agree, that was just an example. You can do whatever you need in that filter callback.

Since it's running in-place, to remove plug-ins you would explicitly need to .remove them.

This now works per plugin, so it will create a list of plugins passing the filters, overriding the existing one:

filtered_plugins = {}
for name, plugin in plugins.items():
    modified, filtered = filter_plugin(plugin)
    if not filtered:
        filtered_plugins[name] = modified

plugins = list(filtered_plugins.values())

mottosso commented 5 years ago

This now works per plugin

Mm, I understand, but I think Paul's approach would be a better fit here.

E.g.

def discover(...):
  # do all discovery first

  for filter_ in _registered_plugin_filters:
    filter_(plugins)

  return plugins
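
For illustration, a filter registered this way could then prune or tweak the list in-place (assuming the api.register_discovery_filter entry point discussed above):

def strip_fbx(plugins):
    # Iterate over a copy so removal doesn't skip elements
    for plugin in list(plugins):
        if "Fbx" in plugin.__name__:
            plugins.remove(plugin)

api.register_discovery_filter(strip_fbx)
plugins = api.discover()  # strip_fbx runs after discovery, before the return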

What do you think?

antirotor commented 5 years ago

Honestly, I don't know. My approach saves iterations for each filter, and that is its only advantage; the disadvantage is a rigid callback definition that is difficult to enforce in Python. I suppose there won't be too many cases where we'll have tens of filters registered and running on hundreds of plugins, so I think the performance issue is a minor thing here. So I'll cowardly leave the decision to you, having no problem doing it the other way :D

mottosso commented 5 years ago

Cool, let's go with Paul's approach. Discovery is not performance critical, and internal loops (even in the thousands) fade in comparison to the otherwise hefty I/O calls it makes to disk and network, which in turn fades in comparison to what the plug-ins are doing at run-time. Would you like to update your PR?

BigRoy commented 5 years ago

Just to add to the discussion about how to filter data-driven, I had once implemented this:

An is_compatible method for actions etc. that would return a bool on whether the plug-in would be shown/run. Basically, the plug-in would just get that method triggered; the default plug-in would always return True, as it would rely solely on Pyblish's built-in filtering. But to customize the behavior for a plug-in, one could do:

def is_compatible(self, instance):
    if instance.data["ignore_uvs"]:
        return False

    return True

That way the filtering is data-driven and directly inside the plug-in.

Of course, this does not allow what Paul describes as his need - to alter the behavior outside of the plug-in. So, "In project X disable this plug-in when X or Y happens". To do that with this technique one would still need to adapt this plug-in's is_compatible method.

However, if you'd still want to do it the other way around, one could hack their way into it.

def is_compatible(self, instance):
    return pipeline.check_plugin_filter(self, instance)

And if you want that for all your plug-ins, you could just inherit from one base class that has that is_compatible method implemented the way you'd like it in your pipeline, as sketched below.
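
A minimal sketch of that base-class idea; is_compatible and pipeline.check_plugin_filter are hypothetical, not part of the Pyblish API:

import pyblish.api
import pipeline  # your studio module (assumed)

class PipelinePlugin(pyblish.api.InstancePlugin):
    """Base class deferring the run/skip decision to a central pipeline check"""
    def is_compatible(self, instance):
        return pipeline.check_plugin_filter(self, instance)

class ValidateUVs(PipelinePlugin):
    order = pyblish.api.ValidatorOrder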

Of course, for speed of processing, it would trigger once after the "collectors" to update the state of the UI, and then only once more just prior to the plug-in being processed, to ensure it still needs to be processed.

With this method, the UIs could also correctly handle visually hiding those that are not to be processed.

PaulSchweizer commented 5 years ago

Thanks @antirotor for the implementation! And thanks to all the other participants for the ideas and discussions. This has been an important enhancement for us and we'll soon be making use of this in our pipeline!

hannesdelbeke commented 3 years ago

Adding a link to a thread with a similar discussion, for people who find this issue in the future: https://forums.pyblish.com/t/pyblish-workflow-manifest/685/3 - it's a tool that piggybacks on the filtering functionality.