Closed by @mottosso 10 years ago
Think it is a good direction, mostly because it will make it easy for people to graph their workflow without programming. And it also encourages much more composable code.
So you'll end up with the same validator for Maya as for other software.
This is not vital to the discussion, but this comment from @BigRoy still seems like magic to me:) Other than using a cross DCC language like FabricEngine, I can't see a validator being used cross DCC.
While on the topic of FabricEngine, this new direction is looking a lot like their visual programming: https://vimeo.com/103517340, but more specific to publishing. Just don't want us to reinvent the "new wheel":)
> This is not vital to the discussion, but this comment from @BigRoy still seems like magic to me:) Other than using a cross DCC language like FabricEngine, I can't see a validator being used cross DCC.
Well, consider what information is required by a validator.
Let's start with a simple example: validating a naming convention. For this, a validator only needs access to the names of nodes. Names are already grabbed from the scene during selection, and as such the validator isn't dependent on anything Maya-specific. This is actually already the case in the current validator for naming convention; as you can see, there are no references to anything related to Maya.
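As a minimal sketch (all names here are hypothetical, not the actual Publish API), such a host-agnostic naming validator could be as simple as:

```python
import re

# Hypothetical stand-in for Publish's instance object; the selector
# has already stored plain node names on it, so no host API is needed.
class Instance(object):
    def __init__(self, nodes):
        self.nodes = nodes  # plain strings, not Maya nodes

# Example convention, assumed for illustration: lowercase_with_underscores
PATTERN = re.compile(r'^[a-z][a-z0-9_]*$')

def validate_naming(instance):
    """Return the names breaking the convention; an empty list means valid."""
    return [name for name in instance.nodes if not PATTERN.match(name)]
```

Because the validator only ever touches plain strings, the exact same code runs regardless of which host produced the names.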
Taking this a step further, what information does a validator need to validate inverted normals? Well, it would need vertex indices and edge links. (Technically, vertex ordering and edges are solely responsible for the direction of each corresponding face, known as winding order.)
We could hand this information to each instance during selection, just like we are handing it names of nodes. Then, in the validator, instead of referencing the node and its normals information:
```python
if cmds.polyNormal(...) == 'inverted':
    ...
```
We access it through attributes provided via selection.
```python
if instance.config.get('normals') == 'inverted':
    ...
```
The same is true for any other validator, even for extractors.
Think about what information is required to extract an obj from a scene.
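As a hedged sketch, assuming the selector has serialised raw geometry onto the instance: vertex positions and per-face vertex indices are essentially all an obj extractor needs.

```python
# Sketch only: build the text of a minimal .obj file from raw data a
# selector could have serialised ahead of time (positions + face indices).
def extract_obj(vertices, faces):
    lines = ["v %f %f %f" % tuple(point) for point in vertices]
    # .obj face indices are 1-based
    lines += ["f " + " ".join(str(i + 1) for i in face) for face in faces]
    return "\n".join(lines) + "\n"
```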
It isn't too far-fetched for a selector to also provide this information; however, it would of course introduce a potential performance bottleneck.
Stepping back just a bit; even though all validators and extractors could be fed information this way, I don't really see this as a viable alternative. However, it does introduce the possibility to provide basic building blocks as validators that can be chained together with custom validators to fulfill the requirements of each production.
Selectors could, for instance, provide for the following information per host:
That's information which is available within most hosts and there are quite a lot of validators that could get built upon this information alone. What we'd have to do, is to provide for a selector per host, which we'd do anyway, and fulfill an (optional) interface to provide for this information. Hosts that live up to these requirements are then compatible with the "universal plugins".
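A sketch of what that optional interface might look like (the helper names on `host_api` are made up for illustration): each host's selector flattens host data into plain structures, and "universal" plugins only ever see those.

```python
# Hypothetical selector interface: `host_api` is whatever the host
# exposes (e.g. a per-host wrapper around maya.cmds); the returned dict
# is all that downstream, host-agnostic plugins ever get to see.
def select(host_api):
    return {
        "nodes": host_api.node_names(),       # assumed per-host helper
        "attributes": host_api.attributes(),  # assumed per-host helper
    }

# A "universal plugin": it validates the dict, never the host itself.
def validate_no_spaces(data):
    return [name for name in data["nodes"] if " " in name]
```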
> While on the topic of FabricEngine
Hold your horses there. Fabric is cool and all, but we aren't solving anything already solved by Fabric, and piggybacking on it wouldn't get us any nearer our goal. I thought the only reason we're considering Coral and Depends is for their GUI facilities and their already-solved implementation of a scenegraph as reference for our own, as they're open source and available?
I'd suggest staying as minimal as absolutely possible until we've got a solid foothold on what we need and how we're using it. At the moment, our needs are far simpler than any of these other implementations.
> That's information which is available within most hosts and there are quite a lot of validators that could get built upon this information alone. What we'd have to do, is to provide for a selector per host, which we'd do anyway, and fulfill an (optional) interface to provide for this information. Hosts that live up to these requirements are then compatible with the "universal plugins".
Definitely seems like a nice idea. It just seems to make for much less composable code when it comes to the Selector. But then again, we can't entirely escape host-specific code, so it's just a matter of where to take that hit. Using the Selectors might be better, as otherwise the host-specific code would need to live in both Validators and Extractors. You would need to extend the Selectors whenever a new Validator required information that wasn't already provided.
> I'd suggest to staying as minimal as absolutely possible until we've got a solid foothold on what we need and how we're using it. At the moment, our needs are far simpler than any of these other implementations.
I agree:)
> But then again we can't entirely escape the hosts specific code, so it will just be a matter of where to take that hit.
Yeah, I think so too.
This is how I see it - the host is going to be involved no matter what, and it can either infect each step, like it is now:
```
                           ______
                          |      |
                          | Host |
                          |______|
                              |
    __________________________|__________________________
    |                  |                |               |
 ___v______      ______v_____      ______v_____      ___v_____
|          |    |            |    |            |    |         |
| Selector |--->| Validation |--->| Extraction |--->| Conform |
|__________|    |____________|    |____________|    |_________|
```
Or its interaction with Publish could start and end with Selection.
```
 ______
|      |
| Host |
|______|
    |
    |
    |
 ___v______       ____________       ____________       _________
|          |    |            |    |            |    |         |
| Selector |--->| Validation |--->| Extraction |--->| Conform |
|__________|    |____________|    |____________|    |_________|
```
The latter is of course an ideal and probably less practical, but I would at least aim for that.
@mottosso, those are some great, clean explanations you added there. I think it's up to the studio to decide what kind of implementations they want to make for their plug-ins (selectors, validators, etc.). But personally I wouldn't convert the DCC's meshes (or other 'complex' data) into an output from the Selector; it doesn't add any real benefit. It's trivial to check normals in Maya, but on your own point cloud you need to know the relevant math. Also, to implement your own check in your DCC you can often pick up from example code how the check should be performed.
I quickly wanted to mention that Depends seems to use a Python library called networkx to define the node graph and its functionality like traversing the graph. All we would have to do is extend/wrap the functionality that we need for our Dependency Graph. I think it would be a very nice library to use if we go for a node graph! What do you think?
I would also recommend going for a name like DAG over just Graph, because I think it's a familiar term for a lot of people.
> I quickly wanted to mention that Depends seems to use a Python library called networkx to define the node graph and its functionality like traversing the graph. .. I think it would be a very nice library to use if we go for a node graph!
Yeah, could do. The library seems robust and so does its documentation.
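As a rough sketch of how it could look for us (plugin names invented for illustration), wrapping networkx could be as little as:

```python
import networkx as nx  # third-party; the library Depends builds on

# The publishing pipeline as a directed acyclic graph, edges pointing
# from each plugin to the plugins depending on its output.
graph = nx.DiGraph()
graph.add_edge("SelectInstances", "ValidateNamingConvention")
graph.add_edge("ValidateNamingConvention", "ExtractAsMa")

# Traversal comes for free: a topological sort is a valid processing order.
order = list(nx.topological_sort(graph))
```

The point being that the graph data structure and its traversal wouldn't need to be written or maintained by us at all.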
For reference, this is what their implementation looks like:
> But personally I wouldn't convert the DCC's meshes (or other 'complex' data) to an output from the Selector, it doesn't add any real benefit. It's trivial to check normals in Maya, but on your own point cloud you need to know the relevant math.
Just spit-balling here, but by gathering vertex information and attributes in Selection, we're essentially talking about serialisation. We're serialising contents of a scene, and then making use of the serialised information in our plugins.
If so, then in the far far future, we could take it further and utilise existing methods of serialising a scene, with vertex information and such, like Alembic or USD. We could skip the whole "write to disk" and merely keep it in memory, like we are now, and use it strictly for our plugins. Never actually exposing the fact that Alembic is involved.
At that point, we could build truly software-agnostic plugins that would be usable by any host with an implemented Selector (a.k.a. Serialiser).
@mottosso and I are still investigating the route of a node-based graph; while doing so, I wanted to raise the following options:
Here's some pros/cons I could come up with:
This graph is characterised by always having a maximum of one input connection per node, while multiple outputs are allowed. Thus you can perform branching, though no merging of branches. The input and output types are always of the same type and act as a data container; think of it as the Context.
This graph is characterised by allowing for, and visualising, a greater number of connections. Instead of transferring the Context data container as a whole, each individual attribute connects to an input of the same type.
This graph is characterised by most nodes having only a single output and input, but with the ability to have multiple inputs and outputs (for merging and branching). This avoids the single-connection graph's limitation of not supporting merge nodes. The input and output types are always of the same type and act as a data container; think of it as the Context.
I realised that there is some room for ambiguity about the number of connections per node. For example, having a single output doesn't necessarily mean it can't be plugged into many inputs and thus not facilitate branching.
Let me explain.
This, similar to the SOP context within Houdini, or the majority of nodes in Nuke, only allows a single input and a single output, but the output can be plugged into multiple inputs on other nodes.
To us, this could mean plugging the output of `SelectObjectSet` into `ValidateNamingConvention` into `ExtractAsMa`. They would each take what they give: a context. This would certainly be a convenient and easily looked-upon layout.
In Python, it could look like this:
```python
def single_input(value):
    return value + 1
```
Similar to the above, but allowing multiple inputs. Merge is a good example of where this is useful.
```python
def merge(a, b):
    return a + b
```
Off the top of my head, I'm unable to see any of our nodes being mergeable, @BigRoy what are your thoughts on this?
Now we're getting complicated. Consider the equation `x + y + z = a`. It takes three inputs, `x`, `y` and `z`, and produces a single output, `a`. Then consider the function:
```python
def add(x, y, z):
    return x + y + z
```
Again, three inputs and one output. This is probably what we're most familiar with.
Multiple outputs on the other hand:
```python
def advanced_func(x):
    y, z = x + 1, x * 2  # two separate results from a single input
    return y, z
```
To be honest, I'm having trouble imagining we'd ever get into a position where this is necessary. Maya does it so it's certainly not unheard of. But it is rare.
Similar to the above, but probably more common and the complexity added by multiple inputs is slight.
To your points.
> Support for branching in the graph.
Branching would be a really great feature to have I think, and is possibly the thing separating nodal workflows, like Maya, from linear workflows, like After Effects. I think branching should be possible with any of these connectivity options.
> Order of processing is very clear (best option is likely depth-first)
Interesting choice of depth-first, I would actually go the other way and say breadth-first. Consider the following graph:
```
                   -- ValidateA --
                  /               \
SelectInstances -------> ValidateB -------> ExtractAsMa
                  \               /
                   -- ValidateC --
```
Depth-first would mean running `SelectInstances`, followed by `ValidateA`, followed by `ExtractAsMa`. I think we would expect all validations to complete before running extraction.
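To illustrate the point (a sketch over a hand-rolled adjacency dict, not Publish code): a breadth-first walk of the graph above visits every validator before the extractor.

```python
from collections import deque

# The example graph, as plain adjacency lists
edges = {
    "SelectInstances": ["ValidateA", "ValidateB", "ValidateC"],
    "ValidateA": ["ExtractAsMa"],
    "ValidateB": ["ExtractAsMa"],
    "ValidateC": ["ExtractAsMa"],
    "ExtractAsMa": [],
}

def breadth_first(start):
    """Visit nodes level by level; later levels wait on earlier ones."""
    order, queue, seen = [], deque([start]), {start}
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in edges[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order
```

`breadth_first("SelectInstances")` yields the selector first, then the three validators, and only then `ExtractAsMa`, matching the expectation that all validations complete before extraction.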
> The Context would get a deep copy per branch used to further operate with.
That's an interesting point. I imagined the context to remain the same shared object throughout, but deep copying is probably unavoidable. Consider the following graph:
```
                   -- ValidateA --
                  /               \
SelectInstances -------> ValidateB -------> ExtractAsMa
                  \
                   -- FilterSelection --> ValidateC --> ExtractAsObj
```
If `FilterSelection` alters the Context, say it removes a few instances, then it would have a side-effect on the context as it enters `ExtractAsMa`. Thus, it would need its own copy of the context.
Is it possible that each node will have to get their own individual deep-copy?
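A small sketch of that idea (a plain dict standing in for the Context, plugin names invented): each branch operates on its own deep copy, so a filter in one branch can't affect another.

```python
import copy

# A plain dict standing in for the Context
context = {"instances": ["char_GRP", "prop_GRP", "camera_CAM"]}

def filter_selection(ctx):
    """Hypothetical plugin: keeps only the _GRP instances in *its* copy."""
    ctx["instances"] = [i for i in ctx["instances"] if i.endswith("_GRP")]
    return ctx

# Each branch receives its own deep copy of the shared context
branch_to_extract_ma = copy.deepcopy(context)             # left untouched
branch_filtered = filter_selection(copy.deepcopy(context))
```

The original `context`, and the copy headed for `ExtractAsMa`, remain unaffected by the filtering in the sibling branch.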
> Separating 'data' in the Context becomes unclear by just looking at the graph. Example given: one (selector) node outputs the meshes (list of objects) and another (selector) node outputs the cameras (list of objects). Both are lists of objects, so how do we know, down in the graph, what list of objects within the Context a node operates on? We'll need to add dropdown menus (comboboxes) so we can select one of the created inputs that exist on the Context.
This may not necessarily be true. If two selectors follow each other, I think it would be reasonable to expect the output from the last node to reflect both operations, thus outputting both cameras and meshes. Did I get this right?
> If so, then in the far far future, we could take it further and utilise existing methods of serialising a scene, with vertex information and such, like Alembic or USD. We could skip the whole "write to disk" and merely keep it in memory, like we are now, and use it strictly for our plugins. Never actually exposing the fact that Alembic is involved.
> At that point, we could build truly software-agnostic plugins that would be usable by any host with an implemented Selector (a.k.a. Serialiser).
@mottosso: this is a nice idea, because these are standard and widely used formats.
@BigRoy: Nice write-up, and really helpful for understanding the basics. I do have a doubt though: do we have to stick to one type of node graph to define our system? It could depend on the type of node being used. E.g. selectors: a selector would parse a file, select the items and give the output to validators.
--(file)--->selectors---(filtered file)-->validators.
Correct me if I am wrong!
This is just me throwing stuff out there, but Gaffer might be an option as well for an interface; http://imageengine.github.io/gaffer/
From the discussion in #59 I got thinking about how Conformers would work with this. Conformers would need specific information from upstream, like files, user data etc. With the node-based workflow, would a Conformer needing two data attributes (filePath and userData) have two inputs, similar to a merge node in Nuke? The Conformer node wouldn't be able to work unless both inputs were supplied. I think this is the Multi-in, Single-out I'm referring to.
That is certainly one way of doing it.
The other way, which resembles what we spoke of in #59, is for each plugin to append data to the current "stream of information", that is, the Context. This is similar to how tools like Houdini work; all along the way, vertices pass through each SOP node. A SOP node can modify, remove or append information, such as vertices, vertex colors or velocity. The information can then be used by subsequent nodes in the chain.
The benefit of this workflow is less spaghetti wires between nodes and greater encapsulation of data; in Houdini, there is almost always a single connection between nodes, as opposed to Softimage or Maya where there is one connection per "channel" of information. The disadvantage however is that the information isn't as obvious, as it is in Maya.
In Houdini, this is remedied by having a really good inspector window for what data is present within each mesh, each face, each edge and each vertex.
It's a question of how much information should be represented by the graph, and how much should be represented by surrounding tools, such as the inspector window of Houdini for more complex data.
Forgot to add that what I'm referring to, the Houdini-style, is Single-in, Single-out.
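A sketch of that single-in, single-out style (all names and the file path are made up for illustration): each plugin appends to the one Context flowing through, and the Conformer simply reads what upstream plugins left behind.

```python
# Hypothetical plugins, Houdini-style: one stream in, one stream out.
def extract(context):
    context["filePath"] = "/tmp/publish/model.ma"  # invented path
    return context

def collect_user_data(context):
    context["userData"] = {"artist": "marcus"}  # invented data
    return context

def conform(context):
    # No extra wires needed; the data is already in the stream.
    return context["filePath"], context["userData"]

# The Context passes through each plugin in turn
stream = {}
for plugin in (extract, collect_user_data):
    stream = plugin(stream)
```

The trade-off mentioned above applies: fewer wires, but the data carried by the stream isn't visible in the graph itself, so an inspector-style tool would be needed.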
Here's some good reference material from node-based GUIs out in the world.
This is my favourite. https://www.youtube.com/watch?v=yuR1e1PjU8Y
Saving images and illustrations here: https://github.com/abstractfactory/pyblish/wiki/Flow-Graph
Branching off of #41 to focus the discussion on the alternative, node-based workflow.
Related:
The issue with using Coral as host for our processes is that most of the project is written in C++ and would require us to also provide build instructions for our users, which vary per platform and can get quite lengthy. And as Publish doesn't yet have a need for performance, most of its benefits would go unnoticed.
I've had a quick look at Depends yesterday and it might be a better fit, being pure Python and PySide. It isn't being developed with Windows in mind, but it ran just fine and I implemented a basic "recipe" and node in just an hour or two.
It does however (as far as I can tell) mainly concern stateless processes, in that each node represents a new process which takes arguments as input and produces results via stdout. For us, this would mean launching a new instance of our host per plugin, as each process is unaware of any other process (hence "stateless") and thus couldn't utilise an already-running process.