pyblish / pyblish-base

Pyblish base library - see https://github.com/pyblish/pyblish for details.

Pyblish profiling #348

Open · antirotor opened 5 years ago

antirotor commented 5 years ago

Goal

It would be nice to have a mode that generates profiling data for the Pyblish process and its plugins, producing timing and other data that can later be used to visualize and debug the whole session.

Motivation

My idea is to capture the whole context after each plugin is done, as snapshots, along with its timing, memory consumption, etc., to JSON. Then to have a simple application that processes this JSON file and draws all the data to a canvas, so we'll be able to visualize what is going on with the data during publishing. This could be useful when debugging the behaviour of pipelines where lots of plugins are doing many things.

Implementation

We could use an approach already used all over pyblish - register callbacks to run after each plugin, and use Python's cProfile or similar tools to take the measurements.
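As a rough sketch of the callback half of that, assuming the "pluginProcessed" event that pyblish-base emits after each plug-in runs (the keys read from the result dict here are assumptions about its layout):

import time
import pyblish.api

timings = []

def on_plugin_processed(result):
    # "result" is the dict pyblish passes to the callback; "plugin" and
    # "duration" are assumptions about what it contains
    timings.append({
        "plugin": result["plugin"].__name__,
        "duration": result.get("duration"),
        "clock": time.time(),
    })

pyblish.api.register_callback("pluginProcessed", on_plugin_processed)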

mottosso commented 5 years ago

Thanks @antirotor, this sounds like a good and relatively easy thing to implement.

import json
from pyblish import util

context = util.publish()

with open("profile.json", "w") as f:
    json.dump(context.data["profiling"], f)

I would have a look at building this into the main process() function; this is what standalone and the GUIs call to actually process a plug-in/instance pair. It could e.g. record a start and end time around each call.

The data itself could be stored in e.g. context.data["profiling"].
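For example, a thin wrapper over the existing function could look something like this (a minimal sketch; the record layout is an assumption, not an agreed format):

import time
import pyblish.plugin

_process = pyblish.plugin.process

def process(plugin, context, instance=None, action=None):
    # time the plug-in/instance pair and file the result
    # under context.data["profiling"]
    start = time.time()
    result = _process(plugin, context, instance, action)
    end = time.time()
    context.data.setdefault("profiling", []).append({
        "plugin": plugin.__name__,
        "instance": instance.data.get("name") if instance is not None else None,
        "start": start,
        "end": end,
        "duration": end - start,
    })
    return result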

Then we could potentially also have some means of deep profiling, whereby the plug-in author calls some method from within the plug-in, and those entries are collected alongside the overall time for the entire plug-in.

import time
from pyblish import api

class MyPlugin(api.InstancePlugin):
    def process(self, instance):
        # do something heavy
        self.profile("somethingHeavy", time.time())
        # do something else
        self.profile("somethingElse", time.time())

The profile() function would then add a child entry to the overall plug-in time. It'd need a more suitable name than self.profile() and probably wouldn't need to take in a time, but you get the idea.
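The collecting side could look something like this (a minimal sketch; the mixin, attribute and method names are placeholders, not pyblish API):

import time

class ProfilingMixin(object):
    # hypothetical: the profiling wrapper around process() would set
    # this to the record of the currently running plug-in
    _profiling = None

    def profile(self, label):
        if self._profiling is not None:
            self._profiling.setdefault("children", []).append({
                "label": label,
                "clock": time.time(),
            })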

For memory it would probably be sufficient to look at the overall consumption of the Python process itself, since it's unlikely that anything else happens during publishing - at least for an overall take on it. Then there is potentially room to monitor the context and instance objects themselves, and measure their size over time. Speaking of which, you probably also want to store clock-time alongside each process() and "deep" profile call, to build some sort of graph (with e.g. pygal) afterwards. That's something that could potentially be shipped alongside Pyblish, to generate SVG graphics out-of-the-box, and eventually be included in the GUIs too.
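A whole-process take on memory could be as simple as this (a minimal sketch, assuming Python 3's stdlib tracemalloc, which tracks Python allocations rather than OS-level usage):

import time
import tracemalloc

tracemalloc.start()

def memory_snapshot(context, label):
    # record current and peak allocation alongside clock-time
    current, peak = tracemalloc.get_traced_memory()
    context.data.setdefault("profiling", []).append({
        "label": label,
        "clock": time.time(),
        "memory_current": current,  # bytes
        "memory_peak": peak,        # bytes
    })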

Would you like to have a look at setting up a PR for this to get things rolling, @antirotor?

antirotor commented 5 years ago

Yes, I was thinking along similar lines. There are a few points to resolve though:

mottosso commented 5 years ago

best way to enable profiling: either by setting a PYBLISH_PROFILING environment variable or passing a --profiling option to the CLI

For plug-in and clock timing, there wouldn't be much of an overhead and we could probably leave it on by default. For capturing memory, I don't expect there to be much overhead either, but it would be good to measure this. So maybe as a first step, implement the no-overhead profiling information and establish some output, and then as a step 2 have a look at what kinds of measurements would have an impact on run-time performance.
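Something like the following could express that split (a minimal sketch; only the PYBLISH_PROFILING name comes from the discussion above, the rest is an assumption):

import os
import time

def make_record(plugin, start, end):
    # cheap clock timing is always on
    record = {
        "plugin": plugin.__name__,
        "start": start,
        "end": end,
        "duration": end - start,
    }
    # costlier measurements are opt-in via the environment variable
    if os.getenv("PYBLISH_PROFILING"):
        import tracemalloc
        if tracemalloc.is_tracing():
            record["memory"] = tracemalloc.get_traced_memory()
    return record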

possibility to add a profiler visualizing application

The pygal library I mentioned should make for a good start. It's pure Python, could be embedded with pyblish-base, and renders interactive graphs as .svg files. That way we could leverage the web browser to draw it for us initially, and then eventually embed something like that into a GUI.
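For example (a minimal sketch, assuming the record layout from earlier in the thread):

import pygal

def render_profile(profiling, path="profile.svg"):
    # one point per plug-in, rendered as an interactive SVG
    chart = pygal.Line(title="Plug-in durations", x_label_rotation=45)
    chart.x_labels = [entry["plugin"] for entry in profiling]
    chart.add("duration (s)", [entry["duration"] for entry in profiling])
    chart.render_to_file(path)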