hypothetical-inc / GafferDeadline

Deadline Dispatcher for Gaffer
BSD 3-Clause "New" or "Revised" License

Dispatcher batchSize support #73

Open Rapheus opened 7 months ago

Rapheus commented 7 months ago

Hello there,

Here's my issue and some context

I have a render farm with only a couple of workers. When I dispatch a script to Deadline, I want that node to render all the frames at once, to save on resources (opening and closing Gaffer). I use batchSize to do so, like so:

[Screenshot: python_mav0WAV3D1]

When I execute using Gaffer's LocalDispatcher, everything works as expected: all frames are written within a single process.

However, if I dispatch using GafferDeadline's dispatcher, the task completes too quickly because the process exits early with code 0, and only 41 of the 100 frames are written to the network drive.

[Screenshot: mstsc_Adyzk1QWE2]

I'm using Gaffer 1.3.1.0 and Deadline 10.3. I would like to know whether the GafferDeadline extension and/or Deadline support Gaffer's batchSize option.

Thanks a lot for your help

ericmehl commented 7 months ago

Hi @Rapheus, GafferDeadline does support the batchSize option, and it looks like it's getting set correctly on your Deadline job. If it weren't, you would see 100 tasks for your job, each with a single frame, instead of the single task rendering frames 1-100 that you have.
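To make the relationship between batchSize and task count concrete, here is a minimal sketch of how a frame range could be grouped into batches. It is plain Python for illustration only, not GafferDeadline's or Gaffer's actual implementation, and `batch_frames` is a hypothetical helper name.

```python
def batch_frames(frames, batch_size):
    """Split a list of frames into consecutive batches of at most batch_size.

    Illustration only: a hypothetical helper, not part of Gaffer or GafferDeadline.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [frames[i:i + batch_size] for i in range(0, len(frames), batch_size)]


frames = list(range(1, 101))  # frames 1-100

# With a batch size of 1, every frame becomes its own task.
print(len(batch_frames(frames, 1)))    # 100

# With a batch size of 100, all frames land in a single task.
print(len(batch_frames(frames, 100)))  # 1
```

So a job showing one task covering frames 1-100 is consistent with a batch size of 100 having been applied.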

It's odd that it renders some frames but not all of them. I tested this setup here with the following script, and it worked as expected:

import Gaffer
import GafferDispatch
import GafferImage
import imath

Gaffer.Metadata.registerValue( parent, "serialiser:milestoneVersion", 1, persistent=False )
Gaffer.Metadata.registerValue( parent, "serialiser:majorVersion", 3, persistent=False )
Gaffer.Metadata.registerValue( parent, "serialiser:minorVersion", 1, persistent=False )
Gaffer.Metadata.registerValue( parent, "serialiser:patchVersion", 0, persistent=False )

__children = {}

__children["ImageWriter"] = GafferImage.ImageWriter( "ImageWriter" )
parent.addChild( __children["ImageWriter"] )
__children["ImageWriter"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Checkerboard"] = GafferImage.Checkerboard( "Checkerboard" )
parent.addChild( __children["Checkerboard"] )
__children["Checkerboard"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["ImageWriter"]["dispatcher"]["batchSize"].setValue( 100 )
__children["ImageWriter"]["dispatcher"]["deadline"]["pool"].setValue( 'none' )
__children["ImageWriter"]["dispatcher"]["deadline"]["secondaryPool"].setValue( 'none' )
__children["ImageWriter"]["dispatcher"]["deadline"]["group"].setValue( 'workstation-fast' )
__children["ImageWriter"]["dispatcher"]["deadline"]["onJobComplete"].setValue( 'Nothing' )
__children["ImageWriter"]["dispatcher"]["deadline"]["dependencyMode"].setValue( 'Auto' )
__children["ImageWriter"]["in"].setInput( __children["Checkerboard"]["out"] )
__children["ImageWriter"]["fileName"].setValue( '${HOME}/Desktop/checker_####.exr' )
__children["ImageWriter"]["__uiPosition"].setValue( imath.V2f( 1.0999999, 2.39999986 ) )
__children["Checkerboard"]["size"]["y"].setInput( __children["Checkerboard"]["size"]["x"] )
__children["Checkerboard"]["__uiPosition"].setValue( imath.V2f( 2.5999999, 10.5640621 ) )

del __children

Can you try pasting that into a new Gaffer script and see how it works? If that's different from the script you are testing, can you share your Gaffer script?

It looks like you are launching your own gaffer.bat wrapper. What is that batch script doing? Can you try pointing Deadline's Gaffer plugin at the bin/gaffer.cmd launcher directly?

Is there a chance you're running out of disk space, that the network or the network drive isn't stable, or that some other limitation is preventing all the frames from being saved? I would expect an error in that case, but perhaps something is not triggering an error when it should.

It might also be worth trying a newer Gaffer version. There have been a number of changes since 1.3.1.0, and I generally test GafferDeadline against the latest Gaffer version.
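One way to rule out the disk-space and missing-file possibilities is a quick check on the output directory. The sketch below is only an illustration using the Python standard library; the helper names, the `checker_` prefix, the four-digit padding and the `.exr` extension are assumptions taken from the test script above, so adjust them to your own output path.

```python
import os
import shutil


def missing_frames(directory, prefix, first, last, padding=4, ext=".exr"):
    """Return the frame numbers in [first, last] with no file on disk.

    Assumes files are named <prefix><zero-padded frame><ext>,
    for example checker_0001.exr. Hypothetical helper for illustration.
    """
    missing = []
    for frame in range(first, last + 1):
        name = f"{prefix}{frame:0{padding}d}{ext}"
        if not os.path.isfile(os.path.join(directory, name)):
            missing.append(frame)
    return missing


def free_gib(directory):
    """Return the free space, in GiB, on the volume holding directory."""
    return shutil.disk_usage(directory).free / 2**30


if __name__ == "__main__":
    # Checks the current directory; point this at the render output folder.
    output_dir = os.getcwd()
    print("Missing frames:", missing_frames(output_dir, "checker_", 1, 100))
    print("Free space (GiB):", round(free_gib(output_dir), 1))
```

Running this against the network drive right after the task finishes would show which frames were skipped and whether the volume was close to full.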

Rapheus commented 7 months ago

Thanks for your answer,

I will try your suggestions and report back shortly

Rapheus commented 7 months ago

Hey!

Running the original gaffer.cmd file on the farm makes the job work as expected.

The gaffer.bat file that I was using is part of a custom launcher that runs apps through a Python process (using subprocess.Popen). It turns out I wasn't capturing the stdout of the process correctly, so it would return an early exit code 0.

Thanks a lot for your help, and thanks for your work on the plugin.
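For anyone with a similar wrapper, here is a minimal sketch of a launcher that relays the child's output and only returns once the child has really exited, passing its exit code through. It is a generic illustration rather than the actual launcher code, and `run_and_forward` is a hypothetical name.

```python
import subprocess
import sys


def run_and_forward(command):
    """Run command, relay its output line by line, and return its exit code.

    Hypothetical helper for illustration, not the launcher from this thread.
    """
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same stream
        text=True,
    )
    # Read until the child closes its output, so it never stalls on a full pipe.
    for line in process.stdout:
        sys.stdout.write(line)
        sys.stdout.flush()
    # wait() blocks until the child has exited and returns its exit code.
    return process.wait()


if __name__ == "__main__":
    # Stand-in child process that prints one line and exits with code 3.
    child = [sys.executable, "-c", "import sys; print('frame 1'); sys.exit(3)"]
    code = run_and_forward(child)
    print("child exit code:", code)  # child exit code: 3
```

A real wrapper would end with `sys.exit(code)` so that Deadline sees the child's exit status instead of the wrapper's own.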