hypothetical-inc / GafferDeadline

Deadline Dispatcher for Gaffer
BSD 3-Clause "New" or "Revised" License
36 stars 6 forks source link

errors not finding there way into logs #30

Closed mrpurest closed 3 years ago

mrpurest commented 3 years ago

Hi,

I don't know if this is related to my deadline config, but I found that in most cases errors were not in my deadline logs.

I commented this line out: (line 93 Gaffer.py) self.AddStdoutHandlerCallback(".ERROR : .").HandleCallback += self.HandleGafferError

and now its working better ( I actually can see whats going wrong on my jobs ) seems like this line was causing the task to end as soon as the error happened, but before the details got logged.

the tasks still error out using the return code - so seems like this line is not needed anyway.

ericmehl commented 3 years ago

Hmm, I haven't been able to recreate that here.

If you paste in the script below, does it miss the logs?

import Gaffer
import GafferDispatch
import imath

Gaffer.Metadata.registerValue( parent, "serialiser:milestoneVersion", 0, persistent=False )
Gaffer.Metadata.registerValue( parent, "serialiser:majorVersion", 60, persistent=False )
Gaffer.Metadata.registerValue( parent, "serialiser:minorVersion", 7, persistent=False )
Gaffer.Metadata.registerValue( parent, "serialiser:patchVersion", 0, persistent=False )

__children = {}

__children["PythonCommand"] = GafferDispatch.PythonCommand( "PythonCommand" )
parent.addChild( __children["PythonCommand"] )
__children["PythonCommand"]["dispatcher"]["deadline"]["environmentVariables"].addChild( Gaffer.NameValuePlug( "", Gaffer.StringPlug( "value", defaultValue = '', flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), True, "member1", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
__children["PythonCommand"]["dispatcher"]["deadline"]["environmentVariables"].addChild( Gaffer.NameValuePlug( "", Gaffer.StringPlug( "value", defaultValue = '', flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), True, "member2", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
__children["PythonCommand"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["PythonCommand"]["dispatcher"]["deadline"]["pool"].setValue( 'none' )
__children["PythonCommand"]["dispatcher"]["deadline"]["secondaryPool"].setValue( 'none' )
__children["PythonCommand"]["dispatcher"]["deadline"]["group"].setValue( 'workstation' )
__children["PythonCommand"]["dispatcher"]["deadline"]["onJobComplete"].setValue( 'Nothing' )
__children["PythonCommand"]["dispatcher"]["deadline"]["dependencyMode"].setValue( 'Auto' )
__children["PythonCommand"]["dispatcher"]["deadline"]["environmentVariables"]["member1"]["name"].setValue( 'GAFFER_EXTENSION_PATHS' )
__children["PythonCommand"]["dispatcher"]["deadline"]["environmentVariables"]["member1"]["value"].setValue( 'P:\\\\GafferDeadline' )
__children["PythonCommand"]["dispatcher"]["deadline"]["environmentVariables"]["member2"]["name"].setValue( 'DEADLINE_DEPENDENCY_SCRIPT_PATH' )
__children["PythonCommand"]["dispatcher"]["deadline"]["environmentVariables"]["member2"]["value"].setValue( 'P:\\\\GafferDeadline\\\\gaffer_batch_dependency.py' )
__children["PythonCommand"]["command"].setValue( 'print("ERROR : hit the ejector button!")' )
__children["PythonCommand"]["__uiPosition"].setValue( imath.V2f( -4.64999962, 12.6499996 ) )

del __children

I'd like to keep handling text of Error : to fail the job for cases where a user wants to output that string in order to fail the job. From a PythonCommand node it would print that and exit with a return code of 0. But this way Deadline will flag it as an error.

For that script above, I get the following log which is what I expect:

=======================================================
Error
=======================================================
FailRenderException : ERROR : hit the ejector button!
   at Deadline.Plugins.DeadlinePlugin.FailRender(String message) (Python.Runtime.PythonException)
  File "C:\Users\eric.HYPOTHETICAL\AppData\Local\Thinkbox\Deadline10\slave\WORKSTATION07\plugins\61685d7ecd43bf6de4ca8629\Gaffer.py", line 185, in HandleGafferError
    self.FailRender(self.GetRegexMatch(0))
   at Python.Runtime.Dispatcher.Dispatch(ArrayList args)
   at __FranticX_Processes_ManagedProcess_StdoutHandlerDelegateDispatcher.Invoke()
   at FranticX.Processes.ManagedProcess.RegexHandlerCallback.CallFunction()
   at FranticX.Processes.ManagedProcess.e(String dg, Boolean dh)
   at FranticX.Processes.ManagedProcess.Execute(Boolean waitForExit)
   at Deadline.Plugins.DeadlinePlugin.DoRenderTasks()
   at Deadline.Plugins.PluginWrapper.RenderTasks(Task task, String& outMessage, AbortLevel& abortLevel)
   at Deadline.Plugins.PluginWrapper.RenderTasks(Task task, String& outMessage, AbortLevel& abortLevel)

=======================================================
Type
=======================================================
RenderPluginException

=======================================================
Stack Trace
=======================================================
   at Deadline.Plugins.SandboxedPlugin.d(DeadlineMessage bcw, CancellationToken bcx)
   at Deadline.Plugins.SandboxedPlugin.RenderTask(Task task, CancellationToken cancellationToken)
   at Deadline.Slaves.SlaveRenderThread.c(TaskLogWriter aew, CancellationToken aex)

=======================================================
Log
=======================================================
2021-10-14 12:40:32:  0: Loading Job's Plugin timeout is Disabled
2021-10-14 12:40:33:  0: Executing plugin command of type 'Sync Files for Job'
2021-10-14 12:40:33:  0: Synchronization time for job files: 66.520 ms
2021-10-14 12:40:33:  0: Synchronizing Plugin Gaffer from \\queue.hypothetical.world\deadline_repository\custom\plugins\Gaffer took: 0 seconds
2021-10-14 12:40:33:  0: Done executing plugin command of type 'Sync Files for Job'
2021-10-14 12:40:34:  0: Executing plugin command of type 'Initialize Plugin'
2021-10-14 12:40:34:  0: INFO: Executing plugin script 'C:\Users\eric.HYPOTHETICAL\AppData\Local\Thinkbox\Deadline10\slave\WORKSTATION07\plugins\61685d7ecd43bf6de4ca8629\Gaffer.py'
2021-10-14 12:40:34:  0: INFO: About: Gaffer for Deadline
2021-10-14 12:40:34:  0: INFO: Render Job As User disabled, running as current user 'eric'
2021-10-14 12:40:34:  0: INFO: The job's environment will be merged with the current environment before rendering
2021-10-14 12:40:34:  0: Done executing plugin command of type 'Initialize Plugin'
2021-10-14 12:40:34:  0: Start Job timeout is disabled.
2021-10-14 12:40:34:  0: Task timeout is disabled.
2021-10-14 12:40:34:  0: Loaded job: PythonCommand (61685d7ecd43bf6de4ca8629)
2021-10-14 12:40:34:  0: Executing plugin command of type 'Start Job'
2021-10-14 12:40:34:  0: DEBUG: S3BackedCache Client is not installed.
2021-10-14 12:40:34:  0: INFO: Executing global asset transfer preload script 'C:\Users\eric.HYPOTHETICAL\AppData\Local\Thinkbox\Deadline10\slave\WORKSTATION07\plugins\61685d7ecd43bf6de4ca8629\GlobalAssetTransferPreLoad.py'
2021-10-14 12:40:34:  0: INFO: Looking for legacy (pre-10.0.26) AWS Portal File Transfer...
2021-10-14 12:40:34:  0: INFO: Looking for legacy (pre-10.0.26) File Transfer controller in C:/Program Files/Thinkbox/S3BackedCache/bin/task.py...
2021-10-14 12:40:34:  0: INFO: Could not find legacy (pre-10.0.26) AWS Portal File Transfer.
2021-10-14 12:40:34:  0: INFO: Legacy (pre-10.0.26) AWS Portal File Transfer is not installed on the system.
2021-10-14 12:40:34:  0: INFO: Executing global job preload script 'C:\Users\eric.HYPOTHETICAL\AppData\Local\Thinkbox\Deadline10\slave\WORKSTATION07\plugins\61685d7ecd43bf6de4ca8629\GlobalJobPreLoad.py'
2021-10-14 12:40:34:  0: INFO: Looking for AWS Portal File Transfer...
2021-10-14 12:40:34:  0: INFO: Looking for File Transfer controller in C:/Program Files/Thinkbox/S3BackedCache/bin/task.py...
2021-10-14 12:40:34:  0: INFO: Could not find AWS Portal File Transfer.
2021-10-14 12:40:34:  0: INFO: AWS Portal File Transfer is not installed on the system.
2021-10-14 12:40:34:  0: Done executing plugin command of type 'Start Job'
2021-10-14 12:40:34:  0: Plugin rendering frame(s): 1
2021-10-14 12:40:34:  0: Executing plugin command of type 'Render Task'
2021-10-14 12:40:34:  0: INFO: Performing path mapping
2021-10-14 12:40:34:  0: INFO: Stdout Redirection Enabled: True
2021-10-14 12:40:34:  0: INFO: Stdout Handling Enabled: True
2021-10-14 12:40:34:  0: INFO: Popup Handling Enabled: False
2021-10-14 12:40:34:  0: INFO: Using Process Tree: True
2021-10-14 12:40:34:  0: INFO: Hiding DOS Window: True
2021-10-14 12:40:34:  0: INFO: Creating New Console: False
2021-10-14 12:40:34:  0: INFO: Running as user: eric
2021-10-14 12:40:35:  0: INFO: Executable: "C:\gaffer\gaffer-0.60.7.0\bin\gaffer.bat"
2021-10-14 12:40:35:  0: INFO: Argument: execute -script "C:\Users\eric.HYPOTHETICAL\AppData\Local\Thinkbox\Deadline10\slave\WORKSTATION07\jobsData\61685d7ecd43bf6de4ca8629\thread0_tempH8wiT0\untitled.gfr" -nodes PythonCommand -frames 1-1 -context "-dispatcher:jobDirectory" "'C:\\Users\\eric.HYPOTHETICAL\\gaffer\\projects\\default\\dispatcher\\deadline\\000014'" "-dispatcher:scriptFileName" "'C:\\Users\\eric.HYPOTHETICAL\\gaffer\\projects\\default\\dispatcher\\deadline\\000014\\untitled.gfr'"
2021-10-14 12:40:35:  0: INFO: Full Command: "C:\gaffer\gaffer-0.60.7.0\bin\gaffer.bat" execute -script "C:\Users\eric.HYPOTHETICAL\AppData\Local\Thinkbox\Deadline10\slave\WORKSTATION07\jobsData\61685d7ecd43bf6de4ca8629\thread0_tempH8wiT0\untitled.gfr" -nodes PythonCommand -frames 1-1 -context "-dispatcher:jobDirectory" "'C:\\Users\\eric.HYPOTHETICAL\\gaffer\\projects\\default\\dispatcher\\deadline\\000014'" "-dispatcher:scriptFileName" "'C:\\Users\\eric.HYPOTHETICAL\\gaffer\\projects\\default\\dispatcher\\deadline\\000014\\untitled.gfr'"
2021-10-14 12:40:35:  0: INFO: Startup Directory: "C:\gaffer\gaffer-0.60.7.0\bin"
2021-10-14 12:40:35:  0: INFO: Process Priority: BelowNormal
2021-10-14 12:40:35:  0: INFO: Process Affinity: default
2021-10-14 12:40:35:  0: INFO: Process is now running
2021-10-14 12:40:35:  0: Done executing plugin command of type 'Render Task'

=======================================================
Details
=======================================================
Date: 10/14/2021 12:40:38
Frames: 1
Elapsed Time: 00:00:00:06
Job Submit Date: 10/14/2021 12:40:30
Job User: eric
Average RAM Usage: 20477632512 (8%)
Peak RAM Usage: 20525244416 (8%)
Average CPU Usage: 2%
Peak CPU Usage: 4%
Used CPU Clocks (x10^6 cycles): 8072
Total CPU Clocks (x10^6 cycles): 403558

If you can post the log where the information is missing I'll take a look and see if anything seems wrong. Or is it not sending the log to Deadline at all?

mrpurest commented 3 years ago

This is likely a quirk of our over complicated deadline setup.

we have rez loading our software, and tricks to do stuff with users, and other tricks around having windows and linux with different configs....

ericmehl commented 3 years ago

Sounds good. I'm going to close this for now, but if it's still creating problems or you think it's something on the GafferDeadline side, feel free to open it again of course.

ericmehl commented 2 years ago

Briefly reanimating this to note that I ran into the same problem and have a better solution for you now. The latest release should help you out: https://github.com/hypothetical-inc/GafferDeadline/releases/tag/0.56.0.0

The problem was Gaffer outputs error information across multiple lines, but the Deadline plugins process output one line at a time. So the first Gaffer output with Error : caused the task to fail and log the message correctly. But the message it logged isn't terribly helpful - it's the messages later on that give the real information.

I removed the Error : handling so only a non-zero exit code will fail a task now. It doesn't sound like you are, but if you're relying on being able to print ERROR : to the console, that will no longer work as expected. The best way is probably to raise RuntimeError("...") or assert. Both will cause Gaffer to exit and Deadline will log a useful error.