Closed by Richard14916 3 years ago.
A change to the try/except block so that, if the failure message contains "CUDA" (indicating, e.g., an out-of-resources error), the job exits with sys.exit(62). This exit code could then be integrated into a hold/restart DAG setup or into runmon restart monitoring.
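A minimal sketch of the pattern described above, assuming the failing work can be wrapped in a single call. The wrapper name `run_with_cuda_exit` and the constant `CUDA_FAILURE_EXIT_CODE` are hypothetical and are not taken from the actual change. Only exceptions whose message mentions "CUDA" are turned into exit code 62. Every other exception is re-raised unchanged, so unrelated failures still look like ordinary crashes.

```python
import sys

# Hypothetical name for the exit code that the surrounding restart logic
# (e.g. a hold/restart DAG or runmon) would treat as a retryable CUDA failure.
CUDA_FAILURE_EXIT_CODE = 62


def run_with_cuda_exit(func, *args, **kwargs):
    """Call func; exit with code 62 if the raised error message mentions CUDA.

    Hypothetical helper for illustration only.
    """
    try:
        return func(*args, **kwargs)
    except Exception as exc:
        if "CUDA" in str(exc):
            # e.g. "CUDA error: out of memory" -> signal a retryable failure
            print(f"CUDA failure detected: {exc}", file=sys.stderr)
            sys.exit(CUDA_FAILURE_EXIT_CODE)
        # Any non-CUDA failure propagates unchanged.
        raise
```

Matching on the message text is deliberately loose, because CUDA errors can surface through several exception types depending on the library. The trade-off is that any non-CUDA error whose message happens to contain "CUDA" would also trigger exit code 62.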