oshaughn / research-projects-RIT

Clean version of research-projects, just with ILE
MIT License

added hard fail on CUDA errors with catchable code #52

Closed: Richard14916 closed this 3 years ago

Richard14916 commented 3 years ago

This changes the try/except so that if the failure message contains "CUDA" (indicating, e.g., an out-of-resources error), the process exits with sys.exit(62). That exit code could then be caught by a hold/restart DAG setup or by runmon restart monitoring.
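
As a rough illustration of the pattern described above (not the actual diff from this PR), a minimal sketch might look like the following. The wrapper name `run_with_cuda_hard_fail`, the constant `CUDA_EXIT_CODE`, and the choice to re-raise non-CUDA errors unchanged are assumptions made for the example. Only the "CUDA" substring check and the exit code 62 come from the description.

```python
import sys

# Exit code named in the PR description; the constant name is hypothetical.
CUDA_EXIT_CODE = 62


def run_with_cuda_hard_fail(work):
    """Run work() and exit with CUDA_EXIT_CODE if the failure message mentions CUDA.

    `work` is any zero-argument callable (a stand-in for the real computation).
    """
    try:
        return work()
    except Exception as exc:
        if "CUDA" in str(exc):
            # Hard fail with a distinctive code that an outer monitor can test for.
            print("CUDA failure, exiting with code %d: %s" % (CUDA_EXIT_CODE, exc),
                  file=sys.stderr)
            sys.exit(CUDA_EXIT_CODE)
        # Assumption: anything that is not a CUDA failure propagates as before.
        raise
```

Because `sys.exit` raises `SystemExit`, which is not a subclass of `Exception`, the exit is not swallowed by the same handler, and the process terminates with status 62 unless something further up deliberately catches it.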