I am trying to use the proposed solution in my project with a custom training script. I have these two problems:
child runs fail with UserError (my fault, of course), but I cannot find the related error message in the logs
when all child runs fail, the status of the parent run is sometimes FAIL, sometimes COMPLETED, and sometimes RUNNING
I acknowledge that these do not seem to be issues with the proposed solution itself, but I think they may hamper its robustness.
Please let me know if you can help me.
Thanks a lot!
UPDATE
Modifying the except part of the custom train script as
was a good first idea!
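For anyone hitting the same missing-error-message problem, here is a minimal sketch of an except block that makes the failure visible. It is not the exact change referred to above (that snippet is not shown here). The names `run_training` and `train_fn` are hypothetical, and whether the hosting service marks the run as failed after the re-raise is an assumption.

```python
import logging
import sys

# Send log records to stderr so they land in the captured run logs.
logging.basicConfig(stream=sys.stderr, level=logging.INFO)
logger = logging.getLogger("train")


def run_training(train_fn):
    """Call train_fn (hypothetical stand-in for the real training entry point).

    On failure, log the full traceback and re-raise the original exception.
    """
    try:
        return train_fn()
    except Exception:
        # logger.exception logs at ERROR level and appends the traceback,
        # so the root cause is written out instead of being swallowed.
        logger.exception("Training failed")
        # Re-raise so the script still ends with an unhandled exception
        # rather than silently appearing to succeed.
        raise
```

Wrapping the script's entry point as `run_training(main)` should write the traceback to stderr before the process exits, which at least makes the root cause searchable in the child run's logs.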