Closed cauldnz closed 6 years ago
Hello Chris, "Unexpected end of ..." is a very confusing but harmless warning and can be safely ignored.
The problem in your case is "'/keras-resnet-horovod.py': [Errno 2] No such file or directory". That means $AZ_BATCHAI_INPUT_SCRIPTS/keras-resnet-horovod.py expanded to /keras-resnet-horovod.py because $AZ_BATCHAI_INPUT_SCRIPTS is not defined. I suspect you have not specified an input directory with id "SCRIPTS" in your job definition, and that's why BatchAI has not set up the AZ_BATCHAI_INPUT_SCRIPTS environment variable for your job.
Can you please provide the code you used to populate the input_directories variable?
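To illustrate the expansion described above, here is a minimal Python sketch. It only mimics how a shell substitutes an unset variable with an empty string; it is not Batch AI code. The helper names (expand_like_shell, require_input_dir) and the /mnt/scripts path are made up for illustration, and only the AZ_BATCHAI_INPUT_SCRIPTS name and the script file name come from this thread.

```python
import re


def expand_like_shell(template, env):
    """Mimic shell expansion: an unset $VAR becomes an empty string."""
    return re.sub(r"\$(\w+)", lambda m: env.get(m.group(1), ""), template)


def require_input_dir(env, directory_id):
    """Hypothetical guard: fail early if the input directory variable is missing."""
    name = "AZ_BATCHAI_INPUT_" + directory_id
    if not env.get(name):
        raise RuntimeError(name + " is not set; is an input directory with id '"
                           + directory_id + "' defined for the job?")
    return env[name]


template = "$AZ_BATCHAI_INPUT_SCRIPTS/keras-resnet-horovod.py"

# Variable not defined: the path collapses to the filesystem root.
missing = expand_like_shell(template, {})

# Variable defined (the value here is an assumed example path).
present = expand_like_shell(template, {"AZ_BATCHAI_INPUT_SCRIPTS": "/mnt/scripts"})
```

Calling something like require_input_dir(os.environ, "SCRIPTS") at the top of the job script would surface the missing variable directly, instead of a "No such file or directory" error on a path under the root.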
Thanks, Alex
Hi Chris, did my answer help?
Hi Alexander. Yes, all sorted, thanks. I did still have some issues with path lengths when I used a longer Job name. I will try to repro, but I am closing this issue. Thanks a lot for your help.
This manifests in various ways, but the most obvious is an issue when using OpenMPI with a multi-layered Docker container. My job definition (Python) looks like this, running DSVM as the base image:
During execution I get something like this, but I have had other situations (memory escapes me) where I have had to shorten things like the Job Name to keep the path lengths down.
Issue is documented here for OpenMPI.
In terms of a suggested fix, I think the goal should be to minimize path lengths as much as possible.
Possible approaches... please add more thoughts:
Provide a flatten_image option on ContainerSettings
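As a rough way to reason about the path-length goal above, a job name could be checked against a length budget before submission. This is only a sketch: the 100-character limit, the /mnt/batch base directory and the path_budget helper are all assumptions for illustration, not values taken from Batch AI or OpenMPI. The real limit depends on whichever component rejects the path.

```python
import posixpath

# Assumed budget for illustration only; the actual limit depends on the failing component.
ASSUMED_MAX_PATH = 100


def path_budget(base_dir, job_name, limit=ASSUMED_MAX_PATH):
    """Return how many characters remain once the job name is appended to base_dir."""
    full_path = posixpath.join(base_dir, job_name)
    return limit - len(full_path)


# Hypothetical base directory; a short job name leaves plenty of room.
short_remaining = path_budget("/mnt/batch", "job")

# A long job name exhausts the assumed budget (negative means over the limit).
long_remaining = path_budget("/mnt/batch", "x" * 120)
```

A negative result means the job name alone already pushes the path past the assumed limit, which matches the experience of having to shorten the Job Name.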