Thanks for giving Nexus a try!
From the screenshot, I found a few possible reasons why this happens:
- Did you start the frontend before the backends? Frontends need to be started after backends are ready, otherwise it'll complain about not enough backends.
- The scheduler log shows that it can't find the profile. Have you profiled the model on the GPU? If you did profile, try
ls $MODEL_DIR/profiles/
and see if the profile is actually inside the directory or its subdirectories. Also a reminder that the profile is per GPU card, not per GPU kind. I see you have two GPUs; perhaps you profiled only one?
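To make the profile check above a bit more mechanical, here is a minimal sketch that counts the files under the profiles directory and compares the count with what you expect. The function names `count_profile_files` and `check_profiles` are made up for illustration, and the idea that each profiled model on each GPU card leaves at least one file behind is an assumption, not something confirmed by the Nexus docs.

```shell
# Hypothetical helpers, not part of Nexus.

# Print the number of regular files under a directory, including
# subdirectories. Prints 0 if the directory does not exist.
count_profile_files() {
  cpf_dir="$1"
  if [ ! -d "$cpf_dir" ]; then
    echo 0
    return 0
  fi
  find "$cpf_dir" -type f | wc -l | tr -d ' '
}

# Succeed only if at least <expected> files exist under <dir>.
# ASSUMPTION: one or more profile files per model per GPU card.
check_profiles() {
  cp_dir="$1"
  cp_expected="$2"
  cp_found="$(count_profile_files "$cp_dir")"
  if [ "$cp_found" -lt "$cp_expected" ]; then
    echo "found $cp_found profile file(s) under $cp_dir, expected at least $cp_expected" >&2
    return 1
  fi
  echo "ok: $cp_found profile file(s) under $cp_dir"
}
```

With two GPU cards and one model, something like `check_profiles "$MODEL_DIR/profiles" 2` would flag the case where only one card was profiled, provided the assumption about the file layout holds.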
Hi @abcdabcd987, as you suggested, I deployed Nexus again. First, I started two profiler processes, one on GPU 0 and one on GPU 1. The output of ls $MODEL_DIR/profiles/ looks like:
Then I deployed two backend processes on GPU 0 and GPU 1, and they registered with the scheduler:
But when I finally started the frontend process, I still got the same NOT_ENOUGH_BACKENDS error:
I just realized that the instruction here is slightly outdated. You would also need to specify the image width and height when starting the front end. Like:
docker run -it --rm --gpus all --network=nexus-net --name=nexus-simple-frontend -p=9001 -p=9002 abcdabcd987/nexus \
/nexus/build/simple -framework=tensorflow -model=resnet_0 -latency=50 -width=224 -height=224 -alsologtostderr -colorlogtostderr \
-sch_addr=nexus-scheduler:10001
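Since the frontend also has to be started after the backends are ready, one way to avoid launching it too early is a small retry helper placed in front of the command above. This is only a sketch: `wait_until` is a generic shell function written for this reply, and `backends_registered` in the usage comment is a hypothetical check you would have to supply yourself (Nexus does not ship one as far as this thread shows).

```shell
# Retry a command once per second until it succeeds, or give up after
# <timeout> seconds. Usage: wait_until <timeout> <command> [args...]
wait_until() {
  wu_timeout="$1"
  shift
  wu_elapsed=0
  until "$@"; do
    if [ "$wu_elapsed" -ge "$wu_timeout" ]; then
      echo "timed out after ${wu_timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep 1
    wu_elapsed=$((wu_elapsed + 1))
  done
}

# Example (backends_registered is a placeholder for your own check):
#   wait_until 60 backends_registered && docker run ... /nexus/build/simple ...
```

The helper keeps the ordering rule in the launch script itself instead of relying on starting the containers by hand in the right order.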
IT WORKS! Thanks. Please update the example README.
Hi abcdabcd987! I followed the steps from https://github.com/uwsampl/nexus/blob/master/examples/README.md and got errors from both the frontend and the scheduler.
frontend error log: Load model error: NOT_ENOUGH_BACKENDS
scheduler error log: failed to connect to all addresses
The backend runs normally.
What did I do wrong?