Open easyfan327 opened 4 years ago
Worker logs for reference:
Traceback (most recent call last):
  File "/root/rafiki/utils/service.py", line 50, in run_worker
    start_worker(service_id, service_type, container_id)
  File "scripts/start_worker.py", line 40, in start_worker
    worker.start()
  File "/root/rafiki/worker/train.py", line 56, in start
    self._monitor.pull_job_info()
  File "/root/rafiki/worker/train.py", line 257, in pull_job_info
    self.model_class = load_model_class(model.model_file_bytes, model.model_class)
  File "/root/rafiki/model/utils.py", line 51, in load_model_class
    raise InvalidModelClassError(e)
rafiki.model.utils.InvalidModelClassError: Traceback (most recent call last):
  File "/usr/local/envs/rafiki/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in
Hi @easyfan327, since rafiki has been upgraded to version 0.2.0, it is recommended that you install the latest version of rafiki from nginyc/rafiki/master. Please remember to delete any old rafiki instances (including docker images and containers) remaining on your machine before installing the new version.

When scaling rafiki on GPU, also remember to add 'GPU_COUNT': 1 to the budget when you create a train job (refer to the latest doc: https://nginyc.github.io/rafiki/docs/0.2.0/src/python/rafiki.client.html#rafiki.client.Client.create_train_job). For example:

    client.create_train_job(
        app='fashion_mnist_app',
        task='IMAGE_CLASSIFICATION',
        train_dataset_id='70efcbf6-b576-44d0-83b7-fd93e8ee03d3',
        val_dataset_id='9c28d97a-3d08-4903-b217-1169a13e5d6a',
        budget={'MODEL_TRIAL_COUNT': 5, 'GPU_COUNT': 1},
        models=['b67f3017-8f37-45cc-a7c5-a3f8912ac72e']
    )

I had no problem running this model. For your reference, the code is attached. Please try again and let me know if there are any issues.
P.S. Executed bash scripts/setup_node.sh to enable GPU support.
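
In case it helps, here is a minimal sketch of how the create_train_job arguments from the reply above could be assembled so that 'GPU_COUNT' is not forgotten in the budget. The helper build_train_job_args is hypothetical and not part of rafiki; only the keyword names (app, task, train_dataset_id, val_dataset_id, budget, models) and the budget keys are taken from the example in this thread. It does not import rafiki or contact a server.

```python
def build_train_job_args(app, task, train_dataset_id, val_dataset_id,
                         model_ids, trial_count=5, gpu_count=1):
    """Assemble keyword arguments for a create_train_job call.

    Hypothetical helper for illustration; not part of the rafiki API.
    """
    if trial_count < 1:
        raise ValueError("trial_count must be at least 1")
    if gpu_count < 0:
        raise ValueError("gpu_count must not be negative")

    budget = {'MODEL_TRIAL_COUNT': trial_count}
    # Only request GPUs when at least one is wanted.
    if gpu_count > 0:
        budget['GPU_COUNT'] = gpu_count

    return {
        'app': app,
        'task': task,
        'train_dataset_id': train_dataset_id,
        'val_dataset_id': val_dataset_id,
        'budget': budget,
        'models': list(model_ids),
    }


# IDs copied from the example in this thread.
args = build_train_job_args(
    app='fashion_mnist_app',
    task='IMAGE_CLASSIFICATION',
    train_dataset_id='70efcbf6-b576-44d0-83b7-fd93e8ee03d3',
    val_dataset_id='9c28d97a-3d08-4903-b217-1169a13e5d6a',
    model_ids=['b67f3017-8f37-45cc-a7c5-a3f8912ac72e'],
)
print(args['budget'])

# With a logged-in client (assumed, not created here):
# client.create_train_job(**args)
```

Passing gpu_count=0 leaves 'GPU_COUNT' out of the budget entirely, which is an assumption about how a CPU-only job would be requested; check the 0.2.0 docs linked above before relying on it.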