Open chandan-labelfuse opened 1 year ago
@chandan-labelfuse This seems like a mismatch between your model and weights. Did you try loading the weights into your model with standalone PyTorch?
@agunapal I loaded the weights using the script below, and downloaded the same weights to package with torch-model-archiver.
from torchvision.models.detection import ssdlite320_mobilenet_v3_large
from torchvision.models.detection import SSDLite320_MobileNet_V3_Large_Weights
weights = SSDLite320_MobileNet_V3_Large_Weights.DEFAULT
model = ssdlite320_mobilenet_v3_large(weights=weights)
I am skeptical about my definition of SSDLite in model.py. Could one of you verify the implementation? Compared to the Faster R-CNN docs, SSD is a bit more complicated since it requires more parameters to be passed.
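One way to run the standalone check suggested above is to load the checkpoint into the custom class with `strict=False` and inspect what does not line up. The sketch below is only illustrative: `report_mismatch` is a hypothetical helper (not part of PyTorch or TorchServe), and the two toy modules stand in for the real model.py class and the downloaded checkpoint. Note that `strict=False` still raises on tensor shape mismatches; it only tolerates differing key names.

```python
from typing import List, Tuple

from torch import nn


def report_mismatch(model: nn.Module, state_dict: dict) -> Tuple[List[str], List[str]]:
    """Load with strict=False and return (missing_keys, unexpected_keys)."""
    result = model.load_state_dict(state_dict, strict=False)
    return list(result.missing_keys), list(result.unexpected_keys)


# Toy stand-ins (hypothetical): the "model" names its submodule `body`,
# while the "checkpoint" was saved from a module that calls it `features`.
class WithBody(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Linear(2, 2)


class WithFeatures(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Linear(2, 2)


missing, unexpected = report_mismatch(WithBody(), WithFeatures().state_dict())
print("missing from checkpoint:", missing)
print("not used by model:", unexpected)
```

In practice the first argument would be an instance of the class defined in model.py and the second the result of `torch.load` on the downloaded .pth file.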
Will get back to you
🐛 Describe the bug
I am trying to serve the ssdlite320_mobilenet_v3_large model using TorchServe. I created a custom model.py file, downloaded the weights, and ran torch-model-archiver. However, the worker fails to load the model and reports a state_dict error.
Error logs
2022-12-21T16:41:30,460 [INFO ] W-9000-ssdlitemobilenet_1.0-stdout MODEL_LOG - raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
2022-12-21T16:41:30,460 [INFO ] W-9000-ssdlitemobilenet_1.0-stdout MODEL_LOG - RuntimeError: Error(s) in loading state_dict for SSDLiteObjectDetector:
2022-12-21T16:41:30,459 [INFO ] epollEventLoopGroup-5-8 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_STARTED
2022-12-21T16:41:30,461 [DEBUG] W-9000-ssdlitemobilenet_1.0 org.pytorch.serve.wlm.WorkerThread - System state is : WORKER_STARTED
2022-12-21T16:41:30,461 [DEBUG] W-9000-ssdlitemobilenet_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died.
java.lang.InterruptedException: null
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056) ~[?:?]
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2133) ~[?:?]
    at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:432) ~[?:?]
    at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:191) [model-server.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:829) [?:?]
2022-12-21T16:41:30,462 [WARN ] W-9000-ssdlitemobilenet_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: ssdlitemobilenet, error: Worker died.
2022-12-21T16:41:30,462 [DEBUG] W-9000-ssdlitemobilenet_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-ssdlitemobilenet_1.0 State change WORKER_STARTED -> WORKER_STOPPED
2022-12-21T16:41:30,462 [WARN ] W-9000-ssdlitemobilenet_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-ssdlitemobilenet_1.0-stderr
2022-12-21T16:41:30,463 [WARN ] W-9000-ssdlitemobilenet_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-ssdlitemobilenet_1.0-stdout
2022-12-21T16:41:30,463 [INFO ] W-9000-ssdlitemobilenet_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9000 in 21 seconds.
2022-12-21T16:41:30,462 [INFO ] W-9000-ssdlitemobilenet_1.0-stdout MODEL_LOG - Missing key(s) in state_dict: "backbone.body.0.0.weight", "backbone.body.0.1.weight", "backbone.body.0.1.bias", "backbone.body.0.1.running_mean", "backbone.body.0.1.running_var", "backbone.body.1.block.0.0.weight", "backbone.body.1.block.0.1.weight", "backbone.body.1.block.0.1.bias", "backbone.body.1.block.0.1.running_mean", "backbone.body.1.block.0.1.running_var", "backbone.body.1.block.1.0.weight", "backbone.body.1.block.1.1.weight", "backbone.body.1.block.1.1.bias", "backbone.body.1.block.1.1.running_mean", "backbone.body.1.block.1.1.running_var", "backbone.body.2.block.0.0.weight", "backbone.body.2.block.0.1.weight", "backbone.body.2.block.0.1.bias", "backbone.body.2.block.0.1.running_mean", "backbone.body.2.block.0.1.running_var", "backbone.body.2.block.1.0.weight", "backbone.body.2.block.1.1.weight", "backbone.body.2.block.1.1.bias", "backbone.body.2.block.1.1.running_mean", "backbone.body.2.block.1.1.running_var", "backbone.body.2.block.2.0.weight", "backbone.body.2.block.2.1.weight", "backbone.body.2.block.2.1.bias", "backbone.body.2.block.2.1.running_mean", "backbone.body.2.block.2.1.running_var", "backbone.body.3.block.0.0.weight", "backbone.body.3.block.0.1.weight", "backbone.body.3.block.0.1.bias", "backbone.body.3.block.0.1.running_mean", 
"backbone.body.3.block.0.1.running_var", "backbone.body.3.block.1.0.weight", "backbone.body.3.block.1.1.weight", "backbone.body.3.block.1.1.bias", "backbone.body.3.block.1.1.running_mean", "backbone.body.3.block.1.1.running_var", "backbone.body.3.block.2.0.weight", "backbone.body.3.block.2.1.weight", "backbone.body.3.block.2.1.bias", "backbone.body.3.block.2.1.running_mean", "backbone.body.3.block.2.1.running_var", "backbone.body.4.block.0.0.weight", "backbone.body.4.block.0.1.weight", "backbone.body.4.block.0.1.bias", "backbone.body.4.block.0.1.running_mean", "backbone.body.4.block.0.1.running_var", "backbone.body.4.block.1.0.weight", "backbone.body.4.block.1.1.weight", "backbone.body.4.block.1.1.bias", "backbone.body.4.block.1.1.running_mean", "backbone.body.4.block.1.1.running_var", "backbone.body.4.block.2.fc1.weight", "backbone.body.4.block.2.fc1.bias", "backbone.body.4.block.2.fc2.weight", "backbone.body.4.block.2.fc2.bias", "backbone.body.4.block.3.0.weight", "backbone.body.4.block.3.1.weight", "backbone.body.4.block.3.1.bias", "backbone.body.4.block.3.1.running_mean", "backbone.body.4.block.3.1.running_var", "backbone.body.5.block.0.0.weight", "backbone.body.5.block.0.1.weight", "backbone.body.5.block.0.1.bias", "backbone.body.5.block.0.1.running_mean", "backbone.body.5.block.0.1.running_var", "backbone.body.5.block.1.0.weight", "backbone.body.5.block.1.1.weight", "backbone.body.5.block.1.1.bias", "backbone.body.5.block.1.1.running_mean", "backbone.body.5.block.1.1.running_var", "backbone.body.5.block.2.fc1.weight", "backbone.body.5.block.2.fc1.bias", "backbone.body.5.block.2.fc2.weight", "backbone.body.5.block.2.fc2.bias", "backbone.body.5.block.3.0.weight", "backbone.body.5.block.3.1.weight", "backbone.body.5.block.3.1.bias", "backbone.body.5.block.3.1.running_mean", "backbone.body.5.block.3.1.running_var", "backbone.body.6.block.0.0.weight", "backbone.body.6.block.0.1.weight", "backbone.body.6.block.0.1.bias", "backbone.body.6.block.0.1.running_mean", 
"backbone.body.6.block.0.1.running_var", "backbone.body.6.block.1.0.weight", "backbone.body.6.block.1.1.weight", "backbone.body.6.block.1.1.bias", "backbone.body.6.block.1.1.running_mean", "backbone.body.6.block.1.1.running_var", "backbone.body.6.block.2.fc1.weight", "backbone.body.6.block.2.fc1.bias", "backbone.body.6.block.2.fc2.weight", "backbone.body.6.block.2.fc2.bias", "backbone.body.6.block.3.0.weight", "backbone.body.6.block.3.1.weight", "backbone.body.6.block.3.1.bias", "backbone.body.6.block.3.1.running_mean", "backbone.body.6.block.3.1.running_var", "backbone.body.7.block.0.0.weight", "backbone.body.7.block.0.1.weight", "backbone.body.7.block.0.1.bias", "backbone.body.7.block.0.1.running_mean", "backbone.body.7.block.0.1.running_var", "backbone.body.7.block.1.0.weight", "backbone.body.7.block.1.1.weight", "backbone.body.7.block.1.1.bias", "backbone.body.7.block.1.1.running_mean", "backbone.body.7.block.1.1.running_var", "backbone.body.7.block.2.0.weight", "backbone.body.7.block.2.1.weight", "backbone.body.7.block.2.1.bias", "backbone.body.7.block.2.1.running_mean", "backbone.body.7.block.2.1.running_var", "backbone.body.8.block.0.0.weight", "backbone.body.8.block.0.1.weight", "backbone.body.8.block.0.1.bias", "backbone.body.8.block.0.1.running_mean", "backbone.body.8.block.0.1.running_var", "backbone.body.8.block.1.0.weight", "backbone.body.8.block.1.1.weight", "backbone.body.8.block.1.1.bias", "backbone.body.8.block.1.1.running_mean", "backbone.body.8.block.1.1.running_var", "backbone.body.8.block.2.0.weight", "backbone.body.8.block.2.1.weight", "backbone.body.8.block.2.1.bias", "backbone.body.8.block.2.1.running_mean", "backbone.body.8.block.2.1.running_var", "backbone.body.9.block.0.0.weight", "backbone.body.9.block.0.1.weight", "backbone.body.9.block.0.1.bias", "backbone.body.9.block.0.1.running_mean", "backbone.body.9.block.0.1.running_var", "backbone.body.9.block.1.0.weight", "backbone.body.9.block.1.1.weight", "backbone.body.9.block.1.1.bias", 
"backbone.body.9.block.1.1.running_mean", "backbone.body.9.block.1.1.running_var", "backbone.body.9.block.2.0.weight", "backbone.body.9.block.2.1.weight", "backbone.body.9.block.2.1.bias", "backbone.body.9.block.2.1.running_mean", "backbone.body.9.block.2.1.running_var", "backbone.body.10.block.0.0.weight", "backbone.body.10.block.0.1.weight", "backbone.body.10.block.0.1.bias", "backbone.body.10.block.0.1.running_mean", "backbone.body.10.block.0.1.running_var", "backbone.body.10.block.1.0.weight", "backbone.body.10.block.1.1.weight", "backbone.body.10.block.1.1.bias", "backbone.body.10.block.1.1.running_mean", "backbone.body.10.block.1.1.running_var", "backbone.body.10.block.2.0.weight", "backbone.body.10.block.2.1.weight", "backbone.body.10.block.2.1.bias", "backbone.body.10.block.2.1.running_mean", "backbone.body.10.block.2.1.running_var", "backbone.body.11.block.0.0.weight", "backbone.body.11.block.0.1.weight", "backbone.body.11.block.0.1.bias", "backbone.body.11.block.0.1.running_mean", "backbone.body.11.block.0.1.running_var", "backbone.body.11.block.1.0.weight", "backbone.body.11.block.1.1.weight", "backbone.body.11.block.1.1.bias", "backbone.body.11.block.1.1.running_mean", "backbone.body.11.block.1.1.running_var", "backbone.body.11.block.2.fc1.weight", "backbone.body.11.block.2.fc1.bias", "backbone.body.11.block.2.fc2.weight", "backbone.body.11.block.2.fc2.bias", "backbone.body.11.block.3.0.weight", "backbone.body.11.block.3.1.weight", "backbone.body.11.block.3.1.bias", "backbone.body.11.block.3.1.running_mean", "backbone.body.11.block.3.1.running_var", "backbone.body.12.block.0.0.weight", "backbone.body.12.block.0.1.weight", "backbone.body.12.block.0.1.bias", "backbone.body.12.block.0.1.running_mean", "backbone.body.12.block.0.1.running_var", "backbone.body.12.block.1.0.weight", "backbone.body.12.block.1.1.weight", "backbone.body.12.block.1.1.bias", "backbone.body.12.block.1.1.running_mean", "backbone.body.12.block.1.1.running_var", 
"backbone.body.12.block.2.fc1.weight", "backbone.body.12.block.2.fc1.bias", "backbone.body.12.block.2.fc2.weight", "backbone.body.12.block.2.fc2.bias", "backbone.body.12.block.3.0.weight", "backbone.body.12.block.3.1.weight", "backbone.body.12.block.3.1.bias", "backbone.body.12.block.3.1.running_mean", "backbone.body.12.block.3.1.running_var", "backbone.body.13.block.0.0.weight", "backbone.body.13.block.0.1.weight", "backbone.body.13.block.0.1.bias", "backbone.body.13.block.0.1.running_mean", "backbone.body.13.block.0.1.running_var", "backbone.body.13.block.1.0.weight", "backbone.body.13.block.1.1.weight", "backbone.body.13.block.1.1.bias", "backbone.body.13.block.1.1.running_mean", "backbone.body.13.block.1.1.running_var", "backbone.body.13.block.2.fc1.weight", "backbone.body.13.block.2.fc1.bias", "backbone.body.13.block.2.fc2.weight", "backbone.body.13.block.2.fc2.bias", "backbone.body.13.block.3.0.weight", "backbone.body.13.block.3.1.weight", "backbone.body.13.block.3.1.bias", "backbone.body.13.block.3.1.running_mean", "backbone.body.13.block.3.1.running_var", "backbone.body.14.block.0.0.weight", "backbone.body.14.block.0.1.weight", "backbone.body.14.block.0.1.bias", "backbone.body.14.block.0.1.running_mean", "backbone.body.14.block.0.1.running_var", "backbone.body.14.block.1.0.weight", "backbone.body.14.block.1.1.weight", "backbone.body.14.block.1.1.bias", "backbone.body.14.block.1.1.running_mean", "backbone.body.14.block.1.1.running_var", "backbone.body.14.block.2.fc1.weight", "backbone.body.14.block.2.fc1.bias", "backbone.body.14.block.2.fc2.weight", "backbone.body.14.block.2.fc2.bias", "backbone.body.14.block.3.0.weight", "backbone.body.14.block.3.1.weight", "backbone.body.14.block.3.1.bias", "backbone.body.14.block.3.1.running_mean", "backbone.body.14.block.3.1.running_var", "backbone.body.15.block.0.0.weight", "backbone.body.15.block.0.1.weight", "backbone.body.15.block.0.1.bias", "backbone.body.15.block.0.1.running_mean", 
"backbone.body.15.block.0.1.running_var", "backbone.body.15.block.1.0.weight", "backbone.body.15.block.1.1.weight", "backbone.body.15.block.1.1.bias", "backbone.body.15.block.1.1.running_mean", "backbone.body.15.block.1.1.running_var", "backbone.body.15.block.2.fc1.weight", "backbone.body.15.block.2.fc1.bias", "backbone.body.15.block.2.fc2.weight", "backbone.body.15.block.2.fc2.bias", "backbone.body.15.block.3.0.weight", "backbone.body.15.block.3.1.weight", "backbone.body.15.block.3.1.bias", "backbone.body.15.block.3.1.running_mean", "backbone.body.15.block.3.1.running_var", "backbone.body.16.0.weight", "backbone.body.16.1.weight", "backbone.body.16.1.bias", "backbone.body.16.1.running_mean", "backbone.body.16.1.running_var", "backbone.fpn.inner_blocks.0.0.weight", "backbone.fpn.inner_blocks.0.0.bias", "backbone.fpn.inner_blocks.1.0.weight", "backbone.fpn.inner_blocks.1.0.bias", "backbone.fpn.layer_blocks.0.0.weight", "backbone.fpn.layer_blocks.0.0.bias", "backbone.fpn.layer_blocks.1.0.weight", "backbone.fpn.layer_blocks.1.0.bias".
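Every missing key in the log above starts with `backbone.body.` or `backbone.fpn.`. Grouping keys by prefix makes that pattern easier to see than reading the raw list. The snippet below is a minimal sketch: `summarize_keys` is a hypothetical helper, and `sample` is just a handful of keys copied from the log, not the full list.

```python
from collections import Counter


def summarize_keys(keys, depth=2):
    """Count state_dict keys by their first `depth` dot-separated components."""
    return Counter(".".join(key.split(".")[:depth]) for key in keys)


# A few of the missing keys reported in the log (not the complete list).
sample = [
    "backbone.body.0.0.weight",
    "backbone.body.0.1.weight",
    "backbone.body.16.1.running_var",
    "backbone.fpn.inner_blocks.0.0.weight",
    "backbone.fpn.layer_blocks.1.0.bias",
]

summary = summarize_keys(sample)
print(dict(summary))  # {'backbone.body': 3, 'backbone.fpn': 2}
```

Running the same helper over the keys of the downloaded checkpoint would show which prefixes the weights actually use. The `body` and `fpn` names are the submodule names of torchvision's FPN-wrapped backbones, which suggests (though the log alone does not prove it) that model.py builds a Faster R-CNN style backbone rather than the SSDLite one.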
Installation instructions
Installed TorchServe using:
python ./ts_scripts/install_dependencies.py --cuda=cu114
pip install torchserve torch-model-archiver torch-workflow-archiver
Model Packaging
Built a model.py for SSDLite using https://github.com/pytorch/serve/blob/master/examples/object_detector/fast-rcnn/model.py as reference
config.properties
None
Versions
Environment headers
Torchserve branch:
torchserve==0.7.0
torch-model-archiver==0.7.0
Python version: 3.9 (64-bit runtime)
Python executable: /home/chandan/anaconda3/envs/torch-stream/bin/python
Versions of relevant python libraries:
captum==0.5.0
future==0.18.2
numpy==1.24.0
nvgpu==0.9.0
psutil==5.9.4
requests==2.28.1
torch==1.13.1
torch-model-archiver==0.7.0
torch-workflow-archiver==0.2.6
torchaudio==0.13.1
torchserve==0.7.0
torchvision==0.14.1
wheel==0.38.4
torch==1.13.1
**Warning: torchtext not present ..
torchvision==0.14.1
torchaudio==0.13.1
Java Version:
OS: Ubuntu 18.04.6 LTS
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: N/A
CMake version: version 3.10.2
Is CUDA available: Yes
CUDA runtime version: N/A
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1050
Nvidia driver version: 470.161.03
cuDNN version: None
Repro instructions
Downloaded the weights of the model from https://download.pytorch.org/models/ssdlite320_mobilenet_v3_large_coco-a79551df.pth.
Created the .mar file using the model.py file given above, copied the created .mar file to model_store
Ran the server with
Possible Solution
I am not sure why the weights aren't compatible with the SSDLite architecture. My guess is that I am not building the model skeleton correctly in model.py. The docs don't describe how to create handlers for object detection models other than Faster R-CNN, which I used as a reference. Any help in correcting this issue would be really appreciated.