tnc-ca-geo / animl-ml

Machine Learning resources for camera trap data processing

Megadetector TfServing Model shows node error and device error #70

Closed rbavery closed 2 years ago

rbavery commented 2 years ago

I'm getting two cryptic errors when trying to serve a TFServing model for MegaDetector.

I went ahead and copied the model checkpoint and the TFServing `.pb` model file into the TFServing directory layout we discussed:

```
export-tfserve
└── Servo
    └── 1
        ├── saved_model.pb
        └── variables
            ├── checkpoint
            ├── variables.data-00000-of-00001
            ├── variables.index
            └── variables.meta
```
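A quick sanity check of that layout can be scripted. This is just a hypothetical helper for illustration (the directory and file names mirror the tree above; nothing here comes from TFServing itself):

```python
from pathlib import Path

def check_tfserving_layout(base: str) -> list:
    """Return a list of problems for a TFServing model directory
    shaped like Servo/<version>/saved_model.pb + variables/."""
    problems = []
    version_dir = Path(base) / "Servo" / "1"
    if not (version_dir / "saved_model.pb").is_file():
        problems.append("missing saved_model.pb")
    variables = version_dir / "variables"
    # TFServing expects the variables.* names, not the model.ckpt.* names
    for name in ("variables.index", "variables.data-00000-of-00001"):
        if not (variables / name).is_file():
            problems.append("missing variables/" + name)
    return problems
```

An empty return value means the expected files are in place.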

Files are hosted here: https://github.com/microsoft/CameraTraps/blob/main/megadetector.md#download-links

I'm testing with the TFServing container built by SageMaker, which I pulled from ECR (found via the deploy-megadetector notebook hosted on the MegaDetector SageMaker Notebook Instance):

```
docker run -t --rm -p 8501:8501 -v "/Users/rave/animl/animl-ml/models/export-tfserve/Servo/:/models/model" 763104351884.dkr.ecr.us-west-2.amazonaws.com/tensorflow-inference:1.15-cpu &
```
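Once the container is up, TFServing's REST API exposes a model-status endpoint alongside predict, which is a quicker way to see whether the servable loaded than scanning the logs. A minimal sketch (host, port, and model name match the docker run above; the response shape follows TFServing's GetModelStatus REST response):

```python
def status_url(host: str, port: int, model: str) -> str:
    # GET this URL to see the servable's state (e.g. AVAILABLE vs. a load error)
    return "http://{}:{}/v1/models/{}".format(host, port, model)

def is_available(status_json: dict) -> bool:
    """True if any version of the model reports state AVAILABLE."""
    versions = status_json.get("model_version_status", [])
    return any(v.get("state") == "AVAILABLE" for v in versions)
```

With the container above, `status_url("localhost", 8501, "model")` is the URL to GET, and `is_available` can be run on the decoded JSON response.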

Finally, I renamed the files in the downloaded model checkpoint directory from the `model.ckpt` prefix to `variables`:

```
# rave at Ryans-MacBook-Pro.local in ~/animl/animl-ml/models on git:sagemaker-inf-recommender ✖︎ [11:14:12]
→ ls export-tfserve/Servo/1/variables/
checkpoint                    variables.index
variables.data-00000-of-00001 variables.meta
```
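That rename can be sketched in a few lines. This is a hypothetical helper for illustration, not the command actually run:

```python
from pathlib import Path

def rename_checkpoint(variables_dir: str, old_prefix: str = "model.ckpt",
                      new_prefix: str = "variables") -> list:
    """Rename model.ckpt.* shards/index/meta files to the variables.*
    names TFServing expects inside <version>/variables/."""
    renamed = []
    for p in Path(variables_dir).glob(old_prefix + ".*"):
        # e.g. model.ckpt.index -> variables.index
        target = p.with_name(new_prefix + p.name[len(old_prefix):])
        p.rename(target)
        renamed.append(target.name)
    return sorted(renamed)
```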

After starting the service with all of the above prep, I'm getting a new error:

```
# rave at Ryans-MacBook-Pro.local in ~/animl/animl-ml/models on git:sagemaker-inf-recommender ✖︎ [11:08:40]
→ 2022-06-06 18:08:40.860258: I tensorflow_serving/model_servers/server.cc:85] Building single TensorFlow model file config:  model_name: model model_base_path: /models/model
2022-06-06 18:08:40.861217: I tensorflow_serving/model_servers/server_core.cc:462] Adding/updating models.
2022-06-06 18:08:40.861297: I tensorflow_serving/model_servers/server_core.cc:573]  (Re-)adding model: model
2022-06-06 18:08:40.974888: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: model version: 1}
2022-06-06 18:08:40.975050: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: model version: 1}
2022-06-06 18:08:40.975084: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: model version: 1}
2022-06-06 18:08:40.975228: I external/org_tensorflow/tensorflow/contrib/session_bundle/bundle_shim.cc:363] Attempting to load native SavedModelBundle in bundle-shim from: /models/model/1
2022-06-06 18:08:40.975537: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /models/model/1
2022-06-06 18:08:42.633135: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2022-06-06 18:08:42.706175: I external/org_tensorflow/tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2022-06-06 18:08:42.920667: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:202] Restoring SavedModel bundle.
```

#### new error

```
2022-06-06 18:08:43.235152: E external/org_tensorflow/tensorflow/core/grappler/optimizers/meta_optimizer.cc:533] model_pruner failed: Internal: Could not find node with name ''
2022-06-06 18:08:51.517502: E external/org_tensorflow/tensorflow/core/grappler/optimizers/meta_optimizer.cc:533] model_pruner failed: Internal: Could not find node with name ''
2022-06-06 18:08:54.103147: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:311] SavedModel load for tags { serve }; Status: fail. Took 13120646 microseconds.
2022-06-06 18:08:54.105839: E tensorflow_serving/util/retrier.cc:37] Loading servable: {name: model version: 1} failed: Invalid argument: Tensor :0, specified in either feed_devices or fetch_devices was not found in the Graph
2022-06-06 18:09:54.017491: I tensorflow_serving/util/retrier.cc:33] Retrying of Loading servable: {name: model version: 1} retry: 1
```
rbavery commented 2 years ago

@ingalls pointed out that the metadata endpoint describes what the model expects, and I think the issue is that I was supplying a base64-encoded image, but the model doesn't expect that:

```
# rave at rave-desktop in ~/animl/animl-ml on git:fix-requirement-mac ✖︎ [17:26:08]
→ GET http://localhost:8501/v1/models/model/metadata
{
  "model_spec": { "name": "model", "signature_name": "", "version": "1" },
  "metadata": {
    "signature_def": {
      "signature_def": {
        "serving_default": {
          "inputs": {
            "inputs": {
              "dtype": "DT_UINT8",
              "tensor_shape": {
                "dim": [
                  { "size": "-1", "name": "" },
                  { "size": "-1", "name": "" },
                  { "size": "-1", "name": "" },
                  { "size": "3", "name": "" }
                ],
                "unknown_rank": false
              },
              "name": "image_tensor:0"
            }
          },
          "outputs": {
            "detection_classes": {
              "dtype": "DT_FLOAT",
              "tensor_shape": {
                "dim": [
                  { "size": "-1", "name": "" },
                  { "size": "100", "name": "" }
                ],
                "unknown_rank": false
              },
              "name": "detection_classes:0"
            },
            "num_detections": {
              "dtype": "DT_FLOAT",
              "tensor_shape": {
                "dim": [ { "size": "-1", "name": "" } ],
                "unknown_rank": false
              },
              "name": "num_detections:0"
            },
            "detection_boxes": {
              "dtype": "DT_FLOAT",
              "tensor_shape": {
                "dim": [
                  { "size": "-1", "name": "" },
                  { "size": "100", "name": "" },
                  { "size": "4", "name": "" }
                ],
                "unknown_rank": false
              },
              "name": "detection_boxes:0"
            },
            "detection_scores": {
              "dtype": "DT_FLOAT",
              "tensor_shape": {
                "dim": [
                  { "size": "-1", "name": "" },
                  { "size": "100", "name": "" }
                ],
                "unknown_rank": false
              },
              "name": "detection_scores:0"
            }
          },
          "method_name": "tensorflow/serving/predict"
        }
      }
    }
  }
}
```
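Given that signature (a DT_UINT8 input of shape [-1, -1, -1, 3] named `image_tensor`), the REST predict body should carry the raw pixel values as nested integer lists rather than a base64 string. A minimal sketch of building that payload (the tiny 2x2 image here is made up for illustration):

```python
import json

def predict_payload(image_rows):
    """Wrap one H x W x 3 uint8 image (as nested lists) in the
    'instances' form TFServing's REST predict endpoint accepts."""
    return {"instances": [image_rows]}

# a made-up 2x2 RGB image, values 0-255
image = [[[0, 0, 0], [255, 0, 0]],
         [[0, 255, 0], [0, 0, 255]]]
body = json.dumps(predict_payload(image))
```

That `body` is what would be POSTed to `/v1/models/model:predict`; with a real image, the nested lists would come from the decoded pixel array.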

This is an example of how the image was encoded in ml-enabler; pasting for later reference: https://github.com/developmentseed/ml-enabler/blob/master/lambda/download_and_predict/tensorflow.py

Closing, since we are switching to MDv5.