neo-ai / neo-ai-dlr

Neo-AI-DLR is a common runtime for machine learning models compiled by AWS SageMaker Neo, TVM, or TreeLite.
Apache License 2.0

Error with running DLR on RPi #215

Open eddywart opened 4 years ago

eddywart commented 4 years ago

I am trying to run machine learning inference at the edge using a SageMaker Neo model as an AWS Greengrass deployment package, following the tutorial here: https://docs.aws.amazon.com/greengrass/latest/developerguide/ml-dlc-console.html

I installed the DLR package for the Raspberry Pi 3 Model B+ using the pre-built wheel from here: https://neo-ai-dlr.readthedocs.io/en/latest/install.html

When I run the following code, inference appears to succeed (see test-dlr.log), but this error is printed: /home/pi/neo-ai-dlr/src/dlr_tvm.cc:71: No metadata found

#!/usr/bin/env python
import os
from dlr import DLRModel
import numpy as np
import time
import logging

logging.basicConfig(filename='test-dlr.log', level=logging.DEBUG)

current_milli_time = lambda: int(round(time.time() * 1000))

def run_inference():
    model_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'models/resnet50')
    device = 'cpu'
    model = DLRModel(model_path, device)

    synset_path = os.path.join(model_path, 'synset.txt')
    with open(synset_path, 'r') as f:
        synset = eval(f.read())

    image = np.load(os.path.join(os.path.dirname(os.path.abspath(__file__)), 'dog.npy')).astype(np.float32)
    input_data = {'data': image}

    for rep in range(4):
        t1 = current_milli_time()
        out = model.run(input_data)
        t2 = current_milli_time()

        logging.debug('done m.run(), time (ms): {}'.format(t2 - t1))

        top1 = np.argmax(out[0])
        logging.debug('Inference result: {}, {}'.format(top1, synset[top1]))

    import resource
    logging.debug("peak memory usage (bytes on OS X, kilobytes on Linux) {}".format(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss))

    return {
        'synset_id': top1,
        'prediction': synset[top1],
        'time': t2 - t1
    }

if __name__ == '__main__':
    res = run_inference()
    cls_id = res['synset_id']
    exp_cls_id = 151
    assert cls_id == exp_cls_id, "Inference result class id {} is incorrect, expected class id is {}".format(cls_id, exp_cls_id)
    print("All tests PASSED!")

test-dlr.log

After deploying the code as a Lambda function through AWS Greengrass, the same error appears in the log file, but this time inference does not run successfully (see optimizedImageClassification.log).

optimizedImageClassification.log

What can I do to resolve this error?

trevor-m commented 4 years ago

Hi @eddywart

The first message you saw (/home/pi/neo-ai-dlr/src/dlr_tvm.cc:71: No metadata found) is just a warning and can be safely ignored.

Based on the errors in your second log file, you are passing a directory that doesn't exist to DLRModel(): the /mlmodel directory appears to be missing. Is this something that Greengrass is supposed to set up? If so, this may be a bug in Greengrass.

[2020-07-07T11:48:03.357+08:00][FATAL]-lambda_runtime.py:140,Failed to import handler function "inference.handler" due to exception: model_path /mlmodel doesn't exist
[2020-07-07T11:48:03.357+08:00][FATAL]-lambda_runtime.py:380,Failed to initialize Lambda runtime due to exception: model_path /mlmodel doesn't exist
[2020-07-07T11:48:04.56+08:00][ERROR]-__init__.py:1037,2020-07-07 11:48:04,437 ERROR error in DLRModel instantiation model_path /mlmodel doesn't exist
[2020-07-07T11:48:04.56+08:00][ERROR]-Traceback (most recent call last):
[2020-07-07T11:48:04.56+08:00][ERROR]-  File "/usr/lib/python3/dist-packages/dlr/api.py", line 82, in __init__
[2020-07-07T11:48:04.56+08:00][ERROR]-    self._impl = DLRModelImpl(model_path, dev_type, dev_id)
[2020-07-07T11:48:04.56+08:00][ERROR]-  File "/usr/lib/python3/dist-packages/dlr/dlr_model.py", line 101, in __init__
[2020-07-07T11:48:04.56+08:00][ERROR]-    raise ValueError("model_path %s doesn't exist" % model_path)
[2020-07-07T11:48:04.56+08:00][ERROR]-ValueError: model_path /mlmodel doesn't exist
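
To narrow this down, it may help to inspect the model path from inside the Lambda before DLRModel() is constructed. The following is a minimal sketch, assuming the model resource is expected at /mlmodel as in the log above. The helper name describe_model_path is hypothetical and not part of DLR; it uses only the Python standard library and does not check what files a compiled model should contain.

```python
import os


def describe_model_path(model_path):
    """Return a list of problems found with model_path (empty if none).

    Hypothetical helper for debugging; it only checks that the path
    exists, is a readable directory, and is not empty.
    """
    problems = []
    if not os.path.exists(model_path):
        problems.append("model_path {} doesn't exist".format(model_path))
    elif not os.path.isdir(model_path):
        problems.append("model_path {} is not a directory".format(model_path))
    elif not os.access(model_path, os.R_OK | os.X_OK):
        problems.append("model_path {} is not readable by this process".format(model_path))
    elif not os.listdir(model_path):
        problems.append("model_path {} is empty".format(model_path))
    return problems


if __name__ == '__main__':
    # '/mlmodel' is the path reported in the log; adjust it to match your setup.
    for problem in describe_model_path('/mlmodel'):
        print(problem)
```

If this reports that the path doesn't exist inside the Lambda, the problem is most likely in how the model resource is attached to the function rather than in DLR itself.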