kubeedge / sedna

AI toolkit over KubeEdge
https://sedna.readthedocs.io
Apache License 2.0

Add the multi-task joint inference approach for Sedna Lifelong Learning to support the perception of mobile agents #303

Open yunzhe99 opened 2 years ago

yunzhe99 commented 2 years ago

Why is this needed:

Lifelong Learning is an important feature of Sedna, which aims to make inference robust across various scenarios, e.g., thermal comfort prediction in different cities, as shown in the published example: Using Lifelong Learning Job in Thermal Comfort Prediction Scenario. In this example, KubeEdge-Sedna's open-source edge-cloud collaborative lifelong learning paradigm can process inference tasks arriving at the edge based on multiple historical tasks stored in the cloud-side knowledge base.

However, the inference of lifelong learning is currently a single-task approach: it first votes for the task best suited to the sample, and then the model trained for that task is used for inference. This approach works well when the task model is easy to train, e.g., AdaBoost as used in the thermal comfort prediction example. However, for the perception of mobile agents, more complex models are usually required, e.g., YOLO as used in the example: Using Joint Inference Service in Helmet Detection Scenario.
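The vote-then-infer flow described above can be sketched as follows. This is an illustrative toy, not Sedna's actual API: the names `vote_for_task`, `single_task_inference`, and the attribute-matching score are assumptions made for the sketch.

```python
# Minimal sketch of the current single-task lifelong inference flow:
# 1) vote for the task whose metadata best matches the sample,
# 2) run only that task's model. All names here are illustrative.

def vote_for_task(sample_features, tasks):
    """Pick the single task whose attribute metadata best matches the sample."""
    def score(task):
        # Count how many task attributes the sample shares.
        return sum(1 for k, v in task["attributes"].items()
                   if sample_features.get(k) == v)
    return max(tasks, key=score)

def single_task_inference(sample, tasks, models):
    task = vote_for_task(sample["features"], tasks)
    model = models[task["name"]]  # only one model is ever used
    return model(sample["data"])

# Toy stand-ins for registered tasks and their trained models:
tasks = [
    {"name": "sunny-city", "attributes": {"weather": "sunny", "scene": "city"}},
    {"name": "rainy-highway", "attributes": {"weather": "rainy", "scene": "highway"}},
]
models = {
    "sunny-city": lambda x: "sunny-city-prediction",
    "rainy-highway": lambda x: "rainy-highway-prediction",
}
sample = {"features": {"weather": "rainy", "scene": "highway"}, "data": None}
result = single_task_inference(sample, tasks, models)  # "rainy-highway-prediction"
```

The limitation is visible in the last line: however the vote goes, exactly one model's output is used, even when its quality is poor for the sample at hand.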

We have noticed that autonomous driving is one of the important applications of edge AI. Recently, how to coordinate edge and cloud resources to support autonomous driving applications has become an important topic. Autonomous driving places high demands on edge AI inference performance. First, given the characteristics of vehicle movement, the scenarios an autonomous vehicle faces are complicated and the applicable tasks are unknown in advance, so the joint inference strategy needs to be updated dynamically according to the task relationships. Second, autonomous driving has strict real-time requirements, which forces a trade-off between accuracy and latency. It is therefore demanding for Sedna to support this kind of application.

In autonomous driving perception, many factors affect the performance of the model trained for a task, and for some tasks we have to fall back to sub-optimal models for inference, which greatly degrades inference performance. It is well known that joint inference can enhance perception performance, and this approach has been successfully applied in some projects, e.g., the helmet detection example.

Note that the task here refers to the tasks with different feature spaces rather than different label spaces. More details of the tasks here can be found in this paper.

What would you like to be added/modified:

I would like to add general support for multi-task joint inference to the Lifelong Learning feature. This feature will enable edge devices, represented by autonomous vehicles, to complete neural network inference locally with as high accuracy as possible while meeting real-time requirements. This project will be studied on heterogeneous multi-task autonomous driving perception datasets such as BDD100K.

We should first try to integrate this feature into Sedna and then try to optimize the joint inference strategy for both accuracy and latency.
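The accuracy/latency trade-off mentioned above could, for instance, be treated as a budgeted model-selection problem. The following is a toy sketch under assumed inputs (per-model accuracy gains and latencies are invented numbers, and the greedy policy is only one possible strategy, not a proposed Sedna design):

```python
# Toy sketch: greedily pick task models to join in inference until a
# latency budget is exhausted. Numbers and policy are illustrative only.

def select_models(candidates, latency_budget_ms):
    """candidates: list of (name, expected_accuracy_gain, latency_ms)."""
    chosen, spent = [], 0.0
    # Prefer models with the best accuracy gain per millisecond of latency.
    for name, gain, lat in sorted(candidates,
                                  key=lambda c: c[1] / c[2], reverse=True):
        if spent + lat <= latency_budget_ms:
            chosen.append(name)
            spent += lat
    return chosen, spent

candidates = [
    ("generic-yolo", 0.60, 30.0),      # broad coverage, higher cost
    ("rainy-specialist", 0.10, 15.0),  # small gain, cheap
    ("night-specialist", 0.08, 15.0),
]
chosen, latency = select_models(candidates, latency_budget_ms=50.0)
# chosen -> ["generic-yolo", "rainy-specialist"], latency -> 45.0
```

A real strategy would estimate the gains online from the task relationships in the knowledge base rather than from fixed constants.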

Other information:

Recommended skills: TensorFlow/PyTorch, Python

References:
- Sedna lifelong-learning introduction
- Sedna guide
- Sedna lifelong-learning proposal
- How to contribute to Sedna

shifan-Z commented 2 years ago

We will first use the data for multi-task training and then initialize the knowledge. Then our tasks are attributed with metadata. The attribution method could be a tree structure, like a knowledge base; the tree will be realized with a decision model because of the complexity of vision data. After building the knowledge base, we need to fuse the results of the different models to get the final result.

We will explain how to fuse these multi-task inferences with one application: object detection. A YOLO-based object detection model produces a set of region proposals, each indicating a possible object. We simply merge the region proposals from all models, obtaining more proposals than any single model provides, and then apply a post-processing method called non-maximum suppression (NMS) to select the most suitable bounding box for each object.
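The fusion step above (pool all proposals, then NMS) can be sketched in a few lines. This is a plain-Python illustration with made-up boxes, not Sedna or YOLO code; in practice one would use a library NMS such as `torchvision.ops.nms`.

```python
# Sketch of proposal fusion: concatenate region proposals from several task
# models, then keep the best boxes with greedy non-maximum suppression (NMS).
# Boxes are tuples (x1, y1, x2, y2, score); coordinates are illustrative.

def iou(a, b):
    """Intersection-over-union of two boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, iou_thresh=0.5):
    """Greedy NMS: keep the highest-score box, drop overlapping ones, repeat."""
    kept = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) < iou_thresh for k in kept):
            kept.append(box)
    return kept

# Proposals from two task-specific models for the same frame:
model_a = [(10, 10, 50, 50, 0.9), (12, 11, 49, 52, 0.6)]
model_b = [(11, 9, 51, 51, 0.8), (200, 200, 240, 240, 0.7)]

fused = nms(model_a + model_b, iou_thresh=0.5)
# One box survives per overlapping cluster: the 0.9 box and the distinct 0.7 box.
```

The key point is that fusion happens before NMS: every model contributes proposals, so an object missed by one model can still be detected via another, while NMS removes the duplicated detections of objects that several models agree on.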