20200329 Weekly Progress Update

https://drive.google.com/drive/u/1/folders/1a4bQ_FDFSwk8KzEgk8oK3eMnqfmvEycF

< Paper Structure >

1. Intro

Background

Much deep learning research concerns input data representations, and various representations have been proposed, such as 3D points (PointNet [2]), voxels (VoxelNet [3]), multi-view 2D images [5], and RangeNet++ [6]. In particular, RangeNet++ proposes a model input that uses spherical projection and k-nearest neighbor (kNN) search to raise accuracy and speed in a GPU environment. Our work will modify the RangeNet++ model so that it can be applied in an on-device environment. We will use bilinear interpolation after spherical projection and replace kNN with a simple computation for better performance on mobile devices.
Motivation: the necessity, why?
Contribution: emphasize the preprocessor

2. Main(Approach Proposal)

2.1 Overall method description

Pipeline construction using the open-source NNStreamer
2D preprocessor technique

2.1.1 NNStreamer

NNStreamer is a software system that handles neural networks as filters in stream pipelines. It can handle various kinds of sensor data, supports real-time processing, and offers fast processing, which makes it a good platform for on-device environments. Taking advantage of this platform, we can handle complex data pipelines like the one in Figure 3, which yields higher performance and a user-friendly environment. As a first experiment, we feed input data into ‘Video convert’ and then train the SqueezeNet model.
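As a rough illustration of the kind of pipeline described above, the sketch below only assembles a GStreamer pipeline description as a string. It assumes that ‘Video convert’ refers to the GStreamer `videoconvert` element and that the model is available as a TensorFlow Lite file; the `build_pipeline` helper, the `v4l2src` source, the 224 x 224 input size, and the file name `squeezenet.tflite` are hypothetical choices for illustration, not taken from our setup.

```python
def build_pipeline(model_path, width=224, height=224, source="v4l2src"):
    """Return a GStreamer pipeline description that feeds video frames
    through an NNStreamer tensor_filter.

    All default values here are illustrative assumptions.
    """
    elements = [
        source,                                   # assumed camera source
        "videoconvert",                           # colour-space conversion
        "videoscale",                             # resize to the model input
        f"video/x-raw,width={width},height={height},format=RGB",
        "tensor_converter",                       # video frames -> tensors
        f"tensor_filter framework=tensorflow-lite model={model_path}",
        "tensor_sink",                            # hand results to the app
    ]
    return " ! ".join(elements)


if __name__ == "__main__":
    # "squeezenet.tflite" is a hypothetical file name.
    print(build_pipeline("squeezenet.tflite"))
```

The resulting string could be passed to `gst-launch-1.0` or to `Gst.parse_launch` on a machine where GStreamer and NNStreamer are installed; the sketch itself does not need either.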

2.2 Preprocessor

2.2.1 2D Representation

When we project the data onto a sphere, we can reduce the size of the data and also increase the density of the sparse point cloud data. We will test these merits using the KITTI dataset. The LiDAR used to create the KITTI dataset is a 64-channel HDL-64E that senses a full 360 degrees. The size of the data frame used for the 2D projection can be expressed in H x W x C form. H is 64 (the number of channels), W is 640 (the frontal 160 degrees divided into 640 columns), and C is 1 (the number of features; we decided to use only one feature, distance, which we calculate using Eq. (1) below).
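A minimal sketch of such a projection is given here, using NumPy. It maps an (N, 3) array of x, y, z points to a 64 x 640 x 1 range image over the frontal 160 degrees. The function name `spherical_projection`, the vertical field of view of about +2 to -24.8 degrees (typical for the HDL-64E), the Euclidean distance as the feature value, and the rule that the nearest point wins when several points fall into one pixel are all assumptions for illustration, not settled design decisions.

```python
import numpy as np


def spherical_projection(points, H=64, W=640, h_fov_deg=160.0,
                         v_fov_up_deg=2.0, v_fov_down_deg=-24.8):
    """Project (N, 3) x/y/z points to an H x W x 1 range image.

    Empty pixels are left at 0. The field-of-view values are assumptions.
    """
    points = np.asarray(points, dtype=np.float64).reshape(-1, 3)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    dist = np.sqrt(x ** 2 + y ** 2 + z ** 2)      # the single feature
    keep = dist > 0
    x, y, z, dist = x[keep], y[keep], z[keep], dist[keep]

    yaw = np.arctan2(y, x)                        # 0 = straight ahead
    pitch = np.arcsin(z / dist)                   # elevation angle

    half_h = np.radians(h_fov_deg) / 2.0
    up, down = np.radians(v_fov_up_deg), np.radians(v_fov_down_deg)

    # Keep only points inside the frontal field of view.
    in_fov = (np.abs(yaw) < half_h) & (pitch <= up) & (pitch >= down)
    yaw, pitch, dist = yaw[in_fov], pitch[in_fov], dist[in_fov]

    # Angles -> pixel indices (column 0 is the left edge, row 0 the top).
    u = 0.5 * (1.0 - yaw / half_h) * W
    v = (up - pitch) / (up - down) * H
    col = np.clip(np.floor(u), 0, W - 1).astype(np.int64)
    row = np.clip(np.floor(v), 0, H - 1).astype(np.int64)

    # Nearest point wins when several points share one pixel.
    image = np.full((H, W), np.inf, dtype=np.float32)
    np.minimum.at(image, (row, col), dist.astype(np.float32))
    image[np.isinf(image)] = 0.0
    return image[..., np.newaxis]
```

For a frame with on the order of 100,000 points, the output holds 64 x 640 = 40,960 values, which is where the size reduction comes from.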

2.2.2 Interpolation

The spherical projection output does not produce frames with equal angular spacing when converting from Cartesian coordinates to angular coordinates. This causes the problems shown in Figure 2, which can be solved by transformation and bilinear interpolation: together they make the spacing of the data frame even. The empty space in Figure 2 will also be filled using the kNN or RANSAC algorithms used in RangeNet++ [6]. For the on-device environment, we will also test whether a simple computation can replace kNN or RANSAC for filling the empty space and improve the computation speed.
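One candidate for the "simple computation" mentioned above could look like the following sketch: each empty pixel (value 0) is replaced by the mean of its valid left and right neighbors in the same row. This is only an assumed stand-in for illustration; it is not the kNN or RANSAC procedure and has not been evaluated. The function name `fill_empty_pixels` is hypothetical.

```python
import numpy as np


def fill_empty_pixels(range_image):
    """Fill zero-valued pixels with the mean of their valid horizontal
    neighbours. Pixels with no valid neighbour stay at 0.

    Works on (H, W) or (H, W, C) arrays; one pass, no iteration.
    """
    img = np.asarray(range_image, dtype=np.float32).copy()

    # Shifted copies give every pixel its left and right neighbour.
    left = np.roll(img, 1, axis=1)
    right = np.roll(img, -1, axis=1)
    # No wrap-around: the frame covers only the frontal field of view.
    left[:, 0] = 0.0
    right[:, -1] = 0.0

    # Average only over neighbours that actually hold a measurement.
    count = (left > 0).astype(np.float32) + (right > 0).astype(np.float32)
    mean = np.divide(left + right, count,
                     out=np.zeros_like(img), where=count > 0)

    empty = img == 0
    img[empty] = mean[empty]
    return img
```

Because it uses only shifts and element-wise arithmetic, the cost is linear in the number of pixels, which is the property that matters on a mobile device. Gaps wider than two pixels would need repeated passes or a larger window.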

3. Experiment results

3.1 Experimental method

MobileNetV2 AP/FPS evaluation

3.2 Results and comparison
Comparison between preprocessors
Comparison when swapping the model (if time permits)

4. Conclusions

Review comments to refer to when working on the paper

It lacks some basic background information (e.g., SqueezeNet model)
With dataset reduction and using only one feature, it is also not clear how the accuracy rate can be improved.
Furthermore, this paper would be much stronger with some preliminary evaluation that answers how the proposed methodology improves the performance of the object detection algorithm.
It seems little novelty in the downsizing techniques (maybe due to the unclear explanations).
No actual evaluation (only expectation) is mentioned to show the effectiveness of the downsizing and its impact on machine learning accuracy.
It'd be more interesting to discuss what may be the challenges in downsizing the KITTI and other realistic datasets using the techniques listed in the paper and why the techniques may be the most suitable candidate techniques.

nnstreamer-preprocessor / nnstreamer