kwea123 / kitti_bev_detection

Question about how to run this model #1

Open gujiaqivadin opened 5 years ago

gujiaqivadin commented 5 years ago

Hello, kwea123! I am a student from China, and I chose this bev version of the frustum pointnet paper as my undergraduate graduation project. I found that you have implemented the whole idea, but your settings are a little different from my plans, so I want to reproduce your project first before working on mine (e.g. you use keras for training, but I need to use plain tensorflow). Therefore, I have some questions about how to run this model and what each piece of code does. Could you please give me a short introduction?

Thanks a lot!

kwea123 commented 5 years ago

As I described in the original issue, the bev detection works as follows:

  1. Do a 2D detection on the point cloud.
  2. Extract the region of the bounding boxes and use f-pointnet to regress the 3D box.

So let me describe each part with more details.

1. Do a 2D detection on the point cloud

This step consists of two sub-steps: transform the points to an image, and perform 2d detection on that image.

  1. Transform the points to an image Take a look at my script, which uses the point_cloud_2_top function to transform the points into an image. You can set different point ranges and resolutions according to your needs; using mine as a first attempt might be a good idea. Then you need to generate the ground truth bounding boxes accordingly, using the transform_to_img function. The last image shows the transformed points with bounding boxes. If you get this far successfully, you can move on to the next step.
  2. Train a 2D detector on this detection task Now that you have an image and ground truth bounding boxes, this task is no different from any other image detection task. You can simply pick any detection model from the internet. I said I use keras, but that is not restrictive at all! You can choose any model you like, for example tensorflow object detection, DIGITS, or whatever.

After training this model, given an arbitrary point cloud you should be able to locate the cars' bev locations by 1. transforming the points to an image, and 2. running the detection on that image. At this point they are still just 2d boxes; next I will describe how to get the 3d boxes.
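For illustration, here is a minimal numpy sketch of this kind of top-view projection (not the exact point_cloud_2_top from my script; the ranges, the resolution and the simple occupancy encoding are just assumed example values):

```python
import numpy as np

def point_cloud_to_bev(points, x_range=(0, 40), y_range=(-20, 20), res=0.1):
    """Project velodyne points onto a top-view occupancy image.

    points  : (N, 4) velodyne points (x, y, z, intensity)
    x_range : forward range in metres kept in the image
    y_range : lateral range in metres kept in the image
    res     : size of one pixel in metres
    """
    x, y = points[:, 0], points[:, 1]
    keep = (x >= x_range[0]) & (x < x_range[1]) & (y >= y_range[0]) & (y < y_range[1])
    x, y = x[keep], y[keep]

    # metres -> pixel indices (rows follow the forward axis, columns the lateral axis)
    rows = ((x_range[1] - x) / res).astype(np.int32)
    cols = ((y - y_range[0]) / res).astype(np.int32)

    h = int((x_range[1] - x_range[0]) / res)
    w = int((y_range[1] - y_range[0]) / res)
    bev = np.zeros((h, w), dtype=np.float32)
    bev[np.clip(rows, 0, h - 1), np.clip(cols, 0, w - 1)] = 1.0  # mark occupied cells
    return bev
```

The ground truth boxes then have to go through the same metre -> pixel mapping (that is the role of transform_to_img) so that they line up with this image.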

OK, maybe next time. Make sure you get the above things working first. If you have any questions, don't hesitate to comment below. Also, you can comment in Chinese if you feel you can describe things better that way.

Cheers,

gujiaqivadin commented 5 years ago

Hello, kwea123! Over the last 2 days I worked on steps 1 and 2. I have already finished the 1st step and obtained a set of images and labels. Because Keras is not well set up in our lab, I chose a framework I am familiar with, faster r-cnn, to realize this idea. However, the labels I get from visualization.ipynb are different from the normal kitti labels, and a lot of faster-rcnn code uses the PASCAL VOC dataset, which differs quite a bit from the data we just produced. I know a conversion script is needed and I am working on it now. Could you please tell me whether you ran into the same problem before? Also, what do you do in the 3d pointnet model? Any pointers would be a great help.

Thanks a lot!

kwea123 commented 5 years ago

What do you mean by "labels are different" and "a lot of difference"? As far as I understand, you only need the image, the bounding boxes (4 numbers indicating each border) and the classes to train the 2d detector, so what is generated in visualization.ipynb should be enough. What else are you missing?

Next, for the pointnet model, you can train a bev model by using prepare_data_bev.py to generate the training data (add this code to the original prepare_data.py in f-pointnet), or just use the original one trained on frustum data (it still performs quite well).

After you have trained both the 2d bev detector and the bev pointnet, you can do the inference by

2. Extract the region of the bounding boxes and use f-pointnet to regress the 3D box.

First you get the bounding boxes on the bev image. Suppose one of them is (x1, y1, x2, y2). You first need to extract the points in this region using simple numpy operations. Then you just follow the pipeline of f-pointnet (see the sketch after this list):

  1. transform these points into camera coordinates.
  2. centralize them by rotating by frustum_angle + np.pi/2. The frustum angle can be computed as -1 * np.arctan2((x1+x2)/2.0, -(y1+y2)/2.0).
  3. sample points, then pass them to the bev-pointnet model to regress a 3d box.
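A rough numpy sketch of these steps, just to make the idea concrete (it assumes the box corners (x1, y1, x2, y2) are already expressed in velodyne metres along the velodyne x/y axes, that `calib` is an f-pointnet style calibration object with `project_velo_to_rect`, and that the helper names are made up):

```python
import numpy as np

def rotate_pc_along_y(pc, rot_angle):
    """Rotate (N, 3) camera-coordinate points around the y (vertical) axis."""
    cosval, sinval = np.cos(rot_angle), np.sin(rot_angle)
    rotmat = np.array([[cosval, -sinval], [sinval, cosval]])
    pc[:, [0, 2]] = np.dot(pc[:, [0, 2]], rotmat.T)
    return pc

def extract_bev_box_points(pc_velo, box, calib, num_points=1024):
    """Crop the points inside one bev box and centralize them for the pointnet."""
    x1, y1, x2, y2 = box
    # extract points whose ground-plane coordinates fall inside the bev box (numpy mask)
    mask = (pc_velo[:, 0] >= min(x1, x2)) & (pc_velo[:, 0] <= max(x1, x2)) & \
           (pc_velo[:, 1] >= min(y1, y2)) & (pc_velo[:, 1] <= max(y1, y2))
    pts = pc_velo[mask]
    if len(pts) == 0:
        return None, None  # empty proposal

    # step 1: velodyne -> rectified camera coordinates
    pts_rect = calib.project_velo_to_rect(pts[:, :3])

    # step 2: centralize by rotating by frustum_angle + np.pi/2
    frustum_angle = -1 * np.arctan2((x1 + x2) / 2.0, -(y1 + y2) / 2.0)
    pts_rect = rotate_pc_along_y(pts_rect, frustum_angle + np.pi / 2)

    # step 3: sample a fixed number of points for the pointnet
    choice = np.random.choice(len(pts_rect), num_points, replace=len(pts_rect) < num_points)
    return pts_rect[choice], frustum_angle
```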

And that's everything! I hope this is clear and you are able to implement it. It is not that difficult.

gujiaqivadin commented 5 years ago

Hello, kwea123! The differences I mentioned last time are mainly because I am training a faster r-cnn network, and most of the currently available code expects datasets in Pascal VOC or COCO format, so the data needs some conversion (jpg->xml and csv->txt). Also, since our data is different (we only have the positions of the box's 4 corners), the original code needs some modifications as well. I am still looking at the adaptation code for this part. Have you used a faster R-CNN implementation that is friendly to work with? As an undergraduate doing my graduation project, this is basically my first time touching this kind of code and data, so I am still in the exploring stage, but implementing faster r-cnn and then training the pointnet network does not feel very hard; being new to it, I will inevitably run into various problems. We can discuss further questions later. Thank you very much for sharing your idea; it happens to be exactly the idea I chose for my graduation project after some analysis, so it has been a great help in getting me started. Much appreciated!

kwea123 commented 5 years ago

I haven't written one myself, I have only used implementations written by others; basically, once the dataset is ready, you can train. For example, the tensorflow object detection I mentioned above can be trained with just images (jpg or png) and the 4 corners.

gujiaqivadin commented 5 years ago

Hello, kwea123! After the New Year holiday I resumed this project. I have now trained a faster RCNN network to detect vehicles in the bird's-eye-view images, but I found that the boxes detected by the network have no orientation angle. Is that step done later in the Pointnet?

kwea123 commented 5 years ago

Yes. First train a pointnet that regresses the 3d box from the bev proposal, then apply it to the regions detected by faster rcnn. That is point 2 I described above.

gujiaqivadin commented 5 years ago

Hello, kwea123! I compared the bev map cropping size of this project with mv3d and avod. Those two papers use the full KITTI point-cloud range of -40~40, 0~70, while you set it to -20~20, 0~40 here, which amounts to a crop. Have you considered using -40~40, 0~70, or in other words, why did you choose -20~20, 0~40? This is something I have been curious about recently.

gujiaqivadin commented 5 years ago

Hello, kwea123! I also ran into a problem: after the Pointnet regression, the output boxes have an orientation angle. How do we then evaluate the performance of the whole network (e.g. the easy, moderate, hard categories), given that the ground truth boxes generated earlier have no angle (only the four values xmin, xmax, ymin, ymax)?

kwea123 commented 5 years ago

  1. I set the range smaller because points far away are too sparse; with the current encoding method, I think detection there would produce many false positives.
  2. The generated ground truth does have an angle. Use the original data for evaluation: the generated xmin, ..., ymax are only for training the bev detector, while the evaluation uses the original boxes that contain the 3d information. For evaluation, do bev detection -> 3D box regression -> compare against the original data (using the original kitti_eval).

gujiaqivadin commented 5 years ago

Hello, kwea123! Although the process was a bit bumpy, I have generated the txt of faster r-cnn's test detections with my network and replaced rgb_detection_val.txt in the frustum network with it. The next step is some numpy operations to change the part that used to obtain a frustum from a 2d box into one that obtains a cuboid from a bev 2d box. After reading the code, I think this is in prepare_data.py (though I am not sure whether other parts need changes too). I would like to know which part of the original prepare_data.py your prepare_data_bev.py replaces, and whether all of the numpy operations are implemented in that file. If you still have the complete modified frustum code from back then, it would be extremely useful to me. Thanks a lot! Much appreciated!

kwea123 commented 5 years ago

Have you already trained the model that regresses the 3d box from the bev proposal? My `prepare_data_bev.py` generates exactly this kind of training set.

https://github.com/charlesq34/frustum-pointnets/blob/2ffdd345e1fce4775ecb508d207e0ad465bcca80/kitti/prepare_data.py#L196-L249

Replace these lines with the contents of my prepare_data_bev.py, then train following the method in the original repo.

Only after this "bev proposal to 3d box regression" model is trained can you run inference. Alternatively, the pretrained model from the original repo actually gives quite good results too, though I don't know why.

The inference procedure is rather complicated; follow point 2 that I described above. I wrote that code at my company, so I can't make it public...

gujiaqivadin commented 5 years ago

For training, I have already fixed up all the labels and such, so training should go fairly smoothly. Since you said before that the model trained on the original data also works quite well, I want to first generate my own bev val_rgb_detection.pickle to see what accuracy the previous model achieves (partly also to check whether my own code is correct). This afternoon I also modified that part of the original code (lines 196-249, as you said), but I think that in prepare_data.py it is not only the extract_frustum_data function that needs modification; extract_frustum_data_rgb_detection should also need to be modified before inference is possible, and I could not find that part in your code.

gujiaqivadin commented 5 years ago

Hello, kwea123! Today's progress: I reorganized all the data I had obtained before and finished training the "bev proposal to 3d box regression" model, so I now have a trained model. The next step is inference, but I am a bit stuck and don't know which code to start modifying. As I said yesterday, I believe the extract_frustum_data_rgb_detection function in prepare_data.py needs to be modified, and there is probably a lot of visualization work ahead too. I also sent you an email with some of my ideas and hope to get your reply.

kwea123 commented 5 years ago

You are right, I was overcomplicating the problem. What I originally had in mind was running inference image by image, which is more complicated. If you only want to do evaluation, you can follow the approach of its extract_frustum_data_rgb_detection.

It's actually about the same as extract_frustum_data. You only need to replace https://github.com/charlesq34/frustum-pointnets/blob/2ffdd345e1fce4775ecb508d207e0ad465bcca80/kitti/prepare_data.py#L375-L388 with https://github.com/kwea123/kitti_bev_detection/blob/eea96d4495c5301a92a51ca97620aa5892b25a1a/prepare_data_bev.py#L13-L19 and you're done (then, in the # pass objects too small part below it, remove the ymax-ymin<img_height_threshold condition).

Note that the detection boxes here are 2d boxes on the bev image (in metres), so in the final evaluation the 2d score will be 0 and only the 3d part is meaningful. If you want the detected 3d boxes projected back onto the 2d image, you have to modify https://github.com/charlesq34/frustum-pointnets/blob/2ffdd345e1fce4775ecb508d207e0ad465bcca80/train/test.py#L165-L166 : use the h, w, l, tx, ty, tz, ry below it to construct the 3d box and then project it back.
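For reference, a minimal sketch of that construction and projection (following the KITTI convention where (tx, ty, tz) is the bottom centre of the box in rectified camera coordinates and `P` is the 3x4 P2 matrix from the calibration file; this is not the exact code in test.py):

```python
import numpy as np

def box3d_corners(h, w, l, tx, ty, tz, ry):
    """8 corners of a KITTI-style 3d box in rectified camera coordinates."""
    x = [ l/2,  l/2, -l/2, -l/2,  l/2,  l/2, -l/2, -l/2]
    y = [   0,    0,    0,    0,   -h,   -h,   -h,   -h]  # y points down; bottom face at 0, top at -h
    z = [ w/2, -w/2, -w/2,  w/2,  w/2, -w/2, -w/2,  w/2]
    rot = np.array([[ np.cos(ry), 0, np.sin(ry)],
                    [          0, 1,          0],
                    [-np.sin(ry), 0, np.cos(ry)]])
    corners = rot @ np.array([x, y, z]) + np.array([[tx], [ty], [tz]])
    return corners.T  # (8, 3)

def project_to_image_box(corners, P):
    """Project the 8 corners with the 3x4 matrix P2 and take the enclosing 2d box."""
    pts = np.hstack([corners, np.ones((8, 1))]) @ P.T
    pts[:, :2] /= pts[:, 2:3]
    return pts[:, 0].min(), pts[:, 1].min(), pts[:, 0].max(), pts[:, 1].max()
```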

gujiaqivadin commented 5 years ago

Yes, you are right. Over the past two days I have basically finished rewriting the extract_frustum_data_rgb_detection part. The accuracy results from test are as you said: the 2d part is 0.00, and the AP for the 3d and ground parts is roughly 60%-70% (varying across the easy, moderate and hard categories). Now I am working on changing the input bev image size from (400x400x3) to (600x600x9) to see the AP in different settings. Besides that, my recent goal is to run single-image inference with the trained models: given an unseen point cloud, faster rcnn gives the bev proposals, pointnet regresses the 3d boxes, and then the regressed 3d boxes and the ground truth boxes are drawn together on one bev png. Which parts would mainly need to be modified to accomplish this? I am not sure whether this is similar to the inference you mentioned before.

gujiaqivadin commented 5 years ago

For the modification of extract_frustum_data_rgb_detection, I saw that your visualization.ipynb has a transform_to_img() that converts between the bev image coordinate system and the velo coordinate system. Imitating it, I wrote my own transform_to_velo() to convert the corner coordinates of the bev 2d boxes back to the velo coordinate system before cropping, and finally obtained the AP results.
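A minimal sketch of what such an inverse mapping can look like (the ranges, resolution and row/column conventions here are assumptions and must match whatever was used to draw the bev image):

```python
def transform_to_velo(row, col, x_range=(0, 40), y_range=(-20, 20), res=0.1):
    """Map a bev image pixel (row, col) back to velodyne (x, y) in metres."""
    x = x_range[1] - row * res   # rows increase as velodyne x (forward) decreases
    y = y_range[0] + col * res   # columns increase with velodyne y
    return x, y
```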

But I noticed that in extract_frustum_data_rgb_detection you have this line

box_fov_inds = box_fov_inds & img_fov_inds

whereas in extract_frustum_data this line is omitted. I am not sure whether this affects the later results.

kwea123 commented 5 years ago

That line does nothing special. With the &, you only keep points that also fall within the camera image; without it, you take the points inside the 3d box regardless of the image. Personally I think it is more correct without the &, but the difference is small.
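In plain numpy terms (a sketch with stand-in mask names, not the exact variables in prepare_data.py):

```python
# with the &: only keep points that also project into the camera image
pts_used = pts_rect[box_mask & img_fov_mask]

# without it: keep every point inside the bev box, regardless of the camera
# field of view (arguably more correct for a pure bev pipeline)
pts_used = pts_rect[box_mask]
```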

gujiaqivadin commented 5 years ago

OK! I will try comparing the two tomorrow. Also, I would like to ask: if I want to run single-image inference with the trained models, i.e. input an unseen point cloud, have faster rcnn output the bev proposals, have pointnet regress the 3d boxes, and then draw both the regressed 3d boxes and the ground truth boxes on one bev png, which parts mainly need to be modified for the pointnet inference and for producing the figure? Today I looked into this specifically, and it seems closely related to test.py; I feel inference could be done by modifying test.py. I am not sure whether this is similar to the inference you mentioned before; could you tell me a bit about your approach to inference? Thanks a lot!

gujiaqivadin commented 5 years ago

Hello, kwea123! Recently I have been trying different models and parameters to compare AP, and I have finished the image visualization. Now I want to use the detection_results_v1 from the bev pointnet together with the detection_results_v1 from the original frustum pointnet, to evaluate how much adding this bev pointnet to the original frustum pointnet network improves performance. The appendix of the original paper says that nms should be applied to the bev and frustum detection results, filtering boxes by iou, but I am not sure which code to start from. My guess is test.py, but I found that it reads and processes everything by batch rather than per image, which seems to make nms hard to apply. How did you obtain the combined performance?