tdavchev / DESIRE

DESIRE: Distant Future Prediction in Dynamic Scenes with Interacting Agents

Is this project still a work in progress? #4

Open jagdishbhanushali opened 4 years ago

jagdishbhanushali commented 4 years ago

Hi, are you still working on this, or is it finished? I was inspired by your paper and would like to see the results.

Thanks, Jagdish

stratomaster31 commented 4 years ago

I'm working on this model. I've coded the CVAE and I get good results in the training phase, but not in the test phase... What are the inputs to Decoder1? They are not specified in the paper...
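For what it's worth, a common cause of a CVAE that trains well but fails at test time is that the latent z is still being drawn from the recognition network q(z | x, y) during evaluation; at test time the future y is unavailable, so z must come from the prior N(0, I). As for Decoder1, the paper conditions it on the encoded past trajectory together with the sampled z (combined through an fc layer). Below is a minimal PyTorch-style sketch of that train/test split; the class and layer names are mine, not from this repo, and plain concatenation stands in for the paper's fc-based combination:

```python
import torch
import torch.nn as nn

class TrajCVAE(nn.Module):
    """Minimal trajectory CVAE (illustrative sketch, not the DESIRE repo code)."""
    def __init__(self, enc_dim=48, z_dim=16, horizon=12):
        super().__init__()
        self.past_enc = nn.GRU(2, enc_dim, batch_first=True)    # encodes past (x, y) track
        self.future_enc = nn.GRU(2, enc_dim, batch_first=True)  # used at training time only
        self.to_mu = nn.Linear(2 * enc_dim, z_dim)
        self.to_logvar = nn.Linear(2 * enc_dim, z_dim)
        self.dec = nn.GRU(enc_dim + z_dim, enc_dim, batch_first=True)
        self.out = nn.Linear(enc_dim, 2)
        self.horizon, self.z_dim = horizon, z_dim

    def forward(self, past, future=None):
        _, h_x = self.past_enc(past)             # h_x: (1, B, enc_dim)
        h_x = h_x.squeeze(0)
        if future is not None:                   # training: z from recognition net q(z | x, y)
            _, h_y = self.future_enc(future)
            stats = torch.cat([h_x, h_y.squeeze(0)], dim=-1)
            mu, logvar = self.to_mu(stats), self.to_logvar(stats)
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        else:                                    # test: z from the prior N(0, I), NOT from q
            mu = logvar = None
            z = torch.randn(past.size(0), self.z_dim, device=past.device)
        # Decoder1 input: past-trajectory encoding combined with z, repeated per future step
        dec_in = torch.cat([h_x, z], dim=-1).unsqueeze(1).repeat(1, self.horizon, 1)
        y_hat = self.out(self.dec(dec_in)[0])    # (B, horizon, 2) predicted offsets
        return y_hat, mu, logvar
```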

i-chaochen commented 4 years ago

I think this work can only handle the Stanford Drone Dataset? Do you know how to process the KITTI dataset?

The original paper says the following:

As the dataset does not provide semantic labels for 3D points (which we need for scene context), we first perform semantic segmentations of images and project Velodyne laser scans onto the image plane using the provided camera matrix to label 3D points. The semantically labeled 3D points are then registered into the world coordinates using GPS-IMU tags. Finally we create top-down view feature maps I of size H × W × C.

If I understood correctly, they did these:

  1. First run semantic segmentation on all images to get per-pixel class masks.

  2. Project the laser data onto the 2D image and transfer the mask labels to the projected points. // i.e., use OpenCV's projectPoints() for the projection? (see the sketch after this list)

    2.1 Since KITTI stores scans in .bin format, we may need to convert them to PCD first.

    2.2 Register all scans into a fused global frame, and then use the camera matrix (provided by KITTI) and the extrinsic matrix (computed from GPS-IMU) to project the points onto a 2D image, onto which we also project the segmentation masks from step 1.
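Regarding step 2: for what it's worth, the .bin files don't need a PCD conversion just for the projection, since each one is a raw N × 4 float32 array, and the projection itself is three matrix multiplies using the calibration files KITTI ships (P2, R0_rect, Tr_velo_to_cam). A rough numpy sketch, assuming the standard KITTI calibration conventions (the function and variable names are mine):

```python
import numpy as np

def project_velo_to_image(bin_path, P2, R0_rect, Tr_velo_to_cam):
    """Project a KITTI Velodyne scan onto the left color camera image plane.

    P2: (3, 4) camera projection matrix, R0_rect: (3, 3) rectification,
    Tr_velo_to_cam: (3, 4) LiDAR-to-camera extrinsics -- all read from the
    KITTI calib file for the sequence.
    """
    pts = np.fromfile(bin_path, dtype=np.float32).reshape(-1, 4)[:, :3]  # drop reflectance
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])                     # homogeneous (N, 4)

    cam = Tr_velo_to_cam @ pts_h.T            # (3, N) points in the camera frame
    cam = R0_rect @ cam                       # rectified camera frame
    front = cam[2] > 0.1                      # keep only points in front of the camera
    cam = cam[:, front]

    uvw = P2 @ np.vstack([cam, np.ones((1, cam.shape[1]))])
    uv = (uvw[:2] / uvw[2]).T                 # (M, 2) pixel coordinates
    return uv, front                          # 'front' indexes back into the scan
```

Each projected point can then take the label of the segmentation mask at its rounded (u, v) pixel, after clipping to the image bounds.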

Can anyone correct me if I'm wrong? Thanks in advance!

sujithvemi commented 4 years ago

@i-chaochen

I don't work with LiDAR data, so I can't comment on the bin and PCD formats etc., but the approach you are taking sounds fine to me. To summarize, this is my understanding:

  • Project the Velodyne 3D laser scan onto the 2D image plane
  • All the 3D points that fall on the same pixel in the 2D image plane get the label recognised there by semantic segmentation
  • The labelled points are then converted to the world coordinate frame
  • Build a BEV 3D matrix whose third dimension is a one-hot vector over the semantic segmentation classes (cropping of this feature map can be done before building it; a sketch follows below)

Feel free to comment if I am wrong in any sense, so we can better understand. Thanks in advance.
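A minimal sketch of the last bullet, assuming the points are already labelled and in world coordinates; the grid size, resolution, and origin are arbitrary placeholders rather than values from the paper:

```python
import numpy as np

def build_bev_onehot(points_xy, labels, num_classes,
                     grid=(160, 160), res=0.5, origin=(0.0, 0.0)):
    """Rasterise semantically labelled world-frame points into an H x W x C
    one-hot BEV feature map (sizes and resolution here are arbitrary choices).
    """
    H, W = grid
    fmap = np.zeros((H, W, num_classes), dtype=np.float32)
    # world metres -> grid cells, relative to a chosen map origin
    cols = ((points_xy[:, 0] - origin[0]) / res).astype(int)
    rows = ((points_xy[:, 1] - origin[1]) / res).astype(int)
    valid = (rows >= 0) & (rows < H) & (cols >= 0) & (cols < W)
    fmap[rows[valid], cols[valid], labels[valid]] = 1.0
    return fmap
```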

i-chaochen commented 4 years ago


@sujithvemi Thanks for the feedback. I am not sure I fully understood what the original paper means by "project Velodyne laser scans onto the image plane". What does this image plane look like? Does it look like this one?

[Screenshot attached: 2019-11-08 00:48:51]

Also, since they already project the 3D scans onto the 2D image plane, why do they need to register the 3D scans to world coordinates using the GPS-IMU tags? The 2D image coordinates could be used for the prediction anyway.

If they want to register to world coordinates, I think they would need the intrinsic and extrinsic matrices (can the extrinsics be derived from GPS-IMU?) rather than the GPS-IMU tags alone.
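On the registration question: as far as I understand, the GPS-IMU (OXTS) logs are exactly what the per-frame extrinsics get computed from; e.g. pykitti converts them into 4 × 4 IMU-to-world poses. A sketch under that assumption (the paths are placeholders, and the T_w_imu / T_velo_imu frame directions are pykitti's conventions, worth double-checking against its docs):

```python
import numpy as np
import pykitti  # pip install pykitti; parses the OXTS (GPS-IMU) logs into poses

# placeholder path/date/drive for a raw KITTI sequence
data = pykitti.raw('/data/kitti_raw', '2011_09_26', '0001')

T_imu_velo = np.linalg.inv(data.calib.T_velo_imu)   # Velodyne frame -> IMU frame

for i, scan in enumerate(data.velo):                # scan: (N, 4) x, y, z, reflectance
    T_w_imu = data.oxts[i].T_w_imu                  # IMU -> world pose from GPS-IMU
    pts_h = np.hstack([scan[:, :3], np.ones((len(scan), 1))])
    pts_world = (T_w_imu @ T_imu_velo @ pts_h.T).T[:, :3]
    # successive scans now share one global frame and can be fused into the BEV map
```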

sujithvemi commented 4 years ago

@i-chaochen I really wish I could help you here, but I don't know much about LiDAR and was not able to fully understand what the paper said.

You can check the supplemental material provided here; it might help you.