zaiweizhang / H3DNet


Evaluating on custom data/images #13

Closed ramanpreet9 closed 3 years ago

ramanpreet9 commented 3 years ago

Hi @zaiweizhang, @GitBoSun

How can I run the model to detect objects on my own custom data/images? The classes can stay the same as in the ScanNet/SUN RGB-D datasets for now. From what I understand from looking at the SUN RGB-D data, I need three files per scene for evaluation:

1) bbox.npy - the 3D bounding boxes of the objects in the scene
2) pc.npz - the point cloud
3) votes.npz - an Nx10 array of votes (from VoteNet?) that is used for detection

Let's say I capture an RGB-D image. I can fill in the depth image and get a dense point cloud (along with color), which gives me 2). What do I need to do to run the trained model on this file? 1) should only be used for evaluation, not inference. How do I get 3)?
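For reference, a minimal numpy sketch of how I'm inspecting those three files (filenames here are shortened placeholders, and I make no assumptions about the .npz key names):

```python
import numpy as np

# 1) object bounding boxes: a plain .npy array
bbox = np.load('bbox.npy')
print('bbox:', bbox.shape)

# 2) point cloud and 3) votes are .npz archives; list their keys and array shapes
for path in ['pc.npz', 'votes.npz']:
    data = np.load(path)
    for key in data.files:
        print(path, key, data[key].shape)
```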

zaiweizhang commented 3 years ago

First of all, for the SUN RGB-D benchmark, a tilt angle is provided with the dataset. We apply it to the point clouds so that every point cloud's up axis is aligned with the gravity direction. This tilt angle has to be estimated with some algorithm plus some manual adjustments; see here. I have not tried training with depth scans that are not aligned to the gravity direction. You can certainly try it; I am also curious about the results.
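A minimal sketch of what that alignment step looks like, assuming you already have a 3x3 tilt rotation (called `Rtilt` here, a placeholder name) that takes camera coordinates to the upright frame; axis and sign conventions differ between toolboxes, so treat this as illustrative rather than the repo's own preprocessing:

```python
import numpy as np

def align_to_gravity(points, Rtilt):
    """Rotate a point cloud so that +Z is the gravity-up axis.

    points: N x C array whose first three columns are XYZ (extra columns, e.g. RGB, are kept).
    Rtilt:  3 x 3 rotation taking camera coordinates to the upright frame.
    """
    aligned = points.copy()
    aligned[:, :3] = points[:, :3] @ Rtilt.T  # rotate XYZ only, leave color untouched
    return aligned
```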

Now, let's talk about the data. Let's say you have 1) and 2). Our current dataloader takes an Nx9 array describing the object labels, organized as: center (3 dimensions), size (3 dimensions), rotation (1 dimension), instance label (1 dimension), semantic label (1 dimension). To produce these labels, you do need instance labels.
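So each row of that Nx9 array is [center x, center y, center z, size x, size y, size z, rotation, instance label, semantic label]. A hedged sketch of packing rows in that layout (the helper name and the example values are placeholders, not part of the repo):

```python
import numpy as np

def make_label_row(center, size, heading, instance_id, semantic_id):
    """Pack one annotation into the 9-value layout described above."""
    return np.asarray(
        list(center) + list(size) + [heading, instance_id, semantic_id],
        dtype=np.float32,
    )

labels = np.stack([
    make_label_row([1.0, 0.5, 0.8], [0.6, 0.6, 1.2], 0.0, 1, 3),
    make_label_row([2.3, -0.4, 0.4], [1.8, 0.9, 0.8], 1.57, 2, 5),
])  # shape (2, 9): center xyz, size xyz, rotation, instance label, semantic label
```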

If you only have the object bounding box information, you can use this code to extract the points inside each object bounding box. You need to be careful with overlapping objects, such as a box on a sofa.
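If you want to roll your own point-in-box test instead of the linked code, a minimal sketch (my own illustrative version, assuming boxes are rotated only about the z axis):

```python
import numpy as np

def points_in_box(points, center, size, heading):
    """Boolean mask of points inside a box rotated by `heading` around +Z.

    points:  N x 3 XYZ.
    center:  (3,) box center.
    size:    (3,) full extents along the box's own axes.
    heading: rotation angle (radians) of the box around +Z.
    """
    # Move points into the box's local frame: translate, then undo the rotation.
    local = points - np.asarray(center)
    c, s = np.cos(-heading), np.sin(-heading)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    local = local @ rot.T
    half = np.asarray(size) / 2.0
    return np.all(np.abs(local) <= half, axis=1)
```

For the overlapping case (the box on a sofa), a point can fall inside both boxes, so you still have to decide which instance it belongs to, for example by preferring the smaller box.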

Thanks, Zaiwei

zaiweizhang commented 3 years ago

Closing this for now. Feel free to reopen it.

ramanpreet9 commented 3 years ago

Thanks for the info. May I check whether there are scripts available to convert RGB-D data into the ScanNet or SUN RGB-D format you are using in the model?

For ScanNet I see two files, e.g.:

- 'scene0000_00_vert.npy' - 50,000 x 6
- 'scene0000_00_all_noangle_40cls.npy' - 50,000 x 9

What information is stored here? I believe the _vert file contains the vertices of the points in the scene; does each row represent [X, Y, Z, R, G, B]? What is stored in the second file? And how can I generate this format for sample RGB-D data I collect from an RGB-D camera?

zaiweizhang commented 3 years ago

The _vert file includes X, Y, Z, R, G, B per point.

The cls.npy file includes the point-level annotation: bbox center x, bbox center y, bbox center z, bbox size x, bbox size y, bbox size z, bbox rotation angle, point instance label, and bbox semantic label.
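Putting the two files together, a small sketch (using the example scene names from above) of how one might load and split them:

```python
import numpy as np

verts  = np.load('scene0000_00_vert.npy')                # (N, 6): X, Y, Z, R, G, B
labels = np.load('scene0000_00_all_noangle_40cls.npy')   # (N, 9): per-point box annotation

xyz, rgb = verts[:, :3], verts[:, 3:6]
box_center  = labels[:, 0:3]   # bbox center x, y, z of the point's instance
box_size    = labels[:, 3:6]   # bbox size x, y, z
box_heading = labels[:, 6]     # bbox rotation angle
instance_id = labels[:, 7]     # point instance label
semantic_id = labels[:, 8]     # bbox semantic label
```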

To generate this information, you will need to manually annotate your RGB-D data. Please refer to this paper for help: https://arxiv.org/abs/1702.04405

quocanh010 commented 3 years ago

Hi, same question here. If I only have the 3D point cloud (x, y, z, r, g, b), can I run inference with your trained model? I assume yes, even though the dataloader takes an Nx9 label array; I guess we can just fill the remaining columns with zeros? Thanks

zaiweizhang commented 3 years ago

Yeah, I think you can do that. Make sure to comment out the evaluation code; it might cause some problems otherwise.
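For anyone landing here later, a hedged sketch of that "fill with zeros" idea: pair your real point cloud with a placeholder label array in the same layout, then skip the evaluation path (all filenames below are placeholders):

```python
import numpy as np

pc = np.load('my_scene_xyzrgb.npy')  # (N, 6) point cloud from your own sensor
dummy_labels = np.zeros((pc.shape[0], 9), dtype=np.float32)  # (N, 9) all-zero labels

np.save('my_scene_vert.npy', pc)
np.save('my_scene_all_noangle_40cls.npy', dummy_labels)
# With all-zero labels the evaluation metrics are meaningless, so comment out the
# evaluation code and keep only the forward pass plus the prediction dump.
```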