mpitropov / cadc_devkit

A devkit for the Canadian Adverse Driving Conditions (CADC) dataset.
http://cadcd.uwaterloo.ca/

Axes Orientation #21

Closed ZiadElmassik closed 3 years ago

ZiadElmassik commented 3 years ago

Hello @mpitropov, after converting the dataset to KITTI format and attempting to evaluate a 3D detection model, I noticed the labels weren't being picked up. The conversion script may have flaws in how it converts the 3D annotations, since the axes orientation used in the 3D annotations may differ from KITTI's. Can you please tell me which direction the axes point in the 3D annotations?

mpitropov commented 3 years ago

Do you have a picture of what you are seeing?

The x, y, z and yaw values should be equivalent to the KITTI dataset: https://github.com/mpitropov/cadc_devkit/blob/master/run_demo_tracklets.py#L51-L54

  T_Lidar_Cuboid[0:3,0:3] = R.from_euler('z', cuboid['yaw'], degrees=False).as_dcm();
  T_Lidar_Cuboid[0][3] = cuboid['position']['x'];
  T_Lidar_Cuboid[1][3] = cuboid['position']['y'];
  T_Lidar_Cuboid[2][3] = cuboid['position']['z'];

However, one thing that could be causing the issue is that the dataset labeller placed x as width and y as length; KITTI has them the other way around. https://github.com/mpitropov/cadc_devkit/blob/master/run_demo_tracklets.py#L65-L67

  width = cuboid['dimensions']['x'];
  length = cuboid['dimensions']['y'];
  height = cuboid['dimensions']['z'];
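For illustration, a minimal sketch (not part of the devkit) of reading a CADC cuboid's dimensions in KITTI order, given that naming convention:

  # CADC cuboids store width under 'x' and length under 'y' (see above);
  # KITTI expects dimensions as (height, width, length).
  def cadc_dims_to_kitti(cuboid):
      height = cuboid['dimensions']['z']
      width = cuboid['dimensions']['x']
      length = cuboid['dimensions']['y']
      return height, width, length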

rvalienter90 commented 3 years ago

I agree with @mpitropov, the camera and lidar axes are equivalent to KITTI.

The issue is that when saving KITTI labels, x, y, z should be in camera coordinates, not the lidar or vehicle frame, so that the labels match the KITTI format.

KITTI format

In KITTI, the bbox labels are referenced in the camera frame, right-down-front (x, y, z), so z points forward. In CADC, the bbox labels are in the vehicle/lidar frame, front-left-up, so x points forward and z (the axis of rotation) points up. But KITTI labels are in the camera frame.

As a test, e.g. a KITTI-format label: Car -1 -1 2.1963 77.0680 513.2833 313.1985 612.5529 1.7634 1.7587 4.6529 -9.5433 1.8784 14.2157 1.6051 8.1053, where z = 14.2157 is the forward distance.
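For reference, the fields in that KITTI label line break down as follows (the trailing value is a confidence score, which appears only in detection result files):

  Car                                  -> type
  -1 -1                                -> truncated, occluded
  2.1963                               -> alpha (observation angle)
  77.0680 513.2833 313.1985 612.5529   -> 2D bbox: left, top, right, bottom
  1.7634 1.7587 4.6529                 -> dimensions: height, width, length
  -9.5433 1.8784 14.2157               -> location x, y, z in the camera frame
  1.6051                               -> rotation_y
  8.1053                               -> score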

A CADC label from the annotations: 'label': 'Car', 'position': {'y': 9.774, 'x': 15.294, 'z': -1.571}; here x is the forward direction, not z.

In CADC we get [height, width, length, x, y, z, yaw] as

dimensions
length = cuboid['dimensions']['y']   (as @mpitropov said, the labeler used ['y'] instead of ['length']; same for the other dimensions)
width = cuboid['dimensions']['x']
height = cuboid['dimensions']['z']

position
x = cuboid['position']['x']
y = cuboid['position']['y']
z = cuboid['position']['z']
yaw = cuboid['yaw']

From that data, the label [x, y, z] is in the lidar frame: x = cuboid['position']['x'] in the CADC labels is the forward direction, whereas KITTI labels are in the camera frame, where forward is z. So an axes transformation is needed before saving the labels. Maybe I am missing something else.
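For illustration, a minimal sketch of that axes transformation, assuming T_LIDAR_CAM is the camera-to-lidar extrinsic from the CADC calib files (so its inverse maps lidar coordinates into the camera frame, as done in the converter code below):

  import numpy as np

  T_CAM_LIDAR = np.linalg.inv(T_LIDAR_CAM)           # lidar -> camera
  p_lidar = np.array([cuboid['position']['x'],
                      cuboid['position']['y'],
                      cuboid['position']['z'], 1.0])  # homogeneous lidar point
  p_cam = T_CAM_LIDAR @ p_lidar                       # x right, y down, z forward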

mpitropov commented 3 years ago

That is a good point @rvalienter90. The KITTI dataset stores it in the front camera frame, while for CADC we give the labels in the lidar frame so that the annotations can be all around our vehicle.

Looking at the CADC to KITTI script, the bounding box is converted to the camera frame:
https://github.com/mathild7/faster_rcnn_pytorch_multimodal/blob/3145d91cba5ec38769b76af8f6e118748876c280/tools/cadc_unpack_all_kitti.py#L273

  bbox_img = np.matmul(np.linalg.inv(cam_extrinsic),bbox_transform_matrix)

https://github.com/mathild7/faster_rcnn_pytorch_multimodal/blob/3145d91cba5ec38769b76af8f6e118748876c280/tools/cadc_unpack_all_kitti.py#L282

  bbox_3d_cam = [bbox_img[0][3],bbox_img[1][3],bbox_img[2][3],length,width,height,rotation_y]

Which gets output to the file: https://github.com/mathild7/faster_rcnn_pytorch_multimodal/blob/master/tools/cadc_unpack_all_kitti.py#L404-L406

            f.write('{:.3f} {:.3f} {:.3f} '.format(bbox_3d[3],bbox_3d[4],bbox_3d[5]))
            f.write('{:.3f} {:.3f} {:.3f} '.format(bbox_3d[0],bbox_3d[1],bbox_3d[2]))
            f.write('{:.5f} '.format(bbox_3d[6]))

rvalienter90 commented 3 years ago

Correct. I made some modifications and finished the cadc_kitti converter; it works for me, the output is in KITTI format, and the labels and calib files are OK now. Using kitti_visualizer, the 2D BBs, 3D BBs and BEV look correct. It should now work for object detection code that expects KITTI-format input. Hope that helps.

Calib files will look like this

P0: 653.956033188809 -0.235925653043616 653.221172545916 0.0 0.0 655.54008861796 508.732863993917 0.0 0.0 0.0 1.0 0.0
P1: 653.956033188809 -0.235925653043616 653.221172545916 0.0 0.0 655.54008861796 508.732863993917 0.0 0.0 0.0 1.0 0.0
P2: 653.956033188809 -0.235925653043616 653.221172545916 0.0 0.0 655.54008861796 508.732863993917 0.0 0.0 0.0 1.0 0.0
P3: 653.956033188809 -0.235925653043616 653.221172545916 0.0 0.0 655.54008861796 508.732863993917 0.0 0.0 0.0 1.0 0.0
R0_rect: 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0
Tr_velo_to_cam: 0.01394524701841737 -0.9998000763361616 -0.01514716891344736 -0.06268651366968261 -0.005120798147914583 0.015064872783513402 -0.9998855747124629 -0.6056567052837293 0.9999020731035041 0.01403306014823503 -0.004921372004126455 -0.6618044419813216

Label files will look like this

Truck 0.32 0 -0.96 50.0 546.0 387.0 664.0 2.136 6.267 2.288 -9.581 2.102 14.781 3.10275 436 2018_03_07 0002 
Car 0.99 0 5.44 0.0 588.0 6.0 627.0 1.888 4.989 2.003 -44.040 6.177 40.923 -3.04694 13 2018_03_07 0002 
Car 0.9 0 5.44 0.0 586.0 56.0 624.0 1.888 4.928 2.003 -41.088 6.083 41.284 -3.09088 8 2018_03_07 0002 
Car -0.33 0 4.43 473.0 519.0 534.0 545.0 1.892 4.472 1.872 -12.058 1.885 52.887 -2.63888 12 2018_03_07 0002 
Car 0.79 0 3.81 130.0 530.0 214.0 558.0 2.646 6.200 2.104 -42.614 3.131 58.209 -1.61156 14 2018_03_07 0002 
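(Beyond the 15 standard KITTI label fields, the converter appends three extra fields per line: the cuboid's lidar point count, the drive date and the sequence, e.g. 436 2018_03_07 0002 above; see the f.write calls near the end of write_txt_annotation below.)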

And the output folder:

folder

It looks like this when visualized with the KITTI visualizer:

output

Code attached cadc_kitti_convert.zip

Hope that helps. Thanks, :)

This is the code of the converter:

#!/usr/bin/env python

'''
Keep only annotations that have at least MIN_NUM_POINTS lidar points
and write out KITTI-format annotation, calib, image and velodyne files.
'''
import sys
#print(sys.path)

import json
import numpy as np
import cv2
from scipy.spatial.transform import Rotation as R
import os.path
from shutil import copyfile
import yaml
import math

base_dir        = '/home/perception/Data/cadc/'
drive_dir       = ['2018_03_06','2018_03_07','2019_02_27']

#2018_03_06
# Seq | Snow  | Road Cover | Lens Cover
#   1 | None  |     N      |     N
#   5 | Med   |     N      |     Y
#   6 | Heavy |     N      |     Y
#   9 | Light |     N      |     Y
#  18 | Light |     N      |     N

#2018_03_07
# Seq | Snow  | Road Cover | Lens Cover
#   1 | Heavy |     N      |     Y
#   4 | Light |     N      |     N
#   5 | Light |     N      |     Y

#2019_02_27
# Seq | Snow  | Road Cover | Lens Cover
#   5 | Light |     Y      |     N
#   6 | Heavy |     Y      |     N
#  15 | Med   |     Y      |     N
#  28 | Light |     Y      |     N
#  37 | Extr  |     Y      |     N
#  46 | Extr  |     Y      |     N
#  59 | Med   |     Y      |     N
#  73 | Light |     Y      |     N
#  75 | Med   |     Y      |     N
#  80 | Heavy |     Y      |     N

#val_seq_sel     = [[1,5,6,18],[1,4,5],[5,6,15,28,37,46,59,73,75,80]]
#Heavy snow
#val_seq_sel     = [[],[],[11,20,41,43,46,49,51,65,68,70,78]]
#Partial camera coverage
val_seq_sel     = [[5,6,8,9,10],[1,2,5,6,7],[]]
train_ratio     = 1
MIN_NUM_POINTS  = 5
crop_top        = 0
crop_bottom     = 0

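# The matrix below appears to be the axis permutation from the lidar frame (x forward,
# y left, z up) to a camera-style frame (x right, y down, z forward); note that it is
# defined here but not referenced elsewhere in this script.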
T_front_cam_to_ref = np.array([
    [0.0, -1.0, 0.0],
    [0.0, 0.0, -1.0],
    [1.0, 0.0, 0.0]
])

def load_calibration(calib_path):
  calib = {}

  # Get calibrations
  calib['extrinsics'] = yaml.load(open(calib_path + '/extrinsics.yaml'), yaml.SafeLoader)
  calib['CAM00'] = yaml.load(open(calib_path + '/00.yaml'), yaml.SafeLoader)
  #calib['CAM01'] = yaml.load(open(calib_path + '/01.yaml'), yaml.SafeLoader)
  #calib['CAM02'] = yaml.load(open(calib_path + '/02.yaml'), yaml.SafeLoader)
  #calib['CAM03'] = yaml.load(open(calib_path + '/03.yaml'), yaml.SafeLoader)
  #calib['CAM04'] = yaml.load(open(calib_path + '/04.yaml'), yaml.SafeLoader)
  #calib['CAM05'] = yaml.load(open(calib_path + '/05.yaml'), yaml.SafeLoader)
  #calib['CAM06'] = yaml.load(open(calib_path + '/06.yaml'), yaml.SafeLoader)
  #calib['CAM07'] = yaml.load(open(calib_path + '/07.yaml'), yaml.SafeLoader)

  return calib

def make_calib_file(calib, calib_target_path):

    T_LIDAR_CAM = (np.array(calib['extrinsics']['T_LIDAR_CAM0' + cam]));
    # T_LIDAR_CAM = np.rot90(T_LIDAR_CAM, axes =(1,0))
    T_CAM_LIDAR = np.linalg.inv(T_LIDAR_CAM)

    T_IMG_CAM = np.eye(4);
    T_IMG_CAM[0:3, 0:3] = np.array(calib['CAM0' + cam]['camera_matrix']['data']).reshape(-1, 3);
    T_IMG_CAM = T_IMG_CAM[0:3, 0:4];  # remove last row
    # T_CAM_IMG =np.linalg.inv(T_IMG_CAM)
    dist_coeffs = np.array(calib['CAM0' + cam]['distortion_coefficients']['data'])

    R = 'R0_rect: ' + ' '.join([str(x) for x in np.eye(3).reshape(-1).tolist()])
    velodyne = T_CAM_LIDAR[0:3, 0:4];
    velodyne = 'Tr_velo_to_cam: ' + ' '.join([str(x) for x in velodyne.reshape(-1).tolist()])

    P = T_IMG_CAM
    P = ' '.join([str(x) for x in P.reshape(-1).tolist()])

    info = f"P0: {P}\nP1: {P}\nP2: {P}\nP3: {P}\n{R}\n{velodyne}\n"
    f_c = open(calib_target_path, "w+")
    f_c.write(info)

    return info

def get_camera_intrinsic_matrix(intrinsic):
    # 1d Array of [f_u, f_v, c_u, c_v, k{1, 2}, p{1, 2}, k{3}]
    intrinsic_matrix = np.zeros((3, 4))
    intrinsic_matrix[0, 0] = intrinsic[0][0]
    intrinsic_matrix[0, 1] = 0.0
    intrinsic_matrix[0, 2] = intrinsic[0][2]
    intrinsic_matrix[1, 1] = intrinsic[1][1]
    intrinsic_matrix[1, 2] = intrinsic[1][2]
    intrinsic_matrix[2, 2] = 1.0
    return intrinsic_matrix

def cart_to_homo(mat):
    ret = np.eye(4)
    if mat.shape == (3, 3):
        ret[:3, :3] = mat
    elif mat.shape == (3, 4):
        ret[:3, :] = mat
    else:
        raise ValueError(mat.shape)
    return ret

def compute_extrinsic(calib):
    # Compute real extrinsic matrix
    # extrinsic = np.reshape(np.array(calib.extrinsic.transform), [4, 4])
    extrinsic = (np.array(calib['extrinsics']['T_LIDAR_CAM0' + cam]));
    extrinsic = np.linalg.inv(extrinsic)
    norm = np.array([[0, 0, 1], [-1, 0, 0], [0, -1, 0]])
    extrinsic[:3, 3] = extrinsic[:3, 3].reshape(1, 3).dot(norm)
    _norm = np.eye(4)
    _norm[:3, :3] = norm.T
    extrinsic = extrinsic.dot(_norm)
    return extrinsic

#Create target Directory if don't exist
def create_output_directory(camera):
    home_dir = os.path.join(base_dir,'object')
    modes = ['training','val']
    for mode in modes:
        #current_dir ="%s/annotation" %pwd
        dir_names = []
        # dir_names.append('image_0{:1d}'.format(int(camera)))
        # dir_names.append('point_clouds')
        # dir_names.append('calib')
        # dir_names.append("annotation_0{:01d}".format((int(camera))))
        dir_names.append('image_2'.format(int(camera)))
        dir_names.append('velodyne')
        dir_names.append('calib')
        dir_names.append("label_2".format((int(camera))))
        for dir_name in dir_names:
            target_path = os.path.join(home_dir,mode,dir_name)
            if not os.path.exists(target_path):
                os.makedirs(target_path)
                print("Directory ", target_path,  " Created ")
            else:
                print("Directory ", target_path,  " already exists")
    return home_dir

def write_txt_annotation(frame,cam,out_dir,mode,image_dir,drive,seq,drive_and_seq):
    cam = str(cam)
    DISTORTED = False

    writeout = True
    filter = MIN_NUM_POINTS  #only show/write out annotations that have more than MIN_NUM_POINTS lidar points in them

    img_path = image_dir + '/' + format(frame, '010') + ".png"
    print(img_path)
    if DISTORTED:
      path_type = 'raw'
    else:
      path_type = 'processedmoose'
    annotations_file = os.path.join(base_dir,drive,seq,'3d_ann.json') #contains all the annotations
    lidar_path = os.path.join(base_dir, drive,seq, "labeled", "lidar_points", "data", format(frame, '010') + ".bin")  #change this to snowy or not snowy
    calib_path = os.path.join(base_dir, drive, 'calib')
    calib = load_calibration(calib_path)

    # Load 3d annotations
    annotations_data = None
    with open(annotations_file) as f:
        annotations_data = json.load(f)

    cam_intrinsic = np.eye(4)  #identity matrix
    cam_intrinsic[0:3,0:3] = np.array(calib['CAM0' + cam]['camera_matrix']['data']).reshape(-1, 3)  #K_xx camera projection matrix (3x3)
    cam_extrinsic = np.array(calib['extrinsics']['T_LIDAR_CAM0' + cam])  #
    cam_distorted = calib['CAM0' + cam]['distortion_coefficients']['data']  #D_xx camera distortion matrix
    k1 = cam_distorted[0]
    k2 = cam_distorted[1]
    p1 = cam_distorted[2]
    p2 = cam_distorted[3]
    k3 = cam_distorted[4]

    fc1 = cam_intrinsic[0][0]
    fc1_alpha_c = cam_intrinsic[0][1]
    cc1 = cam_intrinsic[0][2]
    fc2 = cam_intrinsic[1][1]
    cc2 = cam_intrinsic[1][2]

    # Projection matrix from camera to image frame
    #T_IMG_CAM = np.eye(4); #identity matrix
    #T_IMG_CAM[0:3,0:3] = np.array(calib['CAM0' + cam]['camera_matrix']['data']).reshape(-1, 3); #camera to image #this comes from e.g "F.yaml" file

    #T_IMG_CAM : 4 x 4 matrix
    #T_IMG_CAM = T_IMG_CAM[0:3,0:4]; # remove last row, #choose the first 3 rows and get rid of the last row

    #T_CAM_LIDAR = np.linalg.inv(np.array(calib['extrinsics']['T_LIDAR_CAM0' + cam])); #going from lidar to camera

    #T_IMG_LIDAR = np.matmul(T_IMG_CAM, T_CAM_LIDAR); # go from lidar to image

    img = cv2.imread(img_path)
    img_h, img_w = img.shape[:2]  #get the image height and width
    img = img[crop_top:,:,:]
    # img = img[:-crop_bottom,:,:]

    #Add each cuboid to image
    #the coordinates in the tracklet json are lidar coords

    #drive_and_seq = int(seq)*1000 + drive_dir.index(drive)*100000
    #if cam =='0':
    #    cam_id   = 0
    #elif cam =='1':
    #    cam_id   = 1*10000000
    #elif cam == '2':
    #    cam_id = 2*10000000
    #elif cam == '3':
    #    cam_id = 3*10000000
    #elif cam =='4':
    #    cam_id = 4*10000000
    #elif cam =='5':
    #    cam_id = 5*10000000
    #elif cam =='6':
    #    cam_id = 6*10000000
    #else:
    #    cam_id = 7*10000000
    frame_id     = frame + drive_and_seq
    # mode="training"
    img_file     = os.path.join(out_dir,mode,'image_2'.format(int(cam)),'{:06d}.png'.format(frame_id))
    target_lidar = os.path.join(out_dir,mode,'velodyne','{:06d}.bin'.format(frame_id))
    # annotation_target_path = os.path.join(out_dir,mode,'annotation_0{:01d}'.format(int(cam)),'{:07d}.txt'.format(frame_id))

    annotation_target_path = os.path.join(out_dir,mode,'label_2'.format(int(cam)),'{:06d}.txt'.format(frame_id))
    calib_target_path      = os.path.join(out_dir,mode,'calib','{:06d}.txt'.format(frame_id))
    f = open(annotation_target_path, "w+")
    if(not os.path.isfile(calib_target_path)):
        make_calib_file(calib, calib_target_path)
    for cuboid in annotations_data[frame]['cuboids']:

        #T_Lidar_Cuboid = np.eye(4); #identity matrix
        #T_Lidar_Cuboid[0:3,0:3] = R.from_euler('z', cuboid['yaw'], degrees=False).as_dcm(); #make a directional cosine matrix using the yaw, i.e rotation about z, yaw angle

        '''
        Rotations in 3 dimensions can be represented by a sequence of 3 rotations around a sequence of axes.
        In theory, any three axes spanning 3D Euclidean space are enough. In practice the axes of rotation are chosen to be the basis vectors.

        The three rotations can either be in a global frame of reference (extrinsic) or in a body-centred frame of reference (intrinsic),
        which is attached to, and moves with, the object under rotation.

        In our case, 'z' specifies the axis of rotation (extrinsic rotation) and cuboid['yaw'] is the Euler angle in radians (rotation angle).
        This returns an object containing the rotation represented by the sequence of rotations around the given axes with the given angles.

        T_Lidar_Cuboid = basic (elemental) rotation: R_z(theta = yaw) = [[cos(theta), -sin(theta), 0], ...]
        * .as_dcm - represent as direction cosine matrices.

        3D rotations can be represented using direction cosine matrices, which are 3 x 3 real orthogonal matrices with determinant equal to +1.

        '''
        num_lidar_points     = cuboid['points_count']

        #the above is the translation part of the transformation matrix, so now we have the rotation and the translation

        length = cuboid['dimensions']['y'] #'y' here is just the naming convention the labelling tool (Scale) uses; it could as well have been cuboid['dimensions']['length']
        width = cuboid['dimensions']['x']
        height = cuboid['dimensions']['z']
        x = cuboid['position']['x']
        y = cuboid['position']['y']
        z = cuboid['position']['z']
        yaw = cuboid['yaw']

        bbox_3d = [x, y, z,height,width,length,yaw]
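        # Note: get_box_transformation_matrix() below unpacks box[3], box[4], box[5]
        # as (length, height, width) for the corner scaling, while the order passed in
        # here is [height, width, length]; if the projected 2D boxes look off, this
        # ordering is worth double-checking.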

        #radius = 3
        bbox_transform_matrix = get_box_transformation_matrix(bbox_3d)
        bbox_img = np.matmul(np.linalg.inv(cam_extrinsic),bbox_transform_matrix)
        bbox_img_yaw = -cuboid['yaw'] + np.pi/2.0
        # bbox_img_yaw = -cuboid['yaw']

        # wrap rotation_y into [-pi, pi]
        if bbox_img_yaw < -np.pi:
            rotation_y = bbox_img_yaw + 2*np.pi
        elif bbox_img_yaw > np.pi:
            rotation_y = bbox_img_yaw - 2*np.pi
        else:
            rotation_y = bbox_img_yaw
        bbox_3d_cam = [bbox_img[0][3],bbox_img[1][3]+ height / 2,bbox_img[2][3],height,length,width,rotation_y]
        #note that in the lidar frame, up is z, forwards is x, side is y
        if(cuboid['position']['x'] - length/2.0 <= 0):
            continue
        '''
        Very important to remember: the LiDAR frame has x forward, y left and z up. The given yaw is around the z axis;
        the cuboid frame which we create has the same axes.
        '''
        scan_data = np.fromfile(lidar_path, dtype=np.float32)  # numpy from file reads binary file
        lidar = scan_data.reshape((-1, 4))

        [rows,cols] = lidar.shape  # rows and cols of the lidar points, rows = number of lidar points, columns (4 = x , y, z , intensity)

        # 0: front , 1: front right, 2: right front, 3: back right, 4: back, 5: left back, 6: left front, 7: front left

        if num_lidar_points > filter: #only keep annotations with more than MIN_NUM_POINTS lidar points
            lidar_to_image = get_image_transform(cam_intrinsic, cam_extrinsic)  # magic array 4,4 to multiply and get image domain
            #bbox_transform_matrix = transform_axes(bbox_transform_matrix)
            box_to_image = np.matmul(lidar_to_image, bbox_transform_matrix)
            vertices = np.empty([2,2,2,2])
            ignore = False
            # 1: 000, 2: 001, 3: 010:, 4: 100
            for k in [0, 1]:
                for l in [0, 1]:
                    for m in [0, 1]:
                        # 3D point in the box space
                        v = np.array([(k-0.5), (l-0.5), (m-0.5), 1.])

                        # Project the point onto the image
                        v = np.matmul(box_to_image, v)

                        # If any of the corner is behind the camera, ignore this object.
                        if v[2] < 0:
                            ignore = True

                        x = v[0]/v[2]
                        y = v[1]/v[2]
                        #r = np.sqrt(x**2 + y**2)
                        #radial_correction = (1+k1 * r**2 + k2 * r**4 + k3 * r**6)
                        #x_tang_correction = 2*p1 * x * y + p2 * (r**2 + 2 * x**2)
                        #y_tang_correction = p1 * (r**2 + 2 * y**2) * x * y + 2 * p2 * (x * y)
                        #x_corr = x*radial_correction + x_tang_correction
                        #y_corr = y*radial_correction + y_tang_correction
                        #x_p    = fc1*x_corr + fc1_alpha_c*y_corr + cc1
                        #y_p    = fc2*y_corr + cc2
                        vertices[k,l,m,:] = [x, y]

            vertices = vertices.astype(np.int32)
            vert_2d  = compute_2d_bounding_box(img,vertices)
            x1,y1,x2,y2 = vert_2d
            if(ignore):
                continue

            #print("This is the y rotation")
            #print(rotation_y/np.pi * 180)

            #print("This is the given yaw in lidar frame")
            #print (cuboid['yaw']/np.pi*180)
            viewing_angle = np.arctan2(bbox_3d_cam[0], bbox_3d_cam[2]) #arctan2(x/z)

            #print("This is the viewing angle")
            #print(viewing_angle/np.pi * 180)

            alpha_tmp = rotation_y - viewing_angle

            #if alpha_tmp < -np.pi:
            #    alpha = alpha_tmp + 2*np.pi

            #elif alpha_tmp > np.pi:
            #    alpha = alpha_tmp - 2*np.pi

            #else:
            alpha = alpha_tmp

            #print("This is alpha")
            #print(alpha/np.pi*180)

            #print("***************************")

            #truncation calculation

            x_min_set = min(img_w-1,max(0,x1))
            y_min_set = min(img_h-1,max(0,y1)) - crop_top
            x_max_set = min(img_w-1,max(0,x2))
            # y_max_set = min(img_h-1,max(0,y2)) - crop_top
            y_max_set = min(img_h - 1, max(0, y2))

            area_actual = (x2 - x1) * (y2 - y1)  # un-clamped box area
            area_set = (x_max_set - x_min_set) * (y_max_set - y_min_set)

            if (x_min_set < 0 and x_max_set > img_w):  #get rid of any weird huge bb, where x min and x max span over the whole image
                continue

            if (y_min_set < 0 and y_max_set > img_h):  #get rid of any huge bb where y min and y max span over the whole image
                continue

            if area_set == 0:  #tracklet is outside of the image
                continue
            if(area_actual <= 0):
                ratio_of_area = 0
            else:
                ratio_of_area = area_set/area_actual

            if ratio_of_area == 1:
                truncation = 0

            else:
                truncation = 1 - ratio_of_area

            '''
            example of a cuboid statement: {'uuid': '33babea4-958b-49a1-ac65-86174faa111a', 'attributes': {'state': 'Moving'}, 'dimensions': {'y': 4.276, 'x': 1.766, 'z': 1.503}, 'label': 'Car', 'position': {'y': 5.739311373648604, 'x': 57.374972338211876, 'z': -1.5275162154592332}, 'camera_used': None, 'yaw': -3.1134003618947323, 'stationary': False}
                '''
            #cv2.rectangle(img, (x_min_set, y_min_set), (x_max_set,y_max_set), (0,255,0),2)
            f.write(cuboid['label'])
            f.write(' %s '%(round(truncation,2)))  #truncation
            f.write('0 ')  #occlusion
            f.write('%s ' %(round(alpha,2)))
            f.write('{:.1f} {:.1f} {:.1f} {:.1f} '.format(x_min_set, y_min_set, x_max_set, y_max_set)) #pixel
            f.write('{:.3f} {:.3f} {:.3f} '.format(bbox_3d_cam[3],bbox_3d_cam[4],bbox_3d_cam[5]))
            f.write('{:.3f} {:.3f} {:.3f} '.format(bbox_3d_cam[0],bbox_3d_cam[1],bbox_3d_cam[2]))
            f.write('{:.5f} '.format(bbox_3d[6]))
            f.write('{:d} '.format(num_lidar_points))
            f.write('{:s} '.format(drive))
            f.write('{:s} '.format(seq))
            f.write('\n')
    f.close()
    copyfile(lidar_path,target_lidar)
    cv2.imwrite(img_file, img)

def get_image_transform(intrinsic, extrinsic):
    """ For a given camera calibration, compute the transformation matrix
        from the vehicle reference frame to the image space.
    """
    # Camera model:
    # | fx  0 cx 0 |
    # |  0 fy cy 0 |
    # |  0  0  1 0 |

    # Swap the axes around
    # Compute the projection matrix from the vehicle space to image space.
    lidar_to_img = np.matmul(intrinsic, np.linalg.inv(extrinsic))
    return lidar_to_img
    #return np.linalg.inv(extrinsic)

def project_pts(calib_file,points):
    fd = open(calib_file,'r').read().splitlines()
    cam_intrinsic = np.eye(4)  #identity matrix
    for line in fd:
        matrix = line.rstrip().split(' ')
        if(matrix[0] == 'T_LIDAR_CAM00:'):
            cam_extrinsic = np.array(matrix[1:]).astype(np.float32)[np.newaxis,:].reshape(4,4)
        if(matrix[0] == 'CAM00_matrix:'):
            cam_intrinsic[0:3,0:3] = np.array(matrix[1:]).astype(np.float32).reshape(3, 3)  #K_xx camera projection matrix (3x3)
    transform_matrix = get_image_transform(cam_intrinsic, cam_extrinsic)  # magic array 4,4 to multiply and get image domain
    points_exp = np.ones((points.shape[0],4))
    points_exp[:,0:3] = points
    points_exp = points_exp[:,:]
    #batch_transform_matrix = np.repeat(transform_matrix[np.newaxis,:,:],points_exp.shape[0],axis=0)
    projected_points = np.zeros((points_exp.shape[0],3))
    for i, point in enumerate(points_exp):
        projected_points[i] = np.matmul(transform_matrix,point)[0:3]
    #projected_pts = np.einsum("bij, bjk -> bik", batch_transform_matrix, points_exp)[:,:,0]
    return projected_points

def compute_2d_bounding_box(img,points):
    """Compute the 2D bounding box for a set of 2D points.

    img_or_shape: Either an image or the shape of an image.
                  img_or_shape is used to clamp the bounding box coordinates.

    points: The set of 2D points to use
    """
    shape = img.shape

    # Compute the 2D bounding box and draw a rectangle
    x1 = np.amin(points[...,0])
    x2 = np.amax(points[...,0])
    y1 = np.amin(points[...,1])
    y2 = np.amax(points[...,1])

    x1 = min(max(0,x1),shape[1])
    x2 = min(max(0,x2),shape[1])
    y1 = min(max(0,y1),shape[0])
    y2 = min(max(0,y2),shape[0])

    return (x1,y1,x2,y2)

def get_box_transformation_matrix(box):
    """Create a transformation matrix for a given label box pose."""

    tx,ty,tz = box[0],box[1],box[2]
    c = math.cos(box[6])
    s = math.sin(box[6])

    sl, sh, sw = box[3], box[4], box[5]

    return np.array([
        [ sl*c,-sw*s,  0,tx],
        [ sl*s, sw*c,  0,ty],
        [    0,    0, sh,tz],
        [    0,    0,  0, 1]])


def transform_axes(bbox_transform_matrix):
    axes_transformation = np.array([
        [0,-1,0,0],
        [0,0,-1,0],
        [1,0,0,0],
        [0,0,0,1]])
    return np.matmul(axes_transformation,bbox_transform_matrix)

def write_all_annotations(cam):
    cam = str(cam)
    out_dir = create_output_directory(cam)
    for i,ddir in enumerate(drive_dir):
        data_dir = os.path.join(base_dir,ddir)
        sub_folders = sorted(os.listdir(data_dir))
        calib_idx = sub_folders.index('calib')
        del sub_folders[calib_idx]
        seq_count = len(sub_folders)

        for sequence in sub_folders:
            drive_and_sequence = int(sequence)*100 + drive_dir.index(ddir)*10000
            if(sequence == 'calib'):
                print('something very wrong has happened')
            #if(i > train_ratio*seq_count):
            #    mode = 'val'
            #else:
            #    mode = 'train'
            # if(int(sequence) in val_seq_sel[i]):
            #     mode = 'val'
            #     print('val')
            # else:
            #     print('train')
            #     mode = 'train'
            mode = "training"
            image_dir = os.path.join(data_dir,sequence, 'labeled','image_0' + cam, "data")
            seq_dir   = os.path.join(data_dir,sequence)
            file_names = sorted(os.listdir(image_dir))
            for frame in range(len(file_names)):
                write_txt_annotation(frame,cam,out_dir,mode,image_dir,ddir,sequence,drive_and_sequence)

if __name__ == '__main__':
    args = sys.argv
    cam = '0'
    if(len(args) > 1):
        if(args[1] == 'unpack'):
            base_dir = args[2]
            write_all_annotations(cam)
        elif('help' in args[1]):
            print('example usage:')
            print('    python3 filename.py unpack /path/to/download_data_dir ')
        else:
            print('unknown command, please type in --help as the argument')
    else:
        print('No input args specified, using default values. DANGEROUS!!')
        write_all_annotations(cam)

Thanks!

ZiadElmassik commented 3 years ago

Hello @rvalienter90, I've made similar modifications to cadc_unpack_all_kitti.py. However, I included the Tr_imu_to_velo extrinsic, as it is expected as a parameter by some 3D detection models. I've also removed the cropping of the image, as I feared it may affect the projection of the point cloud onto the image; cropping can sometimes require generating a new intrinsic matrix different from the original. There's one more thing that must be done in order to test detection models: a filter must be created to remove 3D annotations outside camera 00's FOV. I'm currently using some of @mpitropov's code from run_demo_lidar_bev and run_demo_tracklets to do this. Until this filter is in place, the model MVXNet (which uses two modalities to detect the given classes) keeps registering false negatives, because it compares its detections, which are within camera 00's FOV, with ground truth that includes labels outside it. Once this is done, the conversion will be complete.
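For illustration, a minimal sketch of such an FOV filter, assuming cam_intrinsic (4x4 with K in the top-left 3x3) and cam_extrinsic (T_LIDAR_CAM00, i.e. camera-to-lidar) are loaded as in the converter above; it keeps only cuboids whose centre lies in front of camera 00 and projects inside the image:

  import numpy as np

  def in_camera_fov(cuboid, cam_intrinsic, cam_extrinsic, img_w, img_h):
      p_lidar = np.array([cuboid['position']['x'],
                          cuboid['position']['y'],
                          cuboid['position']['z'], 1.0])
      p_cam = np.linalg.inv(cam_extrinsic) @ p_lidar   # lidar -> camera
      if p_cam[2] <= 0:                                # behind the camera
          return False
      p_img = cam_intrinsic @ p_cam                    # camera -> image plane
      u, v = p_img[0] / p_img[2], p_img[1] / p_img[2]
      return 0 <= u < img_w and 0 <= v < img_h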

ZiadElmassik commented 3 years ago

@rvalienter90, I noticed something peculiar in your calib files: your Tr_velo_to_cam is different from the one I've produced. This is an example of a calib file I've generated. If you're using camera 00 and the lidar, I think you should revise this; I've tested the matrix for lidar-camera projection and the projection was very precise, so I can confirm it's correct. This is what I did in the function make_calib_file:


P1: 653.956033188809 -0.235925653043616 653.221172545916 0.0 0.0 655.54008861796 508.732863993917 0.0 0.0 0.0 1.0 0.0 
P2: 653.956033188809 -0.235925653043616 653.221172545916 0.0 0.0 655.54008861796 508.732863993917 0.0 0.0 0.0 1.0 0.0 
P3: 653.956033188809 -0.235925653043616 653.221172545916 0.0 0.0 655.54008861796 508.732863993917 0.0 0.0 0.0 1.0 0.0 
R0_rect: 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 
Tr_velo_to_cam: 0.01395679999999994 -0.0051327600000000044 0.9998769999999992 0.6594892526583304 -0.9997759999999967 0.015076499999999996 0.014020699999999933 -0.044262327033722786 -0.015134699999999947 -0.9998609999999999 -0.004909509999999997 -0.6097703961060831 
Tr_imu_to_velo: 0.0146392923884 0.999713733025 0.0189246695892 -1.12495691821 -0.999886807886 0.0145708401517 0.00374993783733 0.0726605553018 0.00347311601847 -0.0189774239023 0.999813880103 -1.25
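As a quick sanity check on which direction a candidate Tr_velo_to_cam points, a point a few metres straight ahead of the lidar should land roughly on the camera's forward (z) axis after applying it. A minimal sketch, assuming Tr is the 3x4 lidar-to-camera matrix from the calib file:

  import numpy as np

  def check_velo_to_cam(Tr):
      Tr = np.asarray(Tr, dtype=float).reshape(3, 4)
      p_lidar = np.array([10.0, 0.0, 0.0, 1.0])   # 10 m straight ahead of the lidar
      p_cam = Tr @ p_lidar
      # For the usual KITTI camera convention, expect z close to +10 and small x, y.
      print('point in camera frame:', p_cam)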

-------------------------------------------------------------------------------------------------------------------

def make_calib_file(calib, calib_target_path, image_dir, frame):
    f_c = open(calib_target_path, "w+")
    cam_intrinsic = np.eye(3)  #identity matrix
    cam_intrinsic[0:3,0:3] = np.array(calib['CAM0' + cam]['camera_matrix']['data']).reshape(3, 3)  #K_xx camera projection matrix (3x3)
    #print("OLD CAMERA INTRINSIC")
    #print(cam_intrinsic)
    img_path = image_dir + '/' + format(frame, '010') + ".png"
    #print(img_path)
    img = cv2.imread(img_path)
    cam_distorted = np.array(calib['CAM0' + cam]['distortion_coefficients']['data'])  #D_xx camera distortion matrix
    h,  w = img.shape[:2]
    #newCameraMatrix, roi = cv2.getOptimalNewCameraMatrix(cam_intrinsic, cam_distorted, (w,h), 0, (w,h))
    #dst = cv2.undistort(img, cam_intrinsic, cam_distorted, None, newCameraMatrix)
    #print("NEW CAMERA MATRIX")
    #print(newCameraMatrix)
    rectify = np.eye(3)
    projection_matrix = np.eye(4)
    projection_matrix[0:3,0:3] = cam_intrinsic
    projection_matrix = projection_matrix[0:3,0:4]
    #print("PROJECTION MATRIX")
    #print(projection_matrix)
    cam_extrinsic = np.array(calib['extrinsics']['T_LIDAR_CAM00'])  #
    cam_extrinsic = cam_extrinsic[0:3][0:4]
    imu_velo = np.array(calib['extrinsics']['T_LIDAR_GPSIMU'])
    imu_velo = imu_velo[0:3][0:4]
    projection = ""
    rect = ""
    extrinsic = ""
    imu = ""
    for i in projection_matrix:
        for j in i:
            projection = projection + str(j) + " "
    for h in rectify:
        for j in h:
            rect = rect + str(j) + " "
    for k in cam_extrinsic:
        for j in k:
            extrinsic = extrinsic + str(j) + " "
    for l in imu_velo:
        for j in l:
            imu = imu + str(j) + " "
    f_c.write("P0: "+ projection)
    f_c.write('\n')
    f_c.write("P1: "+ projection)
    f_c.write('\n')
    f_c.write("P2: "+ projection)
    f_c.write('\n')
    f_c.write("P3: "+ projection)
    f_c.write('\n')
    f_c.write("R0_rect: " + rect)
    f_c.write('\n')
    f_c.write("Tr_velo_to_cam: " + extrinsic)
    f_c.write('\n')
    f_c.write("Tr_imu_to_velo: " + imu)```
ZiadElmassik commented 3 years ago

Hello, @rvalienter90. I'm curious as to why you added height / 2 to bbox_img[1][3] in the line bbox_3d_cam = [bbox_img[0][3], bbox_img[1][3] + height / 2, bbox_img[2][3], height, length, width, rotation_y], and why you changed the order to height, length, width rather than height, width, length?

rvalienter90 commented 3 years ago

Thanks, @Balazizo! Right, I will revise the Tr_velo_to_cam. Yes, I agree the Tr_imu_to_velo extrinsic is sometimes needed, and I removed the cropping of the image too. Btw, in cadc_unpack_all_kitti.py there is a filtering of the annotations; only the annotations that pass the filter are saved in the output label file, but yes, that filtering can be improved.

One thing that would be nice is if, after the right filtering, we fix the annotations JSON (overwrite it) by setting cuboid['camera_used'] to the correct camera, since right now cuboid['camera_used'] is not always correct in the annotations. In that case, when checking whether objects are in the images, we would just need to check cuboid['camera_used'] == int(cam). I added height / 2 because I noticed that in the output the BBs were shifted in the z-direction. Also, length and width were inverted in the output, maybe because the rotation is wrong or my Tr_velo_to_cam is wrong. I will check that with the new Tr_velo_to_cam.
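For context on the height / 2 question above: in the KITTI format the 3D location is the bottom centre of the box in camera coordinates (where y points down), while the CADC cuboid position is the box centre, so after transforming the centre into the camera frame the y value has to be shifted by half the height:

  # cuboid centre transformed into the camera frame
  x_cam, y_cam, z_cam = bbox_img[0][3], bbox_img[1][3], bbox_img[2][3]
  # KITTI expects the bottom centre of the box; camera y points down
  y_cam_kitti = y_cam + height / 2.0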

Thanks!

ZiadElmassik commented 3 years ago

Alright, please send here after you see the results.

ZiadElmassik commented 3 years ago

Hello @rvalienter90, I'm almost done converting the calib files to ensure perfect projection. By tomorrow (Thursday), I'll send a combination of your code and my code and we will have the perfect conversion script, hopefully.

rvalienter90 commented 3 years ago

Hi @Balazizo, great!! The script will be helpful to a lot of people.

ZiadElmassik commented 3 years ago

Alright, so after a full day of work, I was able to make the projection of the point cloud onto the image within the KITTI visualizer almost perfect. This was done by taking the extrinsic matrix back to its formula: splitting it into rotation and translation, obtaining Euler angles from the rotation matrix, and then applying some transformations to make the axes match. However, there is a slight skew in yaw. I've tried to resolve this by altering the different angles post axis-transformation, but it seems that when I modify the angle representing yaw (in the line rot_matrix = .....), the effect shows up in roll, and the same happens with the angle representing roll; I noticed I was able to alter roll using two of the angles and pitch with one, which was very peculiar. The results in the KITTI visualizer can be seen in the attached screenshot; a rotation in yaw is still needed. If anyone can assist with completing this, I think it would be a lot faster.

Screenshot from 2021-07-02 01-43-40

The conversion of the calib files to Kitti format can be found below:


def rotationMatrixToEulerAngles(R) :

    #assert(isRotationMatrix(R))

    sy = math.sqrt(R[0,0] * R[0,0] +  R[1,0] * R[1,0])

    singular = sy < 1e-6

    if  not singular :
        x = math.atan2(R[2,1] , R[2,2])
        y = math.atan2(-R[2,0], sy)
        z = math.atan2(R[1,0], R[0,0])
    else :
        x = math.atan2(-R[1,2], R[1,1])
        y = math.atan2(-R[2,0], sy)
        z = 0

    return np.array([x, y, z])

def make_calib_file(calib, calib_target_path, image_dir, frame):
    f_c = open(calib_target_path, "w+")
    cam_intrinsic = np.eye(3)  #identity matrix
    cam_intrinsic[0:3,0:3] = np.array(calib['CAM00']['camera_matrix']['data']).reshape(3, 3)  #K_xx camera projection matrix (3x3)
    #print("OLD CAMERA INTRINSIC")
    #print(cam_intrinsic)
    img_path = image_dir + '/' + format(frame, '010') + ".png"
    #print(img_path)
    img = cv2.imread(img_path)
    cam_distorted = np.array(calib['CAM00']['distortion_coefficients']['data'])  #D_xx camera distortion matrix
    h,  w = img.shape[:2]
    #newCameraMatrix, roi = cv2.getOptimalNewCameraMatrix(cam_intrinsic, cam_distorted, (w,h), 0, (w,h))
    #dst = cv2.undistort(img, cam_intrinsic, cam_distorted, None, newCameraMatrix)
    #print("NEW CAMERA MATRIX")
    #print(newCameraMatrix)
    rectify = np.eye(3)
    projection_matrix = np.eye(4)
    projection_matrix[0:3,0:3] = cam_intrinsic
    projection_matrix = projection_matrix[0:3,0:4]
    #print("PROJECTION MATRIX")
    #print(projection_matrix)
    cam_extrinsic = np.array(calib['extrinsics']['T_LIDAR_CAM00'])  #
    cam_extrinsic = cam_extrinsic[0:3][0:4]

#------------------------------------- Must reverse-engineer lidar-camera extrinsics-----------------------------------------------------------
    # Obtain Rotation Matrix
    rotation_matrix = np.eye(3)
    rotation_matrix = np.array(cam_extrinsic[0:3,0:3])
    xyz = rotationMatrixToEulerAngles(rotation_matrix)

    # Preparing to modify euler angles
    x = xyz[0]
    y = xyz[1]
    z = xyz[2]

    #Transformation around x-axis by pi/2 (angle is in radians and is positive due to anti-clockwise rotation)
    angle_cos_x = np.cos([(np.pi)/2])
    angle_cos_x = angle_cos_x[0]
    angle_sin_x = np.sin([(np.pi)/2])
    angle_sin_x = angle_sin_x[0]
    rot_x = [[1, 0, 0],[0, angle_cos_x, angle_sin_x],[0, -angle_sin_x, angle_cos_x]]

    #Transformation around y-axis by  (- pi/2) (angle is negative due to clockwise rotation)
    angle_cos_y = np.cos([-((np.pi)/2)])
    angle_cos_y = angle_cos_y[0]
    angle_sin_y = np.sin([-((np.pi)/2)])
    angle_sin_y = angle_sin_y[0]
    rot_y = [[angle_cos_y, 0, -angle_sin_y],[0, 1, 0],[angle_sin_y, 0, angle_cos_y]]

   #Generating matrix to transform euler angles to new axes
    transformation_matrix = np.matmul(rot_y, rot_x)

   #Applying transformation matrix to angles
    newPoint = np.matmul(transformation_matrix, xyz)
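    # Note: Euler angles do not transform like a vector, so multiplying the (x, y, z)
    # angle triple by a rotation matrix is only an approximation of changing their axes;
    # this may be related to the roll/yaw coupling described above.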
    rot_matrix = np.array(R.from_euler('xyz', [newPoint[0], newPoint[1]-0.05, newPoint[2]]).as_dcm())
    # trans_matrix = tx, ty tz
    trans_matrix = np.array(cam_extrinsic[0:3,3])

    # Swapping axes on translation matrix
    tmpX = trans_matrix[0]
    tmpY = trans_matrix[1]  
    tmpZ = trans_matrix[2]
    trans_matrix[0] = tmpY  
    trans_matrix[1] = tmpZ+0.66  #0.66 added as visual estimate
    trans_matrix[2] = tmpX

    trans_Col = np.array([[trans_matrix[0]], [trans_matrix[1]], [trans_matrix[2]]])

    # New extrinsic matrix
    cam_extrinsic = np.hstack((rot_matrix, trans_Col))

    imu_velo = np.array(calib['extrinsics']['T_LIDAR_GPSIMU'])
    imu_velo = imu_velo[0:3][0:4]
    projection = ""
    rect = ""
    extrinsic = ""
    imu = ""
    for i in projection_matrix:
        for j in i:
            projection = projection + str(j) + " "
    for h in rectify:
        for j in h:
            rect = rect + str(j) + " "
    for k in cam_extrinsic:
        for j in k:
            extrinsic = extrinsic + str(j) + " "
    for l in imu_velo:
        for j in l:
            imu = imu + str(j) + " "
    f_c.write("P0: "+ projection)
    f_c.write('\n')
    f_c.write("P1: "+ projection)
    f_c.write('\n')
    f_c.write("P2: "+ projection)
    f_c.write('\n')
    f_c.write("P3: "+ projection)
    f_c.write('\n')
    f_c.write("R0_rect: " + rect)
    f_c.write('\n')
    f_c.write("Tr_velo_to_cam: " + extrinsic)
    f_c.write('\n')
    f_c.write("Tr_imu_to_velo: " + imu)

    f_c.close()
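For reference, the first converter posted in this thread obtains Tr_velo_to_cam without going through Euler angles, by inverting the CADC extrinsic directly. A minimal sketch, assuming T_LIDAR_CAM00 maps camera coordinates into the lidar frame:

  import numpy as np

  T_LIDAR_CAM = np.array(calib['extrinsics']['T_LIDAR_CAM00'])
  Tr_velo_to_cam = np.linalg.inv(T_LIDAR_CAM)[0:3, 0:4]   # lidar -> camera, 3x4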

ZiadElmassik commented 3 years ago

Hello @rvalienter90 @mpitropov, the conversion script has been perfected both label-wise (thanks to @rvalienter90) and calibration-wise. Visualizing the projection of the lidar data onto the image in kitti_visualizer, along with the 3D bounding boxes, the results are as follows: Screenshot from 2021-07-04 15-45-21 Screenshot from 2021-07-04 15-44-35 Screenshot from 2021-07-04 15-44-19

rvalienter90 commented 3 years ago

Results look great @Balazizo, the lidar data and 3D BBs look perfect. I will test it tomorrow!! Thank you very much!

ZiadElmassik commented 3 years ago

Hello all, this marks the final change made to the conversion script (for now). Originally, the 2D BBoxes produced from the 3D ground truth didn't fit the objects properly. These problems have been remedied, as the before/after images show. The code could use some cleaning, but that's a problem for another day. In the future, when I'm done with my bachelor's, I hope to denoise the point clouds within the bin files before writing them back. However, with the current script, a 2D and 3D detection leaderboard is now possible for the CADCD, @mpitropov.

2D_Boxes_AfterBefore 2D_Boxes_After

rightchose commented 2 years ago

Thank you for your work! Can you share the full code for converting CADC to KITTI format? Your code is a little different from rvalienter90's, for example in the function make_calib_file.

CZCS001 commented 2 years ago

Please, can someone share the script?

mpitropov commented 2 years ago

I downloaded the script from a previous version of this comment https://github.com/mpitropov/cadc_devkit/issues/21#issuecomment-876656105 cadc_unpack_kitti.zip