Training Set size not 17065 for NuScenes after preprocessing

demmerichs commented 4 years ago

First of all, thank you for releasing your code and for your great work. I have a short question regarding MotionNet, as I am trying to reproduce your numbers. When I run the preprocessing script over the NuScenes folder, everything seems to work fine and the output looks good, with a training dataset size of roughly 19 GB. In your data/readme.md you report a total preprocessed training dataset size of 26.5 GB on your system. Is this difference realistic? Also, when I start the MGDA training with the ST consistency loss as shown in the readme.md, the first warning I get is "The size of training dataset is not 17065", and shortly after I am told that my training dataset size is 6951 instead. So several numbers do not add up for me: you have 17k samples in 26 GB, while I have fewer than half as many samples in 19 GB, and about 10k samples are missing. Maybe you can help me out here, or have an idea of what is different?

Could you provide your command line when running the code? Let me check what might cause this inconsistency.

Everything was run in a venv with Python 3.6.9 and the required pip dependencies on an Ubuntu 18.04 system. The command line was taken directly from the readme.md, with only the directories replaced by my own:

python $SRC_DIR/data/gen_data.py --root $INPUT_DATADIR/nuscenes --split train --savepath $INPUT_DATADIR/nuscenes_preprocessed/train

The starting output looks like the following:

======                                                                                                                                                                                                             
Loading NuScenes tables for version v1.0-trainval...                                                                                                                                                               
23 category,                                                                                                                                                                                                       
8 attribute,                                                                                                                                                                                                       
4 visibility,                                                                                                                                                                                                      
64386 instance,                                                                                                                                                                                                    
12 sensor,                                                                                                                                                                                            
10200 calibrated_sensor,                                                                                                                                                                                           
2631083 ego_pose,                                                                                                                                                                                   
68 log,                                                                                                                                                                                                      
850 scene,                                                                                                                                                                                                        
34149 sample,                                                                                                                                                                                                      
2631083 sample_data,                                                                                                                                                                                       
1166187 sample_annotation,                                                                                                                                                                                         
4 map,                                                                                                                                                                                                
Done loading in 36.6 seconds.                                                                                                                                                                                      
======
Reverse indexing ...
Done reverse indexing in 9.8 seconds.
======
Total number of scenes: 850
Split: train, which contains 500 scenes.
Processing scene 411 ...
  >> Finish sample: 0, sequence 0

When I now start training with MGDA and the ST consistency loss as described in the readme.md:

python train_multi_seq_MGDA.py --data $INPUT_DATADIR/nuscenes_preprocessed/train --batch 8 --nepoch 70 --nworker 4 --use_bg_tc --reg_weight_bg_tc 0.1 --use_fg_tc --reg_weight_fg_tc 2.5 --use_sc --reg_weight_sc 15.0 --reg_weight_cls 2.0 --log

I get the following output:

Namespace(batch=8, data='/xxxINPUT_DATADIRxxx(postedited for this issue)/nuscenes_preprocessed/train', log=True, logpath='', nepoch=70, nn_sampling=False, nworker=4, reg_weight_bg_tc=0.1, reg_weight_cls=2.0, reg_weight_fg_tc=2.5, reg_weight_sc=15.0, resume='', use_bg_tc=True, use_fg_tc=True, use_sc=True)
device number 2                                                                                                                                                                                                    
data root: /xxxINPUT_DATADIRxxx/nuscenes_preprocessed/train                                                                                                                                            
/xxxSRC_DIRxxx/data/nuscenes_dataloader.py:40: UserWarning: >> The size of training dataset is not 17065.                                                                                     

  warnings.warn(">> The size of training dataset is not 17065.\n")                                                                                                                                                 
Training dataset size: 6951                                                                                                                                                                                        
Epoch 1, learning rate 0.002                                                                                                                                                                                       
[1/0]   Disp 0.106501,  Obj_Cls 0.110858,       Motion_Cls 0.057613,    bg_tc 0.8646359,        sc 0.0885072,   fg_tc 0.0126457
.
.
.

So as you can see, neither the preprocessing nor the start of the training reports an actual error. However, with about 10k samples missing compared to the published setup, reproducing the results is impossible.
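To make the mismatch easier to pin down, the preprocessed folder can be counted independently of the dataloader. The following is only a minimal sketch: the helper name `count_preprocessed` is hypothetical, and it assumes that the preprocessing script writes `.npy` files into subdirectories of the save path (the actual layout is defined by gen_data.py and should be checked there). The demo builds a throwaway directory so the snippet runs anywhere; in practice `root` would be the `--savepath` used above.

```python
import os
import tempfile


def count_preprocessed(root):
    """Return (number of direct subdirectories, total number of .npy files) under root.

    Assumption: samples are stored as .npy files below `root`; the exact
    layout is not confirmed by this thread.
    """
    num_dirs = 0
    num_npy = 0
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath == root:
            num_dirs = len(dirnames)
        num_npy += sum(1 for name in filenames if name.endswith('.npy'))
    return num_dirs, num_npy


# Self-contained demo on a temporary directory (hypothetical folder names).
with tempfile.TemporaryDirectory() as demo_root:
    os.makedirs(os.path.join(demo_root, 'sample_a'))
    open(os.path.join(demo_root, 'sample_a', '0.npy'), 'wb').close()
    dirs, files = count_preprocessed(demo_root)
    print("subdirectories:", dirs, "npy files:", files)
```

Comparing these two counts against the size the dataloader reports (6951 here) would show whether files are missing on disk or are being skipped when the dataset is indexed.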

Also after some time the training actually fails, but I cannot tell if it is related to this issue (I am not a pickle expert):

.
.
.
[1/482] Disp 0.035911,  Obj_Cls 0.068191,       Motion_Cls 0.014474,    bg_tc 0.0069933,        sc 0.0002731,   fg_tc 0.0000397
Traceback (most recent call last):
  File "train_multi_seq_MGDA.py", line 1042, in <module>
    main()
  File "train_multi_seq_MGDA.py", line 269, in main
    models, criterion, trainloader, optimizers, device, epoch
  File "train_multi_seq_MGDA.py", line 321, in train
    for i, data in enumerate(trainloader, 0):
  File "/xxxSRC_DIRxxx/.venv/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 582, in __next__
    return self._process_next_batch(batch)
  File "/xxxSRC_DIRxxx/.venv/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 608, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
_pickle.UnpicklingError: Traceback (most recent call last):
  File "/xxxSRC_DIRxxx/.venv/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/xxxSRC_DIRxxx/.venv/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 99, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/xxxSRC_DIRxxx/data/nuscenes_dataloader.py", line 68, in __getitem__
    gt_data_handle = np.load(gt_file_path, allow_pickle=True)
  File "/xxxSRC_DIRxxx/.venv/lib/python3.6/site-packages/numpy/lib/npyio.py", line 440, in load
    pickle_kwargs=pickle_kwargs)
  File "/xxxSRC_DIRxxx/.venv/lib/python3.6/site-packages/numpy/lib/format.py", line 732, in read_array
    array = pickle.load(fp, **pickle_kwargs)
_pickle.UnpicklingError: invalid load key, '\x00'.
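An "invalid load key" error raised from np.load usually means the file on disk is not a valid .npy file, for example because it was truncated or only partially written. Whether that is the case here is not confirmed. A minimal sketch for locating such files is shown below; the helper name `find_unreadable_npy` is hypothetical, and it simply tries to load every `.npy` file under a directory with the same `allow_pickle=True` setting that appears in the traceback.

```python
import os
import tempfile

import numpy as np


def find_unreadable_npy(root):
    """Return a list of (path, error) for every .npy file under root that fails to load."""
    bad = []
    for dirpath, _, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith('.npy'):
                continue
            path = os.path.join(dirpath, name)
            try:
                # Same call as in the traceback (nuscenes_dataloader.py).
                np.load(path, allow_pickle=True)
            except Exception as exc:  # any load failure marks the file as suspect
                bad.append((path, repr(exc)))
    return bad


# Self-contained demo: one valid file and one file filled with zero bytes.
with tempfile.TemporaryDirectory() as demo_root:
    np.save(os.path.join(demo_root, 'good.npy'), np.arange(3))
    with open(os.path.join(demo_root, 'bad.npy'), 'wb') as fh:
        fh.write(b'\x00' * 64)
    for path, err in find_unreadable_npy(demo_root):
        print("unreadable:", os.path.basename(path), err)
```

Any file this reports would need to be regenerated before training can iterate over the full dataset.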

pxiangwu commented 4 years ago

Hi, let me check the code. I may need to run it to see what happens. Please give me a little bit of time.

Also, below are the links to the pre-trained models, which might be helpful for you.

The pre-trained model for train_multi_seq.py can be downloaded from Google Drive or Dropbox
The pre-trained model for train_multi_seq_MGDA.py can be downloaded from Google Drive or Dropbox

pxiangwu commented 4 years ago

@DavidS3141, after quickly running the code, the output showed that scene 411 generates 34 files, scene 662 generates 34 files, scene 2 also generates 34 files, etc. So in total we have roughly 34 * 500 = 17000 files (close to 17065). So I think the code is correct.
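The numbers quoted in this thread can be put side by side with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the function names are hypothetical, and the interpretation (that the reported dataset covers only part of the 500 scenes) is an assumption, not something the logs confirm.

```python
# Numbers quoted in this thread.
EXPECTED_SAMPLES = 17065   # size the dataloader warning expects
OBSERVED_SAMPLES = 6951    # size reported on the affected system
FILES_PER_SCENE = 34       # observed for scenes 411, 662 and 2
NUM_TRAIN_SCENES = 500     # "Split: train, which contains 500 scenes."
EXPECTED_GB = 26.5         # size stated in data/readme.md
OBSERVED_GB = 19.0         # size on the affected system


def scenes_covered(num_samples, files_per_scene=FILES_PER_SCENE):
    """How many scenes a sample count corresponds to at a fixed files-per-scene rate."""
    return num_samples / files_per_scene


def mb_per_sample(total_gb, num_samples):
    """Average storage per sample in MB (1 GB taken as 1000 MB)."""
    return total_gb * 1000.0 / num_samples


print("estimated full total:", FILES_PER_SCENE * NUM_TRAIN_SCENES)
print("scenes covered by observed count:", round(scenes_covered(OBSERVED_SAMPLES), 1))
print("MB per sample (expected):", round(mb_per_sample(EXPECTED_GB, EXPECTED_SAMPLES), 2))
print("MB per sample (observed):", round(mb_per_sample(OBSERVED_GB, OBSERVED_SAMPLES), 2))
```

At 34 files per scene, 6951 samples correspond to only a bit more than 200 scenes, and the average size per sample also differs between the two systems, so both the count and the per-sample size appear worth checking.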

Could you run the code on your system to check how many files scenes 411, 662, and 2 generate? (These are the first 3 scenes processed by the code.)

You could also comment out some of the preprocessing code, such as the BEV rasterization and file saving, to make the script run faster. This way we can quickly check the total number of files it would dump (see the code below).

# COPYRIGHT (C) Mitsubishi Electric Research Labs (MERL) 2020
# Code written by Pengxiang Wu
# March 2020

from nuscenes.nuscenes import NuScenes
import os
from nuscenes.utils.data_classes import LidarPointCloud
import numpy as np
import argparse
from data.data_utils import voxelize_occupy, gen_2d_grid_gt

parser = argparse.ArgumentParser()
parser.add_argument('-r', '--root', default=None, type=str, help='Root path to nuScenes dataset')
parser.add_argument('-s', '--split', default='train', type=str, help='The data split [train/val/test]')
parser.add_argument('-p', '--savepath', default=None, type=str, help='Directory for saving the generated data')
args = parser.parse_args()

if args.root is None or args.savepath is None:
    raise ValueError("Should specify the dataset path and the savepath.")

nusc = NuScenes(version='v1.0-trainval', dataroot=args.root, verbose=True)
print("Total number of scenes:", len(nusc.scene))

class_map = {'vehicle.car': 1, 'vehicle.bus.rigid': 1, 'vehicle.bus.bendy': 1, 'human.pedestrian': 2,
             'vehicle.bicycle': 3}  # background: 0, other: 4

if args.split == 'train':
    num_keyframe_skipped = 0  # The number of keyframes we will skip when dumping the data
    nsweeps_back = 30  # Number of frames back to the history (including the current timestamp)
    nsweeps_forward = 20  # Number of frames into the future (does not include the current timestamp)
    skip_frame = 0  # The number of frames skipped for the adjacent sequence
    num_adj_seqs = 2  # number of adjacent sequences, among which the time gap is \delta t
else:
    num_keyframe_skipped = 1
    nsweeps_back = 25  # Setting this to 30 (for training) or 25 (for testing) allows conducting ablation studies on frame numbers
    nsweeps_forward = 20
    skip_frame = 0
    num_adj_seqs = 1

# The specifications for BEV maps
voxel_size = (0.25, 0.25, 0.4)
area_extents = np.array([[-32., 32.], [-32., 32.], [-3., 2.]])
past_frame_skip = 3  # when generating the BEV maps, how many history frames need to be skipped
future_frame_skip = 0  # when generating the BEV maps, how many future frames need to be skipped
num_past_frames_for_bev_seq = 5  # the number of past frames for BEV map sequence

scenes = np.load('data/split.npy', allow_pickle=True).item().get(args.split)
print("Split: {}, which contains {} scenes.".format(args.split, len(scenes)))

# ---------------------- Extract the scenes, and then pre-process them into BEV maps ----------------------
def gen_data():
    res_scenes = list()
    for s in scenes:
        s_id = s.split('_')[1]
        res_scenes.append(int(s_id))

    total = 0
    for scene_idx in res_scenes:
        curr_scene = nusc.scene[scene_idx]

        first_sample_token = curr_scene['first_sample_token']
        curr_sample = nusc.get('sample', first_sample_token)
        curr_sample_data = nusc.get('sample_data', curr_sample['data']['LIDAR_TOP'])

        save_data_dict_list = list()  # for storing consecutive sequences; the data consists of timestamps, points, etc
        save_box_dict_list = list()  # for storing box annotations in consecutive sequences
        save_instance_token_list = list()
        adj_seq_cnt = 0
        save_seq_cnt = 0  # only used for save data file name

        # Iterate each sample data
        print("Processing scene {} ...".format(scene_idx))
        while curr_sample_data['next'] != '':

            # Get the synchronized point clouds
            all_pc, all_times, trans_matrices = \
                LidarPointCloud.from_file_multisweep_bf_sample_data(nusc, curr_sample_data,
                                                                    return_trans_matrix=True,
                                                                    nsweeps_back=nsweeps_back,
                                                                    nsweeps_forward=nsweeps_forward)
            # Store point cloud of each sweep
            pc = all_pc.points
            _, sort_idx = np.unique(all_times, return_index=True)
            unique_times = all_times[np.sort(sort_idx)]  # Preserve the item order in unique_times
            num_sweeps = len(unique_times)

            # Make sure we have sufficient past and future sweeps
            if num_sweeps != (nsweeps_back + nsweeps_forward):

                # Skip some keyframes if necessary
                flag = False
                for _ in range(num_keyframe_skipped + 1):
                    if curr_sample['next'] != '':
                        curr_sample = nusc.get('sample', curr_sample['next'])
                    else:
                        flag = True
                        break

                if flag:  # No more keyframes
                    break
                else:
                    curr_sample_data = nusc.get('sample_data', curr_sample['data']['LIDAR_TOP'])

                # Reset
                adj_seq_cnt = 0
                save_data_dict_list = list()
                save_box_dict_list = list()
                save_instance_token_list = list()
                continue

            adj_seq_cnt += 1
            if adj_seq_cnt == num_adj_seqs:

                print(">> Finish sample Num: {}".format(total + 1))
                total += 1
                # --------------------------------------------------------------------------------

                save_seq_cnt += 1
                adj_seq_cnt = 0
                save_data_dict_list = list()
                save_box_dict_list = list()
                save_instance_token_list = list()

                # Skip some keyframes if necessary
                flag = False
                for _ in range(num_keyframe_skipped + 1):
                    if curr_sample['next'] != '':
                        curr_sample = nusc.get('sample', curr_sample['next'])
                    else:
                        flag = True
                        break

                if flag:  # No more keyframes
                    break
                else:
                    curr_sample_data = nusc.get('sample_data', curr_sample['data']['LIDAR_TOP'])
            else:
                flag = False
                for _ in range(skip_frame + 1):
                    if curr_sample_data['next'] != '':
                        curr_sample_data = nusc.get('sample_data', curr_sample_data['next'])
                    else:
                        flag = True
                        break

                if flag:  # No more sample frames
                    break

# ---------------------- Convert the raw data into (dense) BEV maps ----------------------
def convert_to_dense_bev(data_dict):
    num_sweeps = data_dict['num_sweeps']
    times = data_dict['times']
    trans_matrices = data_dict['trans_matrices']

    num_past_sweeps = len(np.where(times >= 0)[0])
    num_future_sweeps = len(np.where(times < 0)[0])
    assert num_past_sweeps + num_future_sweeps == num_sweeps, "The number of sweeps is incorrect!"

    # Load point cloud
    pc_list = []

    for i in range(num_sweeps):
        pc = data_dict['pc_' + str(i)]
        pc_list.append(pc.T)

    # Reorder the pc, and skip sample frames if wanted
    # Currently the past frames in pc_list are stored in the following order [current, current + 1, current + 2, ...]
    # Therefore, we would like to reorder the frames
    tmp_pc_list_1 = pc_list[0:num_past_sweeps:(past_frame_skip + 1)]
    tmp_pc_list_1 = tmp_pc_list_1[::-1]
    tmp_pc_list_2 = pc_list[(num_past_sweeps + future_frame_skip)::(future_frame_skip + 1)]
    pc_list = tmp_pc_list_1 + tmp_pc_list_2  # now the order is: [past frames -> current frame -> future frames]

    num_past_pcs = len(tmp_pc_list_1)
    num_future_pcs = len(tmp_pc_list_2)

    # Discretize the input point clouds, and compute the ground-truth displacement vectors
    # The following two variables contain the information for the
    # compact representation of binary voxels, as described in the paper
    voxel_indices_list = list()
    padded_voxel_points_list = list()

    past_pcs_idx = list(range(num_past_pcs))
    past_pcs_idx = past_pcs_idx[-num_past_frames_for_bev_seq:]  # we typically use 5 past frames (including the current one)
    for i in past_pcs_idx:
        res, voxel_indices = voxelize_occupy(pc_list[i], voxel_size=voxel_size, extents=area_extents, return_indices=True)
        voxel_indices_list.append(voxel_indices)
        padded_voxel_points_list.append(res)

    # Compile the batch of voxels, so that they can be fed into the network.
    # Note that, the padded_voxel_points in this script will only be used for sanity check.
    padded_voxel_points = np.stack(padded_voxel_points_list, axis=0).astype(bool)  # np.bool was removed in NumPy 1.24

    # Finally, generate the ground-truth displacement field
    # - all_disp_field_gt: the ground-truth displacement vectors for each grid cell
    # - all_valid_pixel_maps: the masking map for valid pixels, used for loss computation
    # - non_empty_map: the mask which represents the non-empty grid cells, used for loss computation
    # - pixel_cat_map: the map specifying the category for each non-empty grid cell
    # - pixel_indices: the indices of non-empty grid cells, used to generate sparse BEV maps
    # - pixel_instance_map: the map specifying the instance id for each grid cell, used for loss computation
    all_disp_field_gt, all_valid_pixel_maps, non_empty_map, pixel_cat_map, pixel_indices, pixel_instance_map \
        = gen_2d_grid_gt(data_dict, grid_size=voxel_size[0:2], extents=area_extents,
                         frame_skip=future_frame_skip, return_instance_map=True)

    return voxel_indices_list, padded_voxel_points, pixel_indices, pixel_instance_map, all_disp_field_gt,\
        all_valid_pixel_maps, non_empty_map, pixel_cat_map, num_past_frames_for_bev_seq, num_future_pcs, trans_matrices

# ---------------------- Convert the dense BEV data into sparse format ----------------------
# This will significantly reduce the space used for data storage
def convert_to_sparse_bev(dense_bev_data):
    save_voxel_indices_list, save_voxel_points, save_pixel_indices, save_pixel_instance_maps, \
        save_disp_field_gt, save_valid_pixel_maps, save_non_empty_maps, save_pixel_cat_maps, \
        save_num_past_pcs, save_num_future_pcs, save_trans_matrices = dense_bev_data

    save_valid_pixel_maps = save_valid_pixel_maps.astype(bool)  # np.bool was removed in NumPy 1.24
    save_voxel_dims = save_voxel_points.shape[1:]
    num_categories = save_pixel_cat_maps.shape[-1]

    sparse_disp_field_gt = save_disp_field_gt[:, save_pixel_indices[:, 0], save_pixel_indices[:, 1], :]
    sparse_valid_pixel_maps = save_valid_pixel_maps[:, save_pixel_indices[:, 0], save_pixel_indices[:, 1]]
    sparse_pixel_cat_maps = save_pixel_cat_maps[save_pixel_indices[:, 0], save_pixel_indices[:, 1]]
    sparse_pixel_instance_maps = save_pixel_instance_maps[save_pixel_indices[:, 0], save_pixel_indices[:, 1]]

    save_data_dict = dict()
    for i in range(len(save_voxel_indices_list)):
        save_data_dict['voxel_indices_' + str(i)] = save_voxel_indices_list[i].astype(np.int32)

    save_data_dict['disp_field'] = sparse_disp_field_gt
    save_data_dict['valid_pixel_map'] = sparse_valid_pixel_maps
    save_data_dict['pixel_cat_map'] = sparse_pixel_cat_maps
    save_data_dict['num_past_pcs'] = save_num_past_pcs
    save_data_dict['num_future_pcs'] = save_num_future_pcs
    save_data_dict['trans_matrices'] = save_trans_matrices
    save_data_dict['3d_dimension'] = save_voxel_dims
    save_data_dict['pixel_indices'] = save_pixel_indices
    save_data_dict['pixel_instance_ids'] = sparse_pixel_instance_maps

    # -------------------------------- Sanity Check --------------------------------
    dims = save_non_empty_maps.shape

    test_disp_field_gt = np.zeros((save_num_future_pcs, dims[0], dims[1], 2), dtype=np.float32)
    test_disp_field_gt[:, save_pixel_indices[:, 0], save_pixel_indices[:, 1], :] = sparse_disp_field_gt[:]
    assert np.all(test_disp_field_gt == save_disp_field_gt), "Error: Mismatch"

    test_valid_pixel_maps = np.zeros((save_num_future_pcs, dims[0], dims[1]), dtype=bool)
    test_valid_pixel_maps[:, save_pixel_indices[:, 0], save_pixel_indices[:, 1]] = sparse_valid_pixel_maps[:]
    assert np.all(test_valid_pixel_maps == save_valid_pixel_maps), "Error: Mismatch"

    test_pixel_cat_maps = np.zeros((dims[0], dims[1], num_categories), dtype=np.float32)
    test_pixel_cat_maps[save_pixel_indices[:, 0], save_pixel_indices[:, 1], :] = sparse_pixel_cat_maps[:]
    assert np.all(test_pixel_cat_maps == save_pixel_cat_maps), "Error: Mismatch"

    test_non_empty_map = np.zeros((dims[0], dims[1]), dtype=np.float32)
    test_non_empty_map[save_pixel_indices[:, 0], save_pixel_indices[:, 1]] = 1.0
    assert np.all(test_non_empty_map == save_non_empty_maps), "Error: Mismatch"

    test_pixel_instance_map = np.zeros((dims[0], dims[1]), dtype=np.uint8)
    test_pixel_instance_map[save_pixel_indices[:, 0], save_pixel_indices[:, 1]] = sparse_pixel_instance_maps[:]
    assert np.all(test_pixel_instance_map == save_pixel_instance_maps), "Error: Mismatch"

    for i in range(len(save_voxel_indices_list)):
        indices = save_data_dict['voxel_indices_' + str(i)]
        curr_voxels = np.zeros(save_voxel_dims, dtype=bool)
        curr_voxels[indices[:, 0], indices[:, 1], indices[:, 2]] = 1
        assert np.all(curr_voxels == save_voxel_points[i]), "Error: Mismatch"

    return save_data_dict

if __name__ == "__main__":
    gen_data()

demmerichs commented 3 years ago

When I run my version of the script I get the following, which looks a bit different than yours because of your changes:

Processing scene 411 ...                                                                                                                                                                                           
  >> Finish sample: 0, sequence 0                                                                                                                                                                                  
  >> Finish sample: 0, sequence 1                                                                                                                                                                                  
  >> Finish sample: 1, sequence 0                                                                                                                                                                                  
  >> Finish sample: 1, sequence 1                                                                                                                                                                                  
  >> Finish sample: 2, sequence 0                                                                                                                                                                                  
  >> Finish sample: 2, sequence 1                                                                                                                                                                                  
  >> Finish sample: 3, sequence 0                                                                                                                                                                                  
  >> Finish sample: 3, sequence 1                                                                                                                                                                                  
  >> Finish sample: 4, sequence 0                                                                                                                                                                                  
  >> Finish sample: 4, sequence 1                                                                                                                                                                                  
  >> Finish sample: 5, sequence 0                                                                                                                                                                                  
  >> Finish sample: 5, sequence 1                                                                                                                                                                                  
  >> Finish sample: 6, sequence 0                                                                                                                                                                                  
  >> Finish sample: 6, sequence 1                                                                                                                                                                                  
  >> Finish sample: 7, sequence 0                                                                                                                                                                                  
  >> Finish sample: 7, sequence 1                                                                                                                                                                                  
  >> Finish sample: 8, sequence 0                                                                                                                                                                                  
  >> Finish sample: 8, sequence 1                                                                                                                                                                          
  >> Finish sample: 9, sequence 0                                                                                                                                                                                 
  >> Finish sample: 9, sequence 1                                                                                                                                                                                  
  >> Finish sample: 10, sequence 0                                                                                                                                                                                 
  >> Finish sample: 10, sequence 1                                                                                                                                                                                
  >> Finish sample: 11, sequence 0                                                                                                                                                                                 
  >> Finish sample: 11, sequence 1                                                                                                                                                                                 
  >> Finish sample: 12, sequence 0                                                                                                                                                                                 
  >> Finish sample: 12, sequence 1                                                                                                                                                                                 
  >> Finish sample: 13, sequence 0                                                                                                                                                                                 
  >> Finish sample: 13, sequence 1                                                                                                                                                                                 
  >> Finish sample: 14, sequence 0                                                                                                                                                                                 
  >> Finish sample: 14, sequence 1                                                                                                                                                                                 
  >> Finish sample: 15, sequence 0                                                                                                                                                                                 
  >> Finish sample: 15, sequence 1                                                                                                                                                                                 
  >> Finish sample: 16, sequence 0                                                                                                                                                                                 
  >> Finish sample: 16, sequence 1                                                                                                                                                                                 
  >> Finish sample: 17, sequence 0                                                                                                                                                                                 
  >> Finish sample: 17, sequence 1                                                                                                                                                                                 
  >> Finish sample: 18, sequence 0                                                                                                                                                                                 
  >> Finish sample: 18, sequence 1                                                                                                                                                                                 
  >> Finish sample: 19, sequence 0                                                                                                                                                                                 
  >> Finish sample: 19, sequence 1                                                                                                                                                                                 
  >> Finish sample: 20, sequence 0                                                                                                                                                                                 
  >> Finish sample: 20, sequence 1                                                                                                                                                                                 
  >> Finish sample: 21, sequence 0                                                                                                                                                                                 
  >> Finish sample: 21, sequence 1                                                                                                                                                                                 
  >> Finish sample: 22, sequence 0                                                                                                                                                                                 
  >> Finish sample: 22, sequence 1                                                                                                                                                                                 
  >> Finish sample: 23, sequence 0                                                                                                                                                                                 
  >> Finish sample: 23, sequence 1                                                                                                                                                                                 
  >> Finish sample: 24, sequence 0
  >> Finish sample: 24, sequence 1
  >> Finish sample: 25, sequence 0
  >> Finish sample: 25, sequence 1
  >> Finish sample: 26, sequence 0
  >> Finish sample: 26, sequence 1
  >> Finish sample: 27, sequence 0
  >> Finish sample: 27, sequence 1
  >> Finish sample: 28, sequence 0
  >> Finish sample: 28, sequence 1
  >> Finish sample: 29, sequence 0
  >> Finish sample: 29, sequence 1
  >> Finish sample: 30, sequence 0
  >> Finish sample: 30, sequence 1
  >> Finish sample: 31, sequence 0
  >> Finish sample: 31, sequence 1
  >> Finish sample: 32, sequence 0
  >> Finish sample: 32, sequence 1
  >> Finish sample: 33, sequence 0
  >> Finish sample: 33, sequence 1
Processing scene 662 ...

The first four scenes all have 34 samples each and are 411, 662, 225, 2, in that order (I think you just missed 225, since the script you provided here also gives this order of scenes). Sadly, loading is quite slow for me as well, but I am running your shortened script to find out the total sample count. Right now everything points to an error during preprocessing that I missed and that caused scenes to be dropped. I just realized, once again, that NuScenes is quite memory hungry. I had some other applications running and had also automatically started the val and test preprocessing, so the generation script may have been killed because of OOM. I have now stopped the other programs and will update this as soon as the scripts have finished (might take a day :/, tqdm would have been nice for this).
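For anyone debugging a similar drop in scenes: if the console output of the generation script was saved to a file, a small parser can show how many samples finished per scene and where processing stopped. The following is only a sketch. It assumes the two line formats visible in the log above ("Processing scene N ..." and ">> Finish sample: i, sequence j") and that each "Processing scene" line is printed before the samples of that scene. The function name `samples_per_scene` is made up for this illustration and is not part of the MotionNet code.

```python
import re

# Patterns taken from the log lines shown in this thread.
SCENE_RE = re.compile(r"Processing scene (\d+)")
SAMPLE_RE = re.compile(r">> Finish sample: (\d+), sequence (\d+)")


def samples_per_scene(log_lines):
    """Count distinct finished sample indices per scene.

    Assumption: a "Processing scene N ..." line precedes the
    ">> Finish sample" lines that belong to scene N.
    """
    finished = {}  # scene index -> set of finished sample indices
    current = None
    for line in log_lines:
        scene_match = SCENE_RE.search(line)
        if scene_match:
            current = int(scene_match.group(1))
            finished.setdefault(current, set())
            continue
        sample_match = SAMPLE_RE.search(line)
        if sample_match and current is not None:
            finished[current].add(int(sample_match.group(1)))
    return {scene: len(samples) for scene, samples in finished.items()}


if __name__ == "__main__":
    demo = [
        "Processing scene 411 ...",
        "  >> Finish sample: 0, sequence 0",
        "  >> Finish sample: 0, sequence 1",
        "  >> Finish sample: 1, sequence 0",
        "Processing scene 662 ...",
    ]
    for scene, count in samples_per_scene(demo).items():
        print(f"scene {scene}: {count} finished samples")
```

A scene that appears with far fewer samples than expected, or a log that simply ends in the middle of a scene, would be consistent with the process having been killed.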

pxiangwu commented 3 years ago

Yes. The data loading is slow since the data reader of nuScenes is implemented in Python instead of C++.

Let me know if you successfully generate 17065 files.
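A quick way to check the count without starting training is to count what the preprocessing wrote to the save path. This is a minimal sketch, not code from the repository. The on-disk layout is an assumption here (the thread does not say whether 17065 refers to top-level entries or to individual files), so the hypothetical helper `count_outputs` reports both numbers and you can compare whichever applies.

```python
import os


def count_outputs(savepath):
    """Return (number of top-level directories, total number of files).

    Assumption: the preprocessed data lives somewhere below `savepath`;
    the exact directory layout is not taken from the MotionNet code.
    """
    num_dirs = 0
    num_files = 0
    for entry in os.scandir(savepath):
        if entry.is_dir():
            num_dirs += 1
            # Count every file nested below this top-level directory.
            for _, _, files in os.walk(entry.path):
                num_files += len(files)
        elif entry.is_file():
            num_files += 1
    return num_dirs, num_files


if __name__ == "__main__":
    import sys

    dirs, files = count_outputs(sys.argv[1])
    print(f"top-level directories: {dirs}, files: {files}")
```

Run it with the same directory that was passed as `--savepath` to the generation script and compare the result against the expected 17065.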

demmerichs commented 3 years ago

It was a problem on my side, thanks again for your help. I have now processed all scenes as expected. Closing this.
