tusen-ai / SST

Code for a series of work in LiDAR perception, including SST (CVPR 22), FSD (NeurIPS 22), FSD++ (TPAMI 23), FSDv2, and CTRL (ICCV 23, oral).
Apache License 2.0

Training problem #132

Closed 20210726 closed 1 year ago

20210726 commented 1 year ago

At Step 5 (Begin training), I get this error:

2023-08-15 19:05:25,999 - mmdet - INFO - workflow: [('train', 1)], max: 24 epochs
INFO:mmdet:workflow: [('train', 1)], max: 24 epochs
2023-08-15 19:05:26.085644: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
Traceback (most recent call last):
  File "tools/train.py", line 230, in <module>
    main()
  File "tools/train.py", line 220, in main
    train_model(
  File "/waymo/SST/mmdet3d/apis/train.py", line 41, in train_model
    train_detector(
  File "/root/anaconda3/envs/ctrl/lib/python3.8/site-packages/mmdet/apis/train.py", line 170, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/root/ctrl/mmcv/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/root/ctrl/mmcv/mmcv/runner/epoch_based_runner.py", line 47, in train
    for i, data_batch in enumerate(self.data_loader):
  File "/root/anaconda3/envs/ctrl/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/root/anaconda3/envs/ctrl/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
    return self._process_data(data)
  File "/root/anaconda3/envs/ctrl/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
    data.reraise()
  File "/root/anaconda3/envs/ctrl/lib/python3.8/site-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/root/anaconda3/envs/ctrl/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/root/anaconda3/envs/ctrl/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/anaconda3/envs/ctrl/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/anaconda3/envs/ctrl/lib/python3.8/site-packages/mmdet/datasets/dataset_wrappers.py", line 151, in __getitem__
    return self.dataset[idx % self._ori_len]
  File "/waymo/SST/mmdet3d/datasets/waymo_tracklet_dataset.py", line 284, in __getitem__
    data = self.prepare_train_data(idx)
  File "/waymo/SST/mmdet3d/datasets/waymo_tracklet_dataset.py", line 209, in prepare_train_data
    input_dict = self.get_data_info(index)
  File "/waymo/SST/mmdet3d/datasets/waymo_tracklet_dataset.py", line 139, in get_data_info
    trk.set_type(self.cat2id[trk.type_name], 'mmdet3d')
KeyError: 'Pedestrian'

Killing subprocess 2618
Traceback (most recent call last):
  File "/root/anaconda3/envs/ctrl/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/anaconda3/envs/ctrl/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/root/anaconda3/envs/ctrl/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in <module>
    main()
  File "/root/anaconda3/envs/ctrl/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main
    sigkill_handler(signal.SIGTERM, None)  # not coming back
  File "/root/anaconda3/envs/ctrl/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/anaconda3/envs/ctrl/bin/python3', '-u', 'tools/train.py', '--local_rank=0', 'configs/ctrl/ctrl_veh_24e.py', '--launcher', 'pytorch', '--no-validate']' returned non-zero exit status 1.

I followed CTRL_instructions.md, used only part of the Waymo dataset, and skipped Step 2. There should be a configuration file that fixes this problem, but I couldn't find it.

Abyssaledge commented 1 year ago

The key should not be 'Pedestrian' since you use the vehicle config. I need the command and adopted config for checking.
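For readers hitting the same error, here is a minimal sketch of why the lookup at `self.cat2id[trk.type_name]` fails: the class-to-id mapping built from a vehicle-only config has no entry for 'Pedestrian', so any pedestrian tracklet in the generated training data raises a KeyError. The mapping contents, the tracklet list, and the helper `find_unknown_types` are all hypothetical stand-ins, not the repository's actual data structures.

```python
# Hypothetical stand-in for the dataset's class-name-to-id mapping.
# Assumption: a vehicle-only config yields a single entry like this.
cat2id = {'Vehicle': 0}

# Hypothetical tracklet type names, as they might appear in the
# generated training data when it was built from the wrong bin file.
tracklet_types = ['Vehicle', 'Pedestrian', 'Vehicle', 'Cyclist']


def find_unknown_types(type_names, mapping):
    """Return the sorted type names that have no entry in the mapping."""
    return sorted(set(type_names) - set(mapping))


# Scanning the data up front reports every offending type at once.
unknown = find_unknown_types(tracklet_types, cat2id)
print('types missing from cat2id:', unknown)

# A direct lookup fails on the first offending tracklet,
# which is what the DataLoader worker reported.
try:
    cat2id['Pedestrian']
except KeyError as err:
    print('KeyError:', err)
```

A scan like this, run over the generated tracklets before training, would point at a data-generation problem rather than a training-config problem.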

20210726 commented 1 year ago

> The key should not be 'Pedestrian' since you use the vehicle config. I need the command and adopted config for checking.

The training command is: bash tools/dist_train.sh configs/ctrl/ctrl_veh_24e.py 1 --no-validate

and fsd_base_vehicle.yaml is: [image]

Abyssaledge commented 1 year ago

Why do you use train_gt.bin in the YAML file?

20210726 commented 1 year ago

> Why do you use train_gt.bin in the YAML file?

I thought train_gt.bin was the detection result in Waymo bin format. So should I run Step 2 (Use ImmortalTracker to generate tracking results in training split (bin file format)) first, and then use the bin file generated in Step 2 to train the model?

Abyssaledge commented 1 year ago

No, train_gt.bin contains the ground-truth information on the training set. What you need here is the proposals on the training set. So do not change the bin path to train_gt.bin. Only change the split to training, and use 'fsd6f6e_vehicle_full_trainset.bin' if you want to generate training data.
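As a rough illustration of the distinction above, the following sketch guards against passing a ground-truth bin where proposals are expected. The dictionary keys `bin_path` and `split` are hypothetical (the real keys in fsd_base_vehicle.yaml may differ), and the `_gt.bin` suffix test is only a naming heuristic, not a check of the file contents.

```python
from pathlib import PurePosixPath


def looks_like_ground_truth(bin_path):
    """Heuristic: treat a file whose name ends with '_gt.bin' as ground truth."""
    return PurePosixPath(bin_path).name.endswith('_gt.bin')


def check_proposal_config(config):
    """Raise ValueError if the proposal path appears to point at ground truth."""
    if looks_like_ground_truth(config['bin_path']):
        raise ValueError(
            f"{config['bin_path']} looks like ground truth; "
            "tracklet generation expects detector proposals instead"
        )
    return True


# Hypothetical config entries; only the file name comes from the thread.
data_config = {
    'bin_path': 'fsd6f6e_vehicle_full_trainset.bin',
    'split': 'training',
}

print(check_proposal_config(data_config))
```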

Abyssaledge commented 1 year ago

Please reopen this issue if you need further discussion.

20210726 commented 1 year ago

Here are the steps I took to reproduce CTRL. Is anything wrong, especially in Step 2? And is the config file ‘fsd_base_vehicle.yaml’ correct?

1. Prepare Waymo data (I only use part of the Waymo dataset).
1.1 Use my Python script to generate train.txt, val.txt, test.txt, idx2timestamp.pkl and idx2contextname.pkl. Then copy train.txt, val.txt and test.txt to ./data/waymo/kitti_format/ImageSets/ and copy idx2timestamp.pkl and idx2contextname.pkl to ./data/waymo/kitti_format/
1.2 python tools/create_data.py --dataset waymo --root-path ./data/waymo/ --out-dir ./data/waymo/ --workers 128 --extra-tag waymo
[image]

Step 1: Generate train_gt.bin once for all (Waymo bin format).
python ./tools/ctrl/generate_train_gt_bin.py
This generates the file 'train_gt.bin': [image]
python ./tools/ctrl/extract_poses.py
This generates the files context2timestamp.pkl and pose.pkl: [image]

Step 2: Use ImmortalTracker to generate tracking results in training split (bin file format).
Modify the files ego_info.py and time_stamp.py like this: [image]
Modify the file waymo_convert_detection.sh like this: [image]
Then run:
bash preparedata/waymo/waymo_preparedata.sh ~/dataset/waymo/waymo_format/
This generates files like this: [image]

bash preparedata/waymo/waymo_convert_detection.sh ~/dataset/waymo/waymo_format/train_gt.bin CTRL_FSD_TTA
This generates files like this in data/waymo/training/detection/CTRL_FSD_TTA/dets: [image]
Modify the file run_mot.sh like this: [image]

Then run:
bash run_mot.sh
This generates a file like this: [image]

Step 3: Generate track input for training.
Modify the file ‘fsd_base_vehicle.yaml’ like this (pred.bin was generated in Step 2): [image]
python ./tools/ctrl/generate_track_input.py ./tools/ctrl/data_configs/fsd_base_vehicle.yaml --process 1
This generates files like this: [image]

Step 4: Assign candidates GT tracks.
python ./tools/ctrl/generate_candidates.py ./tools/ctrl/data_configs/fsd_base_vehicle.yaml --process 1
[image]

Step 5: Begin training.
bash tools/dist_train.sh configs/ctrl/ctrl_veh_24e.py 1 --no-validate
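The commands run inside the SST repository in the steps above can be collected into an ordered dry-run plan, which makes it easier to see what runs with which config before anything is launched. This is only a sketch: the command strings are copied from the post, while `STEPS` and `plan` are hypothetical names. Step 2 is left out because it runs in the ImmortalTracker repository. Nothing is executed here; the plan is only printed.

```python
import shlex

# Data config path as given in the post.
CONFIG = './tools/ctrl/data_configs/fsd_base_vehicle.yaml'

# Ordered (label, command) pairs, copied from the steps in the post.
STEPS = [
    ('Step 1: generate train_gt.bin',
     'python ./tools/ctrl/generate_train_gt_bin.py'),
    ('Step 1: extract poses',
     'python ./tools/ctrl/extract_poses.py'),
    ('Step 3: generate track input',
     f'python ./tools/ctrl/generate_track_input.py {CONFIG} --process 1'),
    ('Step 4: assign candidate GT tracks',
     f'python ./tools/ctrl/generate_candidates.py {CONFIG} --process 1'),
    ('Step 5: begin training',
     'bash tools/dist_train.sh configs/ctrl/ctrl_veh_24e.py 1 --no-validate'),
]


def plan(steps):
    """Split each command string into an argv list without running it."""
    return [(label, shlex.split(command)) for label, command in steps]


for label, argv in plan(STEPS):
    print(f'{label}: {" ".join(argv)}')
```

Each argv list could later be handed to `subprocess.run(argv, check=True)` so that a failing step stops the pipeline instead of feeding bad data to the next one.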

20210726 commented 1 year ago

@Abyssaledge

JayYangSS commented 1 year ago

In Step 2, I think you shouldn't use train_gt.bin: bash preparedata/waymo/waymo_convert_detection.sh ~/dataset/waymo/waymo_format/train_gt.bin CTRL_FSD_TTA

You need to use the base detector to generate the prediction results.