qfgaohao / pytorch-ssd

MobileNetV1, MobileNetV2, VGG based SSD/SSD-lite implementation in Pytorch 1.0 / Pytorch 0.4. Out-of-box support for retraining on Open Images dataset. ONNX and Caffe2 support. Experiment Ideas like CoordConv.
https://medium.com/@smallfishbigsea/understand-ssd-and-implement-your-own-caa3232cd6ad
MIT License
1.39k stars 529 forks

ValueError encountered during retraining on Open Images Dataset #193

Open Kweon0605 opened 6 months ago

Kweon0605 commented 6 months ago

I'm using the following command in Colab to train the model, but I'm encountering an error:

!python train_ssd.py --dataset_type open_images --datasets ~/data/open_images --net mb1-ssd --pretrained_ssd models/mobilenet-v1-ssd-mp-0_675.pth --scheduler cosine --lr 0.01 --t_max 100 --validation_epochs 5 --num_epochs 100 --base_net_lr 0.001 --batch_size 5

The error message is:

2024-02-21 01:57:21,177 - root - INFO - Namespace(dataset_type='open_images', datasets=['/root/data/open_images'], validation_dataset=None, balance_data=False, net='mb1-ssd', freeze_base_net=False, freeze_net=False, mb2_width_mult=1.0, lr=0.01, momentum=0.9, weight_decay=0.0005, gamma=0.1, base_net_lr=0.001, extra_layers_lr=None, base_net=None, pretrained_ssd='models/mobilenet-v1-ssd-mp-0_675.pth', resume=None, scheduler='cosine', milestones='80,100', t_max=100.0, batch_size=5, num_epochs=100, num_workers=4, validation_epochs=5, debug_steps=100, use_cuda=True, checkpoint_folder='models/')
2024-02-21 01:57:21,178 - root - INFO - Prepare training datasets.
2024-02-21 01:57:22,170 - root - INFO - Dataset Summary:Number of Images: 961
Minimum Number of Images for a Class: -1
Label Distribution:
Handgun: 727
Shotgun: 580
2024-02-21 01:57:22,172 - root - INFO - Stored labels into file models/open-images-model-labels.txt.
2024-02-21 01:57:22,172 - root - INFO - Train dataset size: 961
/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py:557: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
2024-02-21 01:57:22,175 - root - INFO - Prepare Validation datasets.
2024-02-21 01:57:22,256 - root - INFO - Dataset Summary:Number of Images: 123
Minimum Number of Images for a Class: -1
Label Distribution:
Handgun: 81
Shotgun: 66
2024-02-21 01:57:22,256 - root - INFO - validation dataset size: 123
/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py:557: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
2024-02-21 01:57:22,256 - root - INFO - Build network.
2024-02-21 01:57:22,350 - root - INFO - Init from pretrained ssd models/mobilenet-v1-ssd-mp-0_675.pth
2024-02-21 01:57:25,194 - root - INFO - Took 2.84 seconds to load the model.
2024-02-21 01:57:25,197 - root - INFO - Learning rate: 0.01, Base net learning rate: 0.001, Extra Layers learning rate: 0.01.
2024-02-21 01:57:25,197 - root - INFO - Uses CosineAnnealingLR scheduler.
2024-02-21 01:57:25,198 - root - INFO - Start training from epoch 0.
/usr/local/lib/python3.10/dist-packages/torch/optim/lr_scheduler.py:136: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  warnings.warn("Detected call of lr_scheduler.step() before optimizer.step(). "
Traceback (most recent call last):
  File "/content/drive/MyDrive/try_ssd/pytorch-ssd/train_ssd.py", line 325, in <module>
    train(train_loader, net, criterion, optimizer,
  File "/content/drive/MyDrive/try_ssd/pytorch-ssd/train_ssd.py", line 116, in train
    for i, data in enumerate(loader):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 630, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.10/dist-packages/torch/_utils.py", line 694, in reraise
    raise exception
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataset.py", line 302, in __getitem__
    return self.datasets[dataset_idx][sample_idx]
  File "/content/drive/MyDrive/try_ssd/pytorch-ssd/vision/datasets/open_images.py", line 44, in __getitem__
    _, image, boxes, labels = self._getitem(index)
  File "/content/drive/MyDrive/try_ssd/pytorch-ssd/vision/datasets/open_images.py", line 38, in _getitem
    image, boxes, labels = self.transform(image, boxes, labels)
  File "/content/drive/MyDrive/try_ssd/pytorch-ssd/vision/ssd/data_preprocessing.py", line 34, in __call__
    return self.augment(img, boxes, labels)
  File "/content/drive/MyDrive/try_ssd/pytorch-ssd/vision/transforms/transforms.py", line 55, in __call__
    img, boxes, labels = t(img, boxes, labels)
  File "/content/drive/MyDrive/try_ssd/pytorch-ssd/vision/transforms/transforms.py", line 247, in __call__
    mode = random.choice(self.sample_options)
  File "mtrand.pyx", line 936, in numpy.random.mtrand.RandomState.choice
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (6,) + inhomogeneous part.

How can I fix it?

zn845639326 commented 6 months ago

Try downgrading NumPy, for example: pip install numpy==1.22.0. Other versions may also work.
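For context on why the downgrade helps: the traceback ends in numpy.random's choice, which converts its argument to an array before sampling. Since NumPy 1.24, building an array from a ragged sequence (here, apparently a mix of None and 2-tuples, 6 entries long) raises this ValueError instead of only warning. If you would rather not pin NumPy, one possible alternative is to pick a random index and index the tuple yourself. The sketch below is only an illustration: the contents of sample_options are an assumption based on the "(6,)" shape in the error, and choose_mode is a hypothetical helper, not something that exists in the repository.

```python
import numpy as np

# ASSUMPTION: a 6-element mix of None and (min_iou, max_iou) pairs,
# chosen to match the "(6,) + inhomogeneous part" shape in the error.
# The real tuple in vision/transforms/transforms.py may differ.
sample_options = (
    None,
    (0.1, None),
    (0.3, None),
    (0.7, None),
    (0.9, None),
    (None, None),
)


def choose_mode(options):
    """Hypothetical helper: pick one option by random index.

    Sampling an integer index avoids asking NumPy to convert the
    ragged `options` sequence into an array, which is the step that
    fails inside numpy.random's choice on NumPy >= 1.24.
    """
    index = np.random.randint(len(options))  # integer in [0, len(options))
    return options[index]


mode = choose_mode(sample_options)
```

Under this assumption, the change in transforms.py would amount to replacing the random.choice(self.sample_options) call shown at line 247 of the traceback with an index-based pick like the one above. I have not tested this against the repository itself.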