zhmiao / OpenLongTailRecognition-OLTR

Pytorch implementation for "Large-Scale Long-Tailed Recognition in an Open World" (CVPR 2019 ORAL)
BSD 3-Clause "New" or "Revised" License

Training error for stage 1 #36

Closed daofeng2007 closed 4 years ago

daofeng2007 commented 5 years ago

Loading dataset from: /mnt/ImageryAnalysis/t0l/OpenLongTailRecognition-OLTR-master/OpenLongTailRecognition-OLTR-master/data/train_256_places365standard
{'criterions': {'PerformanceLoss': {'def_file': 'loss/SoftmaxLoss.py', 'loss_params': {}, 'optim_params': None, 'weight': 1.0}}, 'memory': {'centroids': False, 'init_centroids': False}, 'networks': {'classifier': {'def_file': 'models/DotProductClassifier.py', 'optim_params': {'lr': 0.1, 'momentum': 0.9, 'weight_decay': 0.0005}, 'params': {'dataset': 'Places_LT', 'in_dim': 512, 'num_classes': 365, 'stage1_weights': False}}, 'feat_model': {'def_file': 'models/ResNet152Feature.py', 'fix': True, 'optim_params': {'lr': 0.01, 'momentum': 0.9, 'weight_decay': 0.0005}, 'params': {'caffe': True, 'dataset': 'Places_LT', 'dropout': None, 'stage1_weights': False, 'use_fc': True, 'use_modulatedatt': False}}}, 'training_opt': {'batch_size': 256, 'dataset': 'Places_LT', 'display_step': 10, 'feature_dim': 512, 'log_dir': './logs/Places_LT/stage1', 'num_classes': 365, 'num_epochs': 30, 'num_workers': 4, 'open_threshold': 0.1, 'sampler': None, 'scheduler_params': {'gamma': 0.1, 'step_size': 10}}}
Loading data from ./data/Places_LT/Places_LT_train.txt
Use data transformation: Compose(
    RandomResizedCrop(size=(224, 224), scale=(0.08, 1.0), ratio=(0.75, 1.3333), interpolation=PIL.Image.BILINEAR)
    RandomHorizontalFlip(p=0.5)
    ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0)
    ToTensor()
    Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
)
No sampler. Shuffle is True.
Loading data from ./data/Places_LT/Places_LT_val.txt
Use data transformation: Compose(
    Resize(size=256, interpolation=PIL.Image.BILINEAR)
    CenterCrop(size=(224, 224))
    ToTensor()
    Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
)
No sampler. Shuffle is True.
Using 1 GPUs.
Loading Scratch ResNet 152 Feature Model.
Loading Caffe Pretrained ResNet 152 Weights.
Pretrained feature model weights path: ./logs/caffe_resnet152.pth
Freezing feature weights except for self attention weights (if exist).
Loading Dot Product Classifier.
Random initialized classifier weights.
Using steps for training.
Initializing model optimizer.
Loading Softmax Loss.
Phase: train
Traceback (most recent call last):

  File "", line 1, in <module>
    runfile('/mnt/ImageryAnalysis/t0l/OpenLongTailRecognition-OLTR-master/OpenLongTailRecognition-OLTR-master/main.py', wdir='/mnt/ImageryAnalysis/t0l/OpenLongTailRecognition-OLTR-master/OpenLongTailRecognition-OLTR-master')
  File "/home/t0l/anaconda3/envs/dl/lib/python3.5/site-packages/spyder_kernels/customize/spydercustomize.py", line 668, in runfile
    execfile(filename, namespace)
  File "/home/t0l/anaconda3/envs/dl/lib/python3.5/site-packages/spyder_kernels/customize/spydercustomize.py", line 108, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "/mnt/ImageryAnalysis/t0l/OpenLongTailRecognition-OLTR-master/OpenLongTailRecognition-OLTR-master/main.py", line 62, in <module>
    training_model.train()
  File "/mnt/ImageryAnalysis/t0l/OpenLongTailRecognition-OLTR-master/OpenLongTailRecognition-OLTR-master/run_networks.py", line 212, in train
    phase='train')
  File "/mnt/ImageryAnalysis/t0l/OpenLongTailRecognition-OLTR-master/OpenLongTailRecognition-OLTR-master/run_networks.py", line 137, in batch_forward
    self.logits, self.direct_memory_feature = self.networks['classifier'](self.features, self.centroids)
  File "/home/t0l/anaconda3/envs/dl/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/t0l/anaconda3/envs/dl/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 121, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/t0l/anaconda3/envs/dl/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "models/DotProductClassifier.py", line 11, in forward
    x = self.fc(x)
  File "/home/t0l/anaconda3/envs/dl/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/t0l/anaconda3/envs/dl/lib/python3.5/site-packages/torch/nn/modules/linear.py", line 55, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/t0l/anaconda3/envs/dl/lib/python3.5/site-packages/torch/nn/functional.py", line 1024, in linear
    return torch.addmm(bias, input, weight.t())

RuntimeError: size mismatch, m1: [256 x 2048], m2: [512 x 365] at /opt/conda/conda-bld/pytorch_1533672544752/work/aten/src/THC/generic/THCTensorMathBlas.cu:249
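
For readers decoding the error message: the multiplication needs the second dimension of m1 (the feature batch) to equal the first dimension of m2 (the transposed classifier weight). Here the features arrive with 2048 dimensions while the classifier was built with `in_dim` 512. A minimal sketch of that check in plain Python follows; `matmul_shape` is a hypothetical helper written for illustration, not part of PyTorch or this repository.

```python
def matmul_shape(m1, m2):
    """Return the shape of m1 @ m2, or raise if the inner dimensions differ.

    Hypothetical helper for illustration only; shapes are (rows, cols) tuples.
    """
    if m1[1] != m2[0]:
        raise ValueError(
            "size mismatch, m1: [%d x %d], m2: [%d x %d]" % (m1 + m2)
        )
    return (m1[0], m2[1])


# Shapes taken from the error message: batch of 256, 365 classes.
classifier_weight_t = (512, 365)   # in_dim x num_classes

# A 512-dimensional feature batch is compatible with the classifier.
print(matmul_shape((256, 512), classifier_weight_t))

# A 2048-dimensional feature batch is not, which mirrors the reported error.
try:
    matmul_shape((256, 2048), classifier_weight_t)
except ValueError as err:
    print(err)
```

In other words, the feature model appears to be returning its raw 2048-dimensional output rather than a 512-dimensional one, even though the printed config sets `use_fc` to True and `feature_dim` to 512.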

zhmiao commented 5 years ago

Hello @daofeng2007, thank you very much for asking. We have not seen this error on our side before, but we suspect it might be caused by the Python version. Because we read the parameters from config files, their order can sometimes get mixed up under different versions of Python. Could you please check whether the variables are read properly?
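
One way such an ordering problem could arise, sketched under assumptions: plain `dict` objects only guarantee insertion order from Python 3.7 onward, and the traceback above shows Python 3.5. If the values of a `params` dictionary are passed to a constructor positionally, a different key order binds them to the wrong arguments. The `create_model` function below is a simplified, hypothetical stand-in, and the assumption that parameters are unpacked positionally is not confirmed anywhere in this thread.

```python
from collections import OrderedDict


def create_model(use_fc=False, dropout=None, caffe=False):
    """Hypothetical stand-in: returns the feature dimension the model would emit."""
    return 512 if use_fc else 2048


# Same parameters, two different key orders (OrderedDict makes the order explicit).
params_expected = OrderedDict([("use_fc", True), ("dropout", None), ("caffe", True)])
params_shuffled = OrderedDict([("dropout", None), ("caffe", True), ("use_fc", True)])

# Positional unpacking depends on the key order.
print(create_model(*params_expected.values()))  # 512
print(create_model(*params_shuffled.values()))  # 2048: use_fc received None

# Keyword unpacking binds by name and is independent of the order.
print(create_model(**params_shuffled))          # 512
```

To check whether the variables are read properly, printing the arguments actually received inside the model constructor (rather than the config dictionary, which may be printed in sorted key order) would show whether `use_fc` really arrives as True.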