raise StopIteration

jianjiandandande commented 4 years ago

During training, I ran into the following problem:

2020-07-19 19:56:00.637 | INFO     | videoanalyst.engine.monitor.monitor_impl.tensorboard_logger:update:75 - Tensorboard writer built, starts recording from global_step=0
2020-07-19 19:56:00.637 | INFO     | videoanalyst.engine.monitor.monitor_impl.tensorboard_logger:update:78 - epoch=0, max_epoch=20, iteration=0, max_iteration=2343
epoch 0, lr: 8.0e-02, cls: 0.106, ctr: 0.037, reg: 1.119, iou: 0.699, data: 1.7e-05, fwd: 1.7e-01, bwd: 9.1e-02, optim: 1.5e-01, : 100%|████████████████████████████████| 2343/2343 [34:56<00:00,  1.12it/s]
2020-07-19 20:29:51.807 | INFO     | videoanalyst.engine.trainer.trainer_base:save_snapshot:143 - Snapshot saved at: /home/ubuntu/Vincent/object_track/video_analyst-master/snapshots/siamfcpp_googlenet-lasot/epoch-0.pkl
  0%|                                                                                                                                                                              | 0/2343 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "./main/train.py", line 110, in <module>
    trainer.train()
  File "/home/ubuntu/Vincent/object_track/video_analyst-master/videoanalyst/engine/trainer/trainer_impl/regular_trainer.py", line 97, in train
    training_data = next(self._dataloader)
  File "/home/ubuntu/anaconda3/envs/vincent/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/ubuntu/anaconda3/envs/vincent/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 831, in _next_data
    raise StopIteration
StopIteration
  0%|                                                                                                                                                                              | 0/2343 [00:00<?, ?it/s]
(vincent) ubuntu@ubun:~/Vincent/object_track/video_analyst-master$ python ./main/train.py --config 'experiments/siamfcpp/train/lasot/siamfcpp_googlenet-trn.yaml'
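
For readers unfamiliar with the error: calling `next()` on an iterator that has no items left raises `StopIteration`, which is what the last frame of the traceback above shows. The sketch below reproduces that behaviour with a plain Python iterator. It is only an illustration: `consume`, `batches_available`, and `batches_requested` are hypothetical names, and nothing here is video_analyst or PyTorch code.

```python
def consume(batches_available, batches_requested):
    """Call next() a fixed number of times on a finite iterator.

    Returns the number of items fetched before the iterator ran out.
    """
    # A range iterator stands in for a data loader iterator (assumption).
    loader_iter = iter(range(batches_available))
    fetched = 0
    for _ in range(batches_requested):
        try:
            next(loader_iter)
        except StopIteration:
            # The iterator is exhausted; an uncaught StopIteration here
            # would end the program the way the traceback shows.
            break
        fetched += 1
    return fetched


if __name__ == "__main__":
    print(consume(batches_available=5, batches_requested=5))
    print(consume(batches_available=5, batches_requested=6))
```

Whenever the consumer asks for more items than the iterator can supply, the extra `next()` call is the one that fails. So it may be worth checking that the number of batches the data side produces matches the number the training loop requests.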

MARMOTatZJU commented 4 years ago

@jianjiandandande Have you changed any part of the code/configuration yaml file?

jianjiandandande commented 4 years ago

> @jianjiandandande Have you changed any part of the code/configuration yaml file?

I only changed the number of epochs.

MARMOTatZJU commented 4 years ago

@jianjiandandande It would be better if you can provide the output of a simple "git diff".

jianjiandandande commented 4 years ago

I've only changed this part in video_analyst-master/experiments/siamfcpp/train/lasot/siamfcpp_googlenet-trn.yaml. The error was basically only reported in the last epoch, so I changed 20 to 21.

# ==================================================
    data:
      exp_name: *TRAIN_NAME
      exp_save: *TRAIN_SAVE
      num_epochs: 21 # 20 
      minibatch: &MINIBATCH 64  # 256
      num_workers: 32
      nr_image_per_epoch: &NR_IMAGE_PER_EPOCH 150000
      pin_memory: false
      datapipeline:
        name: "RegularDatapipeline"
      sampler:
        name: "TrackPairSampler"
        TrackPairSampler:
          negative_pair_ratio: 0.33
        submodules:
          dataset:
            names: ["LaSOTDataset",]  # (GOT10kDataset|LaSOTDataset)
            GOT10kDataset: &GOT10KDATASET_CFG
              ratio: 1.0
              max_diff: 100
              dataset_root: "datasets/GOT-10k"
              subset: "train"
            GOT10kDatasetFixed: *GOT10KDATASET_CFG  # got10k dataset with exclusion of unfixed sequences
            LaSOTDataset:
              ratio: 1.0
              max_diff: 100
              dataset_root: "datasets/LaSOT"
              subset: "train"
          filter:
            name: "TrackPairFilter"
            TrackPairFilter:
              max_area_rate: 0.6
              min_area_rate: 0.001
              max_ratio: 10
      transformer:
        names: ["RandomCropTransformer", ]
        RandomCropTransformer:
          max_scale: 0.3
          max_shift: 0.4
          x_size: 289
      target:
        name: "DenseboxTarget"
        DenseboxTarget:
          total_stride: 8
          score_size: 17
          x_size: 289
          num_conv3x3: 2
    trainer:
      name: "RegularTrainer"
      RegularTrainer:
        exp_name: *TRAIN_NAME
        exp_save: *TRAIN_SAVE
        max_epoch: 21  # 20
        minibatch: *MINIBATCH
        nr_image_per_epoch: *NR_IMAGE_PER_EPOCH
        snapshot: ""
      monitors:
        names: ["TextInfo", "TensorboardLogger"]
        TextInfo:
          {}
        TensorboardLogger:
          exp_name: *TRAIN_NAME
          exp_save: *TRAIN_SAVE

          # ==================================================
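
As a quick sanity check on the numbers in this thread: the `max_iteration=2343` value in the training log is consistent with dividing `nr_image_per_epoch` by `minibatch` from the config above and rounding down. The sketch below assumes that this is how the per-epoch iteration count is derived. The thread does not show the code that computes it, and `iterations_per_epoch` is a hypothetical helper name.

```python
def iterations_per_epoch(nr_image_per_epoch, minibatch):
    """Number of full minibatches in one epoch (assumed: floor division)."""
    return nr_image_per_epoch // minibatch


if __name__ == "__main__":
    # Values taken from the YAML config posted in this thread.
    per_epoch = iterations_per_epoch(nr_image_per_epoch=150000, minibatch=64)
    print(per_epoch)  # 2343, matching max_iteration in the log

    # Total next() calls a trainer would make over max_epoch epochs,
    # under the same assumption.
    print(per_epoch * 21)
```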

MARMOTatZJU commented 4 years ago

@jianjiandandande Indeed, there are some flaws in the config and a lack of clarification. I have made this PR to fix it. You may check out the file changes to fix your own branch, or just check out my fork.

jianjiandandande commented 4 years ago

OK, thanks @MARMOTatZJU

MARMOTatZJU commented 3 years ago

@jianjiandandande Closing this issue since there has been no further reply for a long time. I hope you've already fixed your issue. Feel free to reopen it if you need further help.

megvii-research / video_analyst

raise StopIteration #124