megvii-research / video_analyst

A series of basic algorithms that are useful for video understanding, including Single Object Tracking (SOT), Video Object Segmentation (VOS) and so on.
MIT License
832 stars 176 forks

raise StopIteration #124

Closed jianjiandandande closed 3 years ago

jianjiandandande commented 4 years ago

During training, I ran into the following problem:

2020-07-19 19:56:00.637 | INFO     | videoanalyst.engine.monitor.monitor_impl.tensorboard_logger:update:75 - Tensorboard writer built, starts recording from global_step=0
2020-07-19 19:56:00.637 | INFO     | videoanalyst.engine.monitor.monitor_impl.tensorboard_logger:update:78 - epoch=0, max_epoch=20, iteration=0, max_iteration=2343
epoch 0, lr: 8.0e-02, cls: 0.106, ctr: 0.037, reg: 1.119, iou: 0.699, data: 1.7e-05, fwd: 1.7e-01, bwd: 9.1e-02, optim: 1.5e-01, : 100%|████████████████████████████████| 2343/2343 [34:56<00:00,  1.12it/s]
2020-07-19 20:29:51.807 | INFO     | videoanalyst.engine.trainer.trainer_base:save_snapshot:143 - Snapshot saved at: /home/ubuntu/Vincent/object_track/video_analyst-master/snapshots/siamfcpp_googlenet-lasot/epoch-0.pkl
  0%|                                                                                                                                                                              | 0/2343 [00:00<?, ?it/s]Traceback (most recent call last):
  File "./main/train.py", line 110, in <module>
    trainer.train()
  File "/home/ubuntu/Vincent/object_track/video_analyst-master/videoanalyst/engine/trainer/trainer_impl/regular_trainer.py", line 97, in train
    training_data = next(self._dataloader)
  File "/home/ubuntu/anaconda3/envs/vincent/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/ubuntu/anaconda3/envs/vincent/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 831, in _next_data
    raise StopIteration
StopIteration
  0%|                                                                                                                                                                              | 0/2343 [00:00<?, ?it/s]
(vincent) ubuntu@ubun:~/Vincent/object_track/video_analyst-master$ python ./main/train.py --config 'experiments/siamfcpp/train/lasot/siamfcpp_googlenet-trn.yaml'
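
The traceback above ends inside `next(self._dataloader)`: calling `next()` on an iterator that has no items left raises `StopIteration`. The following is a minimal sketch of that mechanism in plain Python, not the repository's code. A generator stands in for the PyTorch data loader iterator, and the names `make_batches` and `train` are hypothetical.

```python
def make_batches(total_batches):
    """Hypothetical stand-in for a data loader iterator with a fixed number of batches."""
    for i in range(total_batches):
        yield {"batch_index": i}


def train(batch_iter, max_epoch, iters_per_epoch):
    """Pull batches with next(), the same call that appears in the traceback.

    Returns the number of batches consumed. StopIteration propagates
    if the iterator runs out before the loops finish.
    """
    consumed = 0
    for _epoch in range(max_epoch):
        for _ in range(iters_per_epoch):
            next(batch_iter)  # raises StopIteration once the iterator is exhausted
            consumed += 1
    return consumed


# Iterator holds one epoch of data (3 batches), but the loop asks for two epochs.
try:
    train(make_batches(3), max_epoch=2, iters_per_epoch=3)
    ran_out = False
except StopIteration:
    ran_out = True

# Iterator holds enough batches for both epochs, so the loop completes.
completed = train(make_batches(6), max_epoch=2, iters_per_epoch=3)
```

In this sketch the first call fails at the start of the second epoch, which resembles the log above, where the error appears right after the epoch-0 snapshot is saved. Whether the same mismatch is the cause here is an assumption.
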
MARMOTatZJU commented 4 years ago

@jianjiandandande Have you changed any part of the code/configuration yaml file?

jianjiandandande commented 4 years ago

> @jianjiandandande Have you changed any part of the code/configuration yaml file?

I only changed the number of epochs.

MARMOTatZJU commented 4 years ago

@jianjiandandande It would help if you could provide the output of a simple "git diff".

jianjiandandande commented 4 years ago

I've only changed this part in video_analyst-master/experiments/siamfcpp/train/lasot/siamfcpp_googlenet-trn.yaml. The error was basically only reported in the last epoch, so I changed 20 to 21.

# ==================================================
    data:
      exp_name: *TRAIN_NAME
      exp_save: *TRAIN_SAVE
      num_epochs: 21 # 20 
      minibatch: &MINIBATCH 64  # 256
      num_workers: 32
      nr_image_per_epoch: &NR_IMAGE_PER_EPOCH 150000
      pin_memory: false
      datapipeline:
        name: "RegularDatapipeline"
      sampler:
        name: "TrackPairSampler"
        TrackPairSampler:
          negative_pair_ratio: 0.33
        submodules:
          dataset:
            names: ["LaSOTDataset",]  # (GOT10kDataset|LaSOTDataset)
            GOT10kDataset: &GOT10KDATASET_CFG
              ratio: 1.0
              max_diff: 100
              dataset_root: "datasets/GOT-10k"
              subset: "train"
            GOT10kDatasetFixed: *GOT10KDATASET_CFG  # got10k dataset with exclusion of unfixed sequences
            LaSOTDataset:
              ratio: 1.0
              max_diff: 100
              dataset_root: "datasets/LaSOT"
              subset: "train"
          filter:
            name: "TrackPairFilter"
            TrackPairFilter:
              max_area_rate: 0.6
              min_area_rate: 0.001
              max_ratio: 10
      transformer:
        names: ["RandomCropTransformer", ]
        RandomCropTransformer:
          max_scale: 0.3
          max_shift: 0.4
          x_size: 289
      target:
        name: "DenseboxTarget"
        DenseboxTarget:
          total_stride: 8
          score_size: 17
          x_size: 289
          num_conv3x3: 2
    trainer:
      name: "RegularTrainer"
      RegularTrainer:
        exp_name: *TRAIN_NAME
        exp_save: *TRAIN_SAVE
        max_epoch: 21  # 20
        minibatch: *MINIBATCH
        nr_image_per_epoch: *NR_IMAGE_PER_EPOCH
        snapshot: ""
      monitors:
        names: ["TextInfo", "TensorboardLogger"]
        TextInfo:
          {}
        TensorboardLogger:
          exp_name: *TRAIN_NAME
          exp_save: *TRAIN_SAVE

          # ==================================================
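
The iteration count in the log (2343) matches the config values above: `nr_image_per_epoch` divided by `minibatch`, rounded down. The sketch below works through that arithmetic. The helper `data_covers_training` is hypothetical and rests on an assumption the thread does not confirm, namely that the data side supplies `num_epochs` worth of batches while the trainer consumes `max_epoch` worth.

```python
# Values copied from the YAML above.
NR_IMAGE_PER_EPOCH = 150000
MINIBATCH = 64

# Full minibatches per epoch; the remainder (48 images) is dropped.
iters_per_epoch = NR_IMAGE_PER_EPOCH // MINIBATCH


def total_iterations(epochs):
    """Iterations needed for the given number of epochs."""
    return epochs * iters_per_epoch


def data_covers_training(data_num_epochs, trainer_max_epoch):
    """Hypothetical check: True if the data side supplies at least as many
    iterations as the trainer will request (assumed relationship)."""
    return total_iterations(data_num_epochs) >= total_iterations(trainer_max_epoch)


print(iters_per_epoch)               # iterations per epoch
print(total_iterations(20))          # iterations for the original 20 epochs
print(data_covers_training(21, 21))  # both values set to 21, as in the YAML above
```
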
MARMOTatZJU commented 4 years ago

@jianjiandandande Indeed, there are some flaws in the config and a lack of clarification. I have made this PR to fix it. You may check out the file changes to fix your own branch, or just check out my fork.

jianjiandandande commented 4 years ago

OK, thanks @MARMOTatZJU

MARMOTatZJU commented 3 years ago

@jianjiandandande Issue closed, as it's been a long time without further reply. I hope you've already fixed your issue. Feel free to reopen for further help.