zhanwenchen / relaug


AssertionError in assert rois.size(0) > 0 #42

Open · zhanwenchen opened 1 year ago

zhanwenchen commented 1 year ago

Seen on lab2 in the gsc run 20230226220526_vctree_semantic_sgcls_4GPU_lab1_1e3, and also on lab3 in 20230226220656_vctree_semantic_predcls_4GPU_lab1_1e3/:

no rels_new for rel_og=has
no rels_new for rel_og=with
no rels_new for rel_og=in front of
no rels_new for rel_og=in front of
3209: Augmentation: 1 => 1
3209: Augmentation: 1 => 1
no rels_new for rel_og=on
no rels_new for rel_og=on
no rels_new for rel_og=on
no rels_new for rel_og=with
no rels_new for rel_og=on
no rels_new for rel_og=on
3209: Augmentation: 1 => 1
Traceback (most recent call last):
  File "/home/pct4et/gsc/tools/relation_train_net.py", line 665, in <module>
    main()
  File "/home/pct4et/gsc/tools/relation_train_net.py", line 650, in main
    train(cfg, local_rank, args.distributed, logger, experiment)
  File "/home/pct4et/gsc/tools/relation_train_net.py", line 327, in train
    loss_dict = model(images, targets)
  File "/home/pct4et/miniconda3/envs/gsc/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/pct4et/miniconda3/envs/gsc/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1008, in forward
    output = self._run_ddp_forward(*inputs, **kwargs)
  File "/home/pct4et/miniconda3/envs/gsc/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 969, in _run_ddp_forward
    return module_to_run(*inputs[0], **kwargs[0])
  File "/home/pct4et/miniconda3/envs/gsc/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/pct4et/gsc/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 76, in forward
    _, result, detector_losses = self.roi_heads(features, proposals, targets, logger, boxes_global=boxes_global)
  File "/home/pct4et/miniconda3/envs/gsc/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/pct4et/gsc/maskrcnn_benchmark/modeling/roi_heads/roi_heads.py", line 69, in forward
    x, detections, loss_relation = self.relation(features, detections, targets, logger, boxes_global=boxes_global)
  File "/home/pct4et/miniconda3/envs/gsc/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/pct4et/gsc/maskrcnn_benchmark/modeling/roi_heads/relation_head/relation_head.py", line 99, in forward
    union_features = self.union_feature_extractor(features, proposals, rel_pair_idxs)
  File "/home/pct4et/miniconda3/envs/gsc/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/pct4et/gsc/maskrcnn_benchmark/modeling/roi_heads/relation_head/roi_relation_feature_extractors.py", line 99, in forward
    union_vis_features = self.feature_extractor.pooler(x, union_proposals) # union_proposals: 16 * [651..., 650..., 110...] # union_vis_features: torch.Size([5049, 256, 7, 7]) # TODO: need to borrow pooler's 5 layers to 1 reduction. so have a global union feature pooler
  File "/home/pct4et/miniconda3/envs/gsc/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/pct4et/gsc/maskrcnn_benchmark/modeling/poolers.py", line 142, in forward
    assert rois.size(0) > 0
AssertionError
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 64810 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 64811 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 64812 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 3 (pid: 64813) of binary: /home/pct4et/miniconda3/envs/gsc/bin/python
Traceback (most recent call last):
  File "/home/pct4et/miniconda3/envs/gsc/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==1.12.1', 'console_scripts', 'torchrun')())
  File "/home/pct4et/miniconda3/envs/gsc/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/home/pct4et/miniconda3/envs/gsc/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main
    run(args)
  File "/home/pct4et/miniconda3/envs/gsc/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/home/pct4et/miniconda3/envs/gsc/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/pct4et/miniconda3/envs/gsc/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
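For context on where the assertion comes from: the pooler at maskrcnn_benchmark/modeling/poolers.py:142 receives the union proposals built from every relation pair in the batch, so if the augmentation drops every pair (the `no rels_new for rel_og=...` lines above), `rois` ends up empty and `assert rois.size(0) > 0` fails. The snippet below is a minimal, self-contained sketch, not the repo's code; `build_union_rois` and its arguments are hypothetical stand-ins used only to reproduce the empty-ROI condition and show a guard that skips images with no surviving pairs.

import torch

def build_union_rois(rel_pair_idxs, proposals_per_image):
    """Toy stand-in for the union-box construction in the relation feature
    extractor: one union ROI per relation pair, concatenated across the batch."""
    rois = []
    for img_idx, pairs in enumerate(rel_pair_idxs):
        boxes = proposals_per_image[img_idx]          # (num_boxes, 4) xyxy boxes
        if pairs.numel() == 0:                        # guard: no pairs survived augmentation
            continue
        head = boxes[pairs[:, 0]]
        tail = boxes[pairs[:, 1]]
        union = torch.cat([torch.min(head[:, :2], tail[:, :2]),
                           torch.max(head[:, 2:], tail[:, 2:])], dim=1)
        rois.append(union)
    if not rois:                                      # every image empty -> the case the pooler rejects
        return torch.empty(0, 4)
    return torch.cat(rois, dim=0)

# If augmentation removes every relation pair in the batch, the pooler would be
# handed zero ROIs, which is exactly the condition the assert trips on.
empty = build_union_rois([torch.empty(0, 2, dtype=torch.long)],
                         [torch.rand(5, 4)])
print(empty.size(0))  # 0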