zhanwenchen / relaug

MIT License

Graft Statistics - IndexError: invalid index to scalar variable. #91

Closed zhanwenchen closed 1 year ago

zhanwenchen commented 1 year ago
VGDataset: use_graft=True
GraftAugmenterDataset.__init__: started running get_dataset_statistics
  0%|          | 1/57723 [00:00<19:17, 49.87it/s]
Traceback (most recent call last):
  File "/localtmp/pct4et/relaug/tools/relation_train_net.py", line 394, in <module>
    main()
  File "/localtmp/pct4et/relaug/tools/relation_train_net.py", line 390, in main
    train(cfg, local_rank, args.distributed, logger)
  File "/localtmp/pct4et/relaug/tools/relation_train_net.py", line 48, in train
    train_data_loader = make_data_loader(
  File "/localtmp/pct4et/relaug/maskrcnn_benchmark/data/build.py", line 181, in make_data_loader
    datasets = build_dataset(cfg, dataset_list, transforms, DatasetCatalog, is_train)
  File "/localtmp/pct4et/relaug/maskrcnn_benchmark/data/build.py", line 62, in build_dataset
    datasets.append(GraftAugmenterDataset(dataset))
  File "/localtmp/pct4et/relaug/maskrcnn_benchmark/data/datasets/graft_augmenter.py", line 37, in __init__
    statistics = dataset.get_statistics()
  File "/localtmp/pct4et/relaug/maskrcnn_benchmark/data/datasets/visual_genome.py", line 102, in get_statistics
    fg_matrix, bg_matrix, stats = self.get_VG_statistics(must_overlap=True)
  File "/localtmp/pct4et/relaug/maskrcnn_benchmark/data/datasets/visual_genome.py", line 225, in get_VG_statistics
    o1o2 = gt_classes[o1o2_indices] # Regardless
IndexError: invalid index to scalar variable.
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 57453 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 57455 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 57456 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 57454) of binary: /localtmp/pct4et/conda_envs/relaug/bin/python
Traceback (most recent call last):
  File "/localtmp/pct4et/conda_envs/relaug/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==1.10.2', 'console_scripts', 'torchrun')())
  File "/localtmp/pct4et/conda_envs/relaug/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/localtmp/pct4et/conda_envs/relaug/lib/python3.8/site-packages/torch/distributed/run.py", line 719, in main
    run(args)
  File "/localtmp/pct4et/conda_envs/relaug/lib/python3.8/site-packages/torch/distributed/run.py", line 710, in run
    elastic_launch(
  File "/localtmp/pct4et/conda_envs/relaug/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/localtmp/pct4et/conda_envs/relaug/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 259, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
zhanwenchen commented 1 year ago

Under normal circumstances, indexing gt_classes with o1o2_indices should look like this:

(Pdb) o1o2_indices
array([[12, 10],
       [ 1,  0],
       [13,  5],
       [13,  7],
       [11,  8]])
(Pdb) gt_classes[o1o2_indices] # Regardless
array([[ 77, 111],
       [ 20,   3],
       [ 78,  58],
       [ 78,  97],
       [115,  99]])
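
For reference, the lookup above is plain NumPy integer-array indexing: the result has the same shape as the index array, with each index replaced by the class label at that position. A minimal sketch that rebuilds the session above; only the entries visible in the pdb output are real, and the zero placeholders for the other positions (and the array length of 14) are assumptions:

```python
import numpy as np

# Rebuild gt_classes from the pdb output above. Positions not shown in the
# session (2, 3, 4, 6, 9) are unknown, so they are filled with a placeholder 0.
# The length 14 is an assumption: the largest index seen is 13.
gt_classes = np.zeros(14, dtype=int)
known = {12: 77, 10: 111, 1: 20, 0: 3, 13: 78, 5: 58, 7: 97, 11: 115, 8: 99}
for position, label in known.items():
    gt_classes[position] = label

# Each row is a pair of object indices, as in the session above.
o1o2_indices = np.array([[12, 10],
                         [ 1,  0],
                         [13,  5],
                         [13,  7],
                         [11,  8]])

# Integer-array indexing: the output has the shape of the index array,
# and every index is replaced by the class label stored at that position.
o1o2 = gt_classes[o1o2_indices]
print(o1o2.shape)
print(o1o2)
```

This only works while gt_classes is still an array; once it is a scalar, the same expression has nothing to index into.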
zhanwenchen commented 1 year ago

In the failing case, gt_classes is a scalar rather than an array, so it cannot be indexed:

(Pdb) o1o2_indices
array([[3, 6],
       [1, 2]])
(Pdb) gt_classes
20
zhanwenchen commented 1 year ago

The problem was a new variable reusing the name of the original, so the assignment overwrote the original gt_classes variable and the later indexing hit a scalar instead of an array:

gt_classes = gt_classes[ex_ind] # etc
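
A minimal sketch of how this kind of name reuse can produce the error, and one way to avoid it. This is not the code from visual_genome.py: the function names, the per-example variable name gt_classes_i, and the toy data are all hypothetical, and the loop structure is assumed from the ex_ind index and the progress bar stopping at the second example.

```python
import numpy as np

# Hypothetical toy data: one array of class labels and one array of
# (subject, object) index pairs per image.
all_gt_classes = [np.array([3, 20, 77, 111, 58, 97, 99]),
                  np.array([5, 20, 8, 9, 1, 2, 30])]
all_pair_indices = [np.array([[1, 0], [2, 3]]),
                    np.array([[3, 6], [1, 2]])]


def collect_pairs_buggy(gt_classes, pair_indices):
    """Reuses the name gt_classes inside the loop, clobbering the input."""
    pairs = []
    for ex_ind in range(len(pair_indices)):
        # Iteration 0: gt_classes becomes the first image's label array.
        # Iteration 1: that array is indexed again, yielding a NumPy scalar.
        gt_classes = gt_classes[ex_ind]
        pairs.append(gt_classes[pair_indices[ex_ind]])
    return pairs


def collect_pairs_fixed(gt_classes, pair_indices):
    """Uses a distinct per-example name so the input is never overwritten."""
    pairs = []
    for ex_ind in range(len(pair_indices)):
        gt_classes_i = gt_classes[ex_ind]
        pairs.append(gt_classes_i[pair_indices[ex_ind]])
    return pairs


try:
    collect_pairs_buggy(all_gt_classes, all_pair_indices)
    buggy_raised = False
except IndexError as err:
    # On the second example gt_classes is a scalar, so indexing fails.
    buggy_raised = True
    print("buggy version failed:", err)

fixed_pairs = collect_pairs_fixed(all_gt_classes, all_pair_indices)
print(fixed_pairs[1])
```

In the buggy version the first example succeeds and the second one fails, which is consistent with the progress bar stopping at 1/57723 in the log above.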