Errors about image size during validation

VSJMilewski commented 1 year ago

Dear @waxnkw,

I tried to train a model using your codebase. Unfortunately, I ran into the following error during evaluation. It seems that some of the images have the wrong size during validation, which leads to None values when merging the outputs from the GPUs. I'm just using the configurations as you provided them, so I should be using your generated vg/50/ files.

Do you know what the issue is? Thanks for the help in advance!! Kind regards, Victor

INFO:maskrcnn_benchmark:eta: 23:23:34 iter: 2000 loss: 0.1211 (0.1859) loss_refine_obj: 0.0000 (0.0000) loss_rel: 0.1211 (0.1859) time: 1.7104 (1.7545) data: 0.0159 (0.0166) lr: 0.120000 max mem: 7458
INFO:maskrcnn_benchmark:Start validating
2022-10-19 11:19:13,597 maskrcnn_benchmark.utils.checkpoint INFO: Saving checkpoint to /cw/liir_code/NoCsBack/victorm/IETrans-SGG.pytorch/exps/50/motif/predcls/sup/sup/model_0002000.pth
INFO:maskrcnn_benchmark.utils.checkpoint:Saving checkpoint to /cw/liir_code/NoCsBack/victorm/IETrans-SGG.pytorch/exps/50/motif/predcls/sup/sup/model_0002000.pth
INFO:maskrcnn_benchmark:Start evaluation on 50VG_stanford_filtered_with_attribute_val dataset(5000 images).
0%| | 0/2500 [00:00<?, ?it/s]
2022-10-19 11:19:37,757 maskrcnn_benchmark INFO: Start validating
INFO:maskrcnn_benchmark:Start validating
2022-10-19 11:19:37,806 maskrcnn_benchmark INFO: Start evaluation on 50VG_stanford_filtered_with_attribute_val dataset(5000 images).
INFO:maskrcnn_benchmark:Start evaluation on 50VG_stanford_filtered_with_attribute_val dataset(5000 images).
5%|█████████▊ | 124/2500 [00:31<10:39, 3.72it/s] ==================== ERROR index 4318 (1024, 1024) 1024 768 ==================== 14%|███████████████████████████▋ | 349/2500 [01:29<09:39, 3.71it/s] ==================== ERROR index 4376 (1024, 683) 1024 768 ==================== 17%|█████████████████████████████████ | 418/2500 [01:48<08:09, 4.25it/s] ==================== ERROR index 4389 (612, 612) 1024 768 ==================== 29%|████████████████████████████████████████████████████████▉ | 719/2500 [03:28<07:19, 4.05it/s] ==================== ERROR index 4091 (800, 1150) 1024 768 ==================== 35%|█████████████████████████████████████████████████████████████████████▎ | 875/2500 [04:09<07:08, 3.80it/s] ==================== ERROR index 4081 (1024, 678) 1024 768 ==================== 36%|███████████████████████████████████████████████████████████████████████▏ | 899/2500 [04:15<06:41, 3.99it/s] ==================== ERROR index 4777 (1024, 575) 1024 768 ==================== 38%|██████████████████████████████████████████████████████████████████████████▎ | 939/2500 [04:25<05:53, 4.41it/s] ==================== ERROR index 4644 (952, 1024) 1024 768 ==================== 45%|████████████████████████████████████████████████████████████████████████████████████████▋ | 1125/2500 [05:12<05:23, 4.25it/s]==================== ERROR index 3867 (1000, 586) 1024 768 ==================== 61%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍ | 1515/2500 [06:52<03:49, 4.29it/s]==================== ERROR index 3895 (1024, 1022) 1024 768 ==================== 65%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎ | 1616/2500 [07:18<03:56, 3.73it/s]==================== ERROR index 4234 (1024, 679) 1024 768 ==================== 
80%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 1988/2500 [08:54<02:15, 3.79it/s]==================== ERROR index 4821 (1024, 683) 1024 768 ==================== 89%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎ | 2225/2500 [09:55<01:16, 3.60it/s]==================== ERROR index 4154 (1024, 683) 1024 768 ==================== 92%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊ | 2307/2500 [10:15<00:46, 4.13it/s]==================== ERROR index 4751 (1024, 677) 1024 768 ==================== 96%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍ | 2391/2500 [10:14<00:28, 3.79it/s] ==================== ERROR index 4934 (683, 1024) 1024 768 ==================== 97%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊ | 2434/2500 [10:49<00:16, 3.88it/s] ==================== ERROR index 4430 (684, 1024) 1024 768 ==================== 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2500/2500 [10:43<00:00, 3.89it/s] 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2500/2500 [11:07<00:00, 3.75it/s] 
INFO:maskrcnn_benchmark:Total run time: 0:11:07.416137 (0.26696645498275756 s / img per device, on 2 devices)
INFO:maskrcnn_benchmark:Model inference time: 0:10:06.977097 (0.24279083881378175 s / img per device, on 2 devices)
2022-10-19 11:30:21,114 maskrcnn_benchmark INFO: Total run time: 0:10:43.307338 (0.2573229353904724 s / img per device, on 2 devices)
INFO:maskrcnn_benchmark:Total run time: 0:10:43.307338 (0.2573229353904724 s / img per device, on 2 devices)
2022-10-19 11:30:21,115 maskrcnn_benchmark INFO: Model inference time: 0:08:47.517142 (0.21100685682296752 s / img per device, on 2 devices)
INFO:maskrcnn_benchmark:Model inference time: 0:08:47.517142 (0.21100685682296752 s / img per device, on 2 devices)
Traceback (most recent call last):
  File "tools/relation_train_net.py", line 383, in <module>
    main()
  File "tools/relation_train_net.py", line 376, in main
    model = train(cfg, args.local_rank, args.distributed, logger)
  File "tools/relation_train_net.py", line 203, in train
    val_result = run_val(cfg, model, val_data_loaders, distributed, logger)
  File "tools/relation_train_net.py", line 251, in run_val
    dataset_result = inference(
  File "/cw/liir_code/NoCsBack/victorm/IETrans-SGG.pytorch/maskrcnn_benchmark/engine/inference.py", line 130, in inference
    image_ids, predictions = _accumulate_predictions_from_multiple_gpus(predictions, synchronize_gather=cfg.TEST.RELATION.SYNC_GATHER)
TypeError: cannot unpack non-iterable NoneType object
Killing subprocess 779640
Killing subprocess 779641
Traceback (most recent call last):
  File "/cw/liir_code/NoCsBack/victorm/miniconda3/envs/scene_graph_benchmark/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/cw/liir_code/NoCsBack/victorm/miniconda3/envs/scene_graph_benchmark/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/cw/liir_code/NoCsBack/victorm/miniconda3/envs/scene_graph_benchmark/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in <module>
    main()
  File "/cw/liir_code/NoCsBack/victorm/miniconda3/envs/scene_graph_benchmark/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main
    sigkill_handler(signal.SIGTERM, None) # not coming back
  File "/cw/liir_code/NoCsBack/victorm/miniconda3/envs/scene_graph_benchmark/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/cw/liir_code/NoCsBack/victorm/miniconda3/envs/scene_graph_benchmark/bin/python', '-u', 'tools/relation_train_net.py', '--local_rank=1', '--config-file', 'configs/sup-50.yaml', 'MODEL.ROI_RELATION_HEAD.USE_GT_BOX', 'True', 'MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL', 'True', 'MODEL.ROI_RELATION_HEAD.PREDICTOR', 'MotifPredictor', 'SOLVER.PRE_VAL', 'False', 'SOLVER.IMS_PER_BATCH', '12', 'TEST.IMS_PER_BATCH', '2', 'DTYPE', 'float16', 'SOLVER.MAX_ITER', '50000', 'SOLVER.VAL_PERIOD', '2000', 'SOLVER.CHECKPOINT_PERIOD', '2000', 'GLOVE_DIR', '/cw/liir_code/NoCsBack/victorm/IETrans-SGG.pytorch/exps/glove', 'MODEL.PRETRAINED_DETECTOR_CKPT', './maskrcnn_benchmark/pretrained/pretrained_faster_rcnn/model_final.pth', 'OUTPUT_DIR', '/cw/liir_code/NoCsBack/victorm/IETrans-SGG.pytorch/exps/50/motif/predcls/sup/sup', 'TEST.METRIC', 'R']' returned non-zero exit status 1.
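The "ERROR index ..." markers in the log above are easy to lose between the progress bars. A minimal sketch for pulling them out of a saved log follows. The thread does not say what each field means, so the names `size` and `trailing` are neutral placeholders, and `parse_size_errors` is a hypothetical helper, not part of the codebase.

```python
import re
from typing import List, NamedTuple, Tuple


class SizeError(NamedTuple):
    index: int                 # the number printed after "ERROR index"
    size: Tuple[int, int]      # the parenthesised pair, e.g. (1024, 683)
    trailing: Tuple[int, int]  # the two numbers that follow, e.g. 1024 768


# Matches e.g. "ERROR index 4318 (1024, 1024) 1024 768"
_PATTERN = re.compile(r"ERROR index (\d+) \((\d+), (\d+)\) (\d+) (\d+)")


def parse_size_errors(log_text: str) -> List[SizeError]:
    """Return every size-error marker found in the log text, in order."""
    return [
        SizeError(int(i), (int(a), int(b)), (int(c), int(d)))
        for i, a, b, c, d in _PATTERN.findall(log_text)
    ]


sample = (
    "==================== ERROR index 4318 (1024, 1024) 1024 768 "
    "==================== 14%| | 349/2500 [01:29<09:39, 3.71it/s] "
    "==================== ERROR index 4376 (1024, 683) 1024 768 "
    "===================="
)
errors = parse_size_errors(sample)
print([e.index for e in errors])
```

Collecting the indices this way makes it easier to compare them against the entries flagged by a size check on image_data.json.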

waxnkw commented 1 year ago

That is weird. The errors seem to be caused by mismatched image sizes, but I did not modify image_data.json or the jpg files. Please check that your image files match image_data.json. You can run:

import json
import os

from PIL import Image

img_dir = "datasets/vg/VG_100K/"
image_file = "datasets/vg/image_data.json"
with open(image_file, 'r') as f:
    im_data = json.load(f)

# Files listed as corrupted; skip them.
corrupted_ims = ['1592.jpg', '1722.jpg', '4616.jpg', '4617.jpg']
for i, img in enumerate(im_data):
    basename = '{}.jpg'.format(img['image_id'])
    if basename in corrupted_ims:
        continue
    filename = os.path.join(img_dir, basename)
    imgrgb = Image.open(filename).convert("RGB")
    # Compare the actual image size with the width/height recorded in the JSON.
    if imgrgb.size[0] != img['width'] or imgrgb.size[1] != img['height']:
        print(i)  # position of the mismatching entry in image_data.json
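Note that the snippet above prints the position of an entry in image_data.json, not its image_id. A slightly more verbose variant can make mismatches easier to act on by reporting both, along with the expected and actual sizes. This is only a sketch: `find_size_mismatches` and `pil_size` are hypothetical names, and the size reader is passed in as a parameter so the logic can be tried without any image files (the default reader assumes Pillow is installed).

```python
import os
from typing import Callable, Dict, Iterable, List, Tuple


def pil_size(path: str) -> Tuple[int, int]:
    """Read (width, height) from an image file using Pillow."""
    from PIL import Image  # imported lazily; only needed for real files

    with Image.open(path) as im:
        return im.size


def find_size_mismatches(
    im_data: Iterable[Dict],
    img_dir: str,
    get_size: Callable[[str], Tuple[int, int]] = pil_size,
    skip: Iterable[str] = (),
) -> List[Dict]:
    """List entries whose recorded width/height differ from the file on disk."""
    skipped = set(skip)
    mismatches = []
    for index, entry in enumerate(im_data):
        basename = "{}.jpg".format(entry["image_id"])
        if basename in skipped:
            continue
        actual = tuple(get_size(os.path.join(img_dir, basename)))
        expected = (entry["width"], entry["height"])
        if actual != expected:
            mismatches.append(
                {
                    "index": index,
                    "image_id": entry["image_id"],
                    "expected": expected,
                    "actual": actual,
                }
            )
    return mismatches


# Demo with made-up metadata and a fake size reader (no files needed).
fake_sizes = {"1.jpg": (800, 600), "2.jpg": (1024, 683)}
demo_data = [
    {"image_id": 1, "width": 800, "height": 600},
    {"image_id": 2, "width": 1024, "height": 768},
]
demo = find_size_mismatches(
    demo_data, "", get_size=lambda p: fake_sizes[os.path.basename(p)]
)
print(demo)
```

With real data, one would call `find_size_mismatches(im_data, img_dir, skip=corrupted_ims)` using the variables from the snippet above.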

VSJMilewski commented 1 year ago

Thank you for the quick response. Indeed, a number of IDs show up. I tried re-downloading the images to see if that was the issue, but nothing changed. Here is the list of IDs:

5128 5162 5385 5396 5469 5567 5671 5749 5767 5820 6071 6099 6206 6238 6294 6383 6442 6465

waxnkw commented 1 year ago

Did you download the images from the official website? I have also uploaded my previously downloaded image files. You can give them a try.

Please try these two links for images.zip and images2.zip.

VSJMilewski commented 1 year ago

Hi. So I explored the issue further, and I think I solved it. I tried re-downloading the images from both the official VG page and your links, and re-downloading the official VG image_data.json, but nothing worked.

Then I found a copy of image_data.json in the Scene-Graph-Benchmark.pytorch repository (https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch/tree/master/datasets/vg). With this copy, your check no longer reports any IDs, so one issue is solved :smiley: Thanks for the help with this one!
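To see exactly which entries differ between two copies of image_data.json, the recorded sizes can be compared directly. This is a rough sketch, assuming both files are lists of objects with `image_id`, `width` and `height` keys, as the check script in this thread implies. `load_sizes` and `diff_sizes` are hypothetical helpers, and the demo uses made-up values rather than real Visual Genome metadata.

```python
import json
from typing import Dict, Tuple

Size = Tuple[int, int]


def load_sizes(path: str) -> Dict[int, Size]:
    """Map image_id -> (width, height) for one image_data.json file."""
    with open(path, "r") as f:
        return {e["image_id"]: (e["width"], e["height"]) for e in json.load(f)}


def diff_sizes(a: Dict[int, Size], b: Dict[int, Size]) -> Dict[int, Tuple[Size, Size]]:
    """Return image_ids present in both mappings whose sizes disagree."""
    return {
        image_id: (a[image_id], b[image_id])
        for image_id in sorted(a.keys() & b.keys())
        if a[image_id] != b[image_id]
    }


# Demo with made-up values; with real files, use load_sizes(<path>) for each copy.
official = {1: (800, 600), 2: (1024, 768), 3: (500, 375)}
benchmark_copy = {1: (800, 600), 2: (1024, 683), 3: (500, 375)}
changed = diff_sizes(official, benchmark_copy)
print(changed)
```

If the only differences are in width and height for a handful of images, that would be consistent with the size errors seen during validation.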

Unfortunately, I still get the error about unpacking the non-iterable NoneType object:

INFO:maskrcnn_benchmark:Start evaluation on 50VG_stanford_filtered_with_attribute_val dataset(5000 images).
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2500/2500 [15:15<00:00, 2.73it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2500/2500 [15:15<00:00, 2.73it/s]
INFO:maskrcnn_benchmark:Total run time: 0:15:15.360324 (0.36614412946701047 s / img per device, on 2 devices)
INFO:maskrcnn_benchmark:Model inference time: 0:13:49.756791 (0.33190271644592284 s / img per device, on 2 devices)
2022-10-26 15:07:25,433 maskrcnn_benchmark INFO: Total run time: 0:15:15.359935 (0.36614397401809695 s / img per device, on 2 devices)
Traceback (most recent call last):
  File "tools/relation_train_net.py", line 383, in <module>
INFO:maskrcnn_benchmark:Total run time: 0:15:15.359935 (0.36614397401809695 s / img per device, on 2 devices)
2022-10-26 15:07:25,434 maskrcnn_benchmark INFO: Model inference time: 0:13:47.015618 (0.33080624713897705 s / img per device, on 2 devices)
INFO:maskrcnn_benchmark:Model inference time: 0:13:47.015618 (0.33080624713897705 s / img per device, on 2 devices)
    main()
  File "tools/relation_train_net.py", line 376, in main
    model = train(cfg, args.local_rank, args.distributed, logger)
  File "tools/relation_train_net.py", line 124, in train
    run_val(cfg, model, val_data_loaders, distributed, logger)
  File "tools/relation_train_net.py", line 251, in run_val
    dataset_result = inference(
  File "/cw/liir_code/NoCsBack/victorm/IETrans-SGG.pytorch/maskrcnn_benchmark/engine/inference.py", line 130, in inference
    image_ids, predictions = _accumulate_predictions_from_multiple_gpus(predictions, synchronize_gather=cfg.TEST.RELATION.SYNC_GATHER)
TypeError: cannot unpack non-iterable NoneType object
Killing subprocess 3167680
Killing subprocess 3167681
Traceback (most recent call last):
  File "/cw/liir_code/NoCsBack/victorm/miniconda3/envs/scene_graph_benchmark/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/cw/liir_code/NoCsBack/victorm/miniconda3/envs/scene_graph_benchmark/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/cw/liir_code/NoCsBack/victorm/miniconda3/envs/scene_graph_benchmark/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in <module>
    main()
  File "/cw/liir_code/NoCsBack/victorm/miniconda3/envs/scene_graph_benchmark/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main
    sigkill_handler(signal.SIGTERM, None) # not coming back
  File "/cw/liir_code/NoCsBack/victorm/miniconda3/envs/scene_graph_benchmark/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/cw/liir_code/NoCsBack/victorm/miniconda3/envs/scene_graph_benchmark/bin/python', '-u', 'tools/relation_train_net.py', '--local_rank=1', '--config-file', 'configs/sup-50.yaml', 'MODEL.ROI_RELATION_HEAD.USE_GT_BOX', 'True', 'MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL', 'True', 'MODEL.ROI_RELATION_HEAD.PREDICTOR', 'MotifPredictor', 'SOLVER.PRE_VAL', 'True', 'SOLVER.IMS_PER_BATCH', '12', 'TEST.IMS_PER_BATCH', '2', 'DTYPE', 'float16', 'SOLVER.MAX_ITER', '50000', 'SOLVER.VAL_PERIOD', '2000', 'SOLVER.CHECKPOINT_PERIOD', '2000', 'GLOVE_DIR', '/cw/liir_code/NoCsBack/victorm/IETrans-SGG.pytorch/exps/glove', 'MODEL.PRETRAINED_DETECTOR_CKPT', './maskrcnn_benchmark/pretrained/pretrained_faster_rcnn/model_final.pth', 'OUTPUT_DIR', '/cw/liir_code/NoCsBack/victorm/IETrans-SGG.pytorch/exps/50/motif/predcls/sup/sup', 'TEST.METRIC', 'R']' returned non-zero exit status 1.
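The traceback shows a tuple unpack of a `None` return value, and the failing command was launched with `--local_rank=1`. One plausible reading, which is an assumption since the thread does not describe the actual fix, is that the gather helper only returns results on the main process, so the unpack fails on every other rank. The sketch below reproduces that pattern with stand-in functions (`gather_on_main` and `run_inference` are hypothetical, not the repository's code) and shows a guard that avoids the error.

```python
from typing import Dict, List, Optional, Tuple


def gather_on_main(
    predictions: Dict[int, str], is_main: bool
) -> Optional[Tuple[List[int], List[str]]]:
    """Stand-in for a multi-GPU gather: only the main process gets results."""
    if not is_main:
        return None  # non-main ranks have nothing to return
    image_ids = sorted(predictions)
    return image_ids, [predictions[i] for i in image_ids]


def run_inference(
    predictions: Dict[int, str], is_main: bool
) -> Optional[List[Tuple[int, str]]]:
    """Check for None before unpacking, so non-main ranks exit cleanly."""
    gathered = gather_on_main(predictions, is_main)
    if gathered is None:
        return None
    image_ids, preds = gathered
    return list(zip(image_ids, preds))


# Unguarded unpacking on a non-main rank reproduces the error from the log.
try:
    image_ids, preds = gather_on_main({1: "a"}, is_main=False)
except TypeError as exc:
    print("unguarded:", exc)

print("guarded, rank 1:", run_inference({1: "a"}, is_main=False))
print("guarded, rank 0:", run_inference({2: "b", 1: "a"}, is_main=True))
```

Whether the repository's fix takes this form is not stated in the thread; the sketch only illustrates why the error can persist after the image sizes are corrected.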

waxnkw commented 1 year ago

Sorry for the late reply. I was busy with my PhD Qualifying Examination. I have fixed this bug now. You can pull the new code and give it a try.

VSJMilewski commented 1 year ago

Perfect. That solved the issues! Thank you so much for the help. And thanks for the great work!

waxnkw / IETrans-SGG.pytorch

Errors about image size during validation #9