zjhuang22 / maskscoring_rcnn

Codes for paper "Mask Scoring R-CNN".
MIT License

AttributeError: 'list' object has no attribute 'resize' #45

Open HAILUWANG opened 5 years ago

HAILUWANG commented 5 years ago

When I train the network on a single GPU, I get the error "AttributeError: 'list' object has no attribute 'resize'". Could you please tell me how to solve this problem? Thank you very much.

PyTorch version: 1.1.0.dev20190506
Is debug build: No
CUDA used to build PyTorch: 9.0.176

OS: Ubuntu 16.04.3 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609
CMake version: version 3.5.1

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 7.5.17
GPU models and configuration: GPU 0: GeForce GTX TITAN X
Nvidia driver version: 418.40.04
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.1.3
/usr/local/lib/libcudnn.so.5.1.10

Versions of relevant libraries:
[pip] numpy==1.16.3
[pip] torch==1.1.0.dev20190506
[pip] torchvision==0.2.3a0+d534785
[conda] blas 1.0 mkl
[conda] mkl 2019.3 199
[conda] mkl_fft 1.0.12 py37ha843d7b_0
[conda] mkl_random 1.0.2 py37hd81dba3_0
[conda] pytorch-nightly 1.1.0.dev20190506 py3.7_cuda9.0.176_cudnn7.5.1_0 pytorch
Pillow (6.0.0)

2019-05-08 22:23:42,841 maskrcnn_benchmark INFO: Loaded configuration file configs/e2e_ms_rcnn_R_50_FPN_1x.yaml
2019-05-08 22:23:42,842 maskrcnn_benchmark INFO: . . . .
2019-05-08 22:23:58,578 maskrcnn_benchmark.trainer INFO: Start training
Traceback (most recent call last):
  File "tools/train_net.py", line 172, in <module>
    main()
  File "tools/train_net.py", line 165, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 74, in train
    arguments,
  File "/home/whl/github/maskscoring_rcnn/maskrcnn_benchmark/engine/trainer.py", line 56, in do_train
    for iteration, (images, targets, _) in enumerate(data_loader, start_iter):
  File "/home/whl/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 582, in __next__
    return self._process_next_batch(batch)
  File "/home/whl/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 608, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
AttributeError: Traceback (most recent call last):
  File "/home/whl/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/whl/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/whl/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/utils/data/dataset.py", line 85, in __getitem__
    return self.datasets[dataset_idx][sample_idx]
  File "/home/whl/github/maskscoring_rcnn/maskrcnn_benchmark/data/datasets/coco.py", line 36, in __getitem__
    img, anno = super(COCODataset, self).__getitem__(idx)
  File "/home/whl/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torchvision-0.2.3a0+d534785-py3.7.egg/torchvision/datasets/coco.py", line 114, in __getitem__
    img, target = self.transforms(img, target)
  File "/home/whl/github/maskscoring_rcnn/maskrcnn_benchmark/data/transforms/transforms.py", line 14, in __call__
    image, target = t(image, target)
  File "/home/whl/github/maskscoring_rcnn/maskrcnn_benchmark/data/transforms/transforms.py", line 58, in __call__
    target = target.resize(image.size)
AttributeError: 'list' object has no attribute 'resize'
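Reading the last frames of the traceback: torchvision's `coco.py` (line 114) calls `self.transforms(img, target)` itself, and the repository's resize transform then calls `target.resize(image.size)`. The error message suggests that `target` is still a plain Python list of annotations at that point, not an object with a `resize` method. The sketch below reproduces only that failure mode. All class names in it (`FakeImage`, `FakeBoxList`, `Resize`) are hypothetical stand-ins, not the real torchvision or maskrcnn_benchmark classes.

```python
class FakeImage:
    """Hypothetical stand-in for an image that exposes a .size attribute."""

    def __init__(self, size):
        self.size = size


class FakeBoxList:
    """Hypothetical stand-in for a target object that knows how to resize itself."""

    def __init__(self, size):
        self.size = size

    def resize(self, size):
        return FakeBoxList(size)


class Resize:
    """Simplified stand-in for a paired transform: it assumes target has .resize()."""

    def __call__(self, image, target):
        target = target.resize(image.size)
        return image, target


image = FakeImage((800, 600))
resize = Resize()

# Works: the target is an object with a resize() method.
_, ok_target = resize(image, FakeBoxList((640, 480)))

# Fails: the target is a raw list of annotation dicts, as in the traceback.
raw_annotations = [{"bbox": [10, 20, 30, 40], "category_id": 1}]
try:
    resize(image, raw_annotations)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

If this reading is right, the transform is being invoked earlier than the repository's dataset wrapper expects, before the raw annotations have been converted into a resizable target.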

zjhuang22 commented 5 years ago

Could you show me your running script? It may be the problem of batch size

HAILUWANG commented 5 years ago

I have not modified the training file. Here it is:

# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
r"""
Basic training script for PyTorch
"""

# Set up custom environment before nearly anything else is imported
# NOTE: this should be the first import (do not reorder)
from maskrcnn_benchmark.utils.env import setup_environment  # noqa F401 isort:skip

import argparse
import os

import torch
from maskrcnn_benchmark.config import cfg
from maskrcnn_benchmark.data import make_data_loader
from maskrcnn_benchmark.solver import make_lr_scheduler
from maskrcnn_benchmark.solver import make_optimizer
from maskrcnn_benchmark.engine.inference import inference
from maskrcnn_benchmark.engine.trainer import do_train
from maskrcnn_benchmark.modeling.detector import build_detection_model
from maskrcnn_benchmark.utils.checkpoint import DetectronCheckpointer
from maskrcnn_benchmark.utils.collect_env import collect_env_info
from maskrcnn_benchmark.utils.comm import synchronize, get_rank
from maskrcnn_benchmark.utils.imports import import_file
from maskrcnn_benchmark.utils.logger import setup_logger
from maskrcnn_benchmark.utils.miscellaneous import mkdir


def train(cfg, local_rank, distributed):
    model = build_detection_model(cfg)
    device = torch.device(cfg.MODEL.DEVICE)
    model.to(device)

    optimizer = make_optimizer(cfg, model)
    scheduler = make_lr_scheduler(cfg, optimizer)

    if distributed:
        model = torch.nn.parallel.deprecated.DistributedDataParallel(
            model, device_ids=[local_rank], output_device=local_rank,
            # this should be removed if we update BatchNorm stats
            broadcast_buffers=False,
        )

    arguments = {}
    arguments["iteration"] = 0

    output_dir = cfg.OUTPUT_DIR

    save_to_disk = get_rank() == 0
    checkpointer = DetectronCheckpointer(
        cfg, model, optimizer, scheduler, output_dir, save_to_disk
    )
    extra_checkpoint_data = checkpointer.load(cfg.MODEL.WEIGHT)
    arguments.update(extra_checkpoint_data)

    data_loader = make_data_loader(
        cfg,
        is_train=True,
        is_distributed=distributed,
        start_iter=arguments["iteration"],
    )

    checkpoint_period = cfg.SOLVER.CHECKPOINT_PERIOD

    do_train(
        model,
        data_loader,
        optimizer,
        scheduler,
        checkpointer,
        device,
        checkpoint_period,
        arguments,
    )

    return model

def test(cfg, model, distributed):
    if distributed:
        model = model.module
    torch.cuda.empty_cache()  # TODO check if it helps
    iou_types = ("bbox",)
    if cfg.MODEL.MASK_ON:
        iou_types = iou_types + ("segm",)
    output_folders = [None] * len(cfg.DATASETS.TEST)
    if cfg.OUTPUT_DIR:
        dataset_names = cfg.DATASETS.TEST
        for idx, dataset_name in enumerate(dataset_names):
            output_folder = os.path.join(cfg.OUTPUT_DIR, "inference", dataset_name)
            mkdir(output_folder)
            output_folders[idx] = output_folder
    data_loaders_val = make_data_loader(cfg, is_train=False, is_distributed=distributed)
    for output_folder, data_loader_val in zip(output_folders, data_loaders_val):
        inference(
            model,
            data_loader_val,
            iou_types=iou_types,
            box_only=cfg.MODEL.RPN_ONLY,
            device=cfg.MODEL.DEVICE,
            expected_results=cfg.TEST.EXPECTED_RESULTS,
            expected_results_sigma_tol=cfg.TEST.EXPECTED_RESULTS_SIGMA_TOL,
            output_folder=output_folder,
            maskiou_on=cfg.MODEL.MASKIOU_ON
        )
        synchronize()

def main():
    parser = argparse.ArgumentParser(description="PyTorch Object Detection Training")
    parser.add_argument(
        "--config-file",
        default="",
        metavar="FILE",
        help="path to config file",
        type=str,
    )
    parser.add_argument("--local_rank", type=int, default=0)
    parser.add_argument(
        "--skip-test",
        dest="skip_test",
        help="Do not test the final model",
        action="store_true",
    )
    parser.add_argument(
        "opts",
        help="Modify config options using the command-line",
        default=None,
        nargs=argparse.REMAINDER,
    )

    args = parser.parse_args()

    num_gpus = int(os.environ["WORLD_SIZE"]) if "WORLD_SIZE" in os.environ else 1
    args.distributed = num_gpus > 1

    if args.distributed:
        torch.cuda.set_device(args.local_rank)
        torch.distributed.deprecated.init_process_group(
            backend="nccl", init_method="env://"
        )

    cfg.merge_from_file(args.config_file)
    cfg.merge_from_list(args.opts)
    cfg.freeze()

    output_dir = cfg.OUTPUT_DIR
    if output_dir:
        mkdir(output_dir)

    logger = setup_logger("maskrcnn_benchmark", output_dir, get_rank())
    logger.info("Using {} GPUs".format(num_gpus))
    logger.info(args)

    logger.info("Collecting env info (might take some time)")
    logger.info("\n" + collect_env_info())

    logger.info("Loaded configuration file {}".format(args.config_file))
    with open(args.config_file, "r") as cf:
        config_str = "\n" + cf.read()
        logger.info(config_str)
    logger.info("Running with config:\n{}".format(cfg))

    model = train(cfg, args.local_rank, args.distributed)

    if not args.skip_test:
        test(cfg, model, args.distributed)


if __name__ == "__main__":
    main()

Thank you

zjhuang22 commented 5 years ago

And your running command?

HAILUWANG commented 5 years ago

I followed the instructions and ran:

python tools/train_net.py --config-file "configs/e2e_ms_rcnn_R_50_FPN_1x.yaml" SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 720000 SOLVER.STEPS "(480000, 640000)" TEST.IMS_PER_BATCH 1
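For readers unsure how the trailing KEY VALUE pairs in this command reach the config: the training script declares a positional `opts` argument with `nargs=argparse.REMAINDER` and hands it to `cfg.merge_from_list`. The sketch below shows only the argparse half, using a cut-down parser. The `parse_overrides` helper and the pairing into a dict are illustrative assumptions, not code from the repository.

```python
import argparse


def parse_overrides(argv):
    """Hypothetical helper: split argv into the config path and KEY/VALUE overrides."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config-file", default="", metavar="FILE", type=str)
    # Everything left after the known options is collected verbatim.
    parser.add_argument("opts", default=None, nargs=argparse.REMAINDER)
    args = parser.parse_args(argv)

    if len(args.opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    # Even positions are keys, odd positions are their (string) values.
    overrides = dict(zip(args.opts[0::2], args.opts[1::2]))
    return args.config_file, overrides


argv = [
    "--config-file", "configs/e2e_ms_rcnn_R_50_FPN_1x.yaml",
    "SOLVER.IMS_PER_BATCH", "2",
    "SOLVER.BASE_LR", "0.0025",
    "SOLVER.MAX_ITER", "720000",
    "SOLVER.STEPS", "(480000, 640000)",
    "TEST.IMS_PER_BATCH", "1",
]
config_file, overrides = parse_overrides(argv)
for key, value in overrides.items():
    print(key, "=", value)
```

The values arrive as strings; converting them to the right types is left to the config library in the real script.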

tjzjp commented 5 years ago

I ran into the same problem. Have you solved it?

HAILUWANG commented 5 years ago

No. I have had this problem for several days and am still trying to solve it, but training still does not run.

13012476909 commented 5 years ago

I also have the same problem

zjhuang22 commented 5 years ago

Hi, I think it might be a torchvision problem. Most of my settings are the same as yours except for torchvision: mine is 0.2.1, and I cannot find the version you are using (0.2.3a0+d534785). Maybe you can try 0.2.1.
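A quick way to check whether an environment matches the version reported to work here is to compare the release part of the version string against 0.2.1. This is a minimal sketch: the helper names `release_tuple` and `matches_known_good` are hypothetical, and the comparison says nothing about versions other than the two mentioned in this thread.

```python
import re

# The torchvision release reported as working in this thread.
KNOWN_GOOD = (0, 2, 1)


def release_tuple(version):
    """Extract the leading MAJOR.MINOR.PATCH numbers from a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: {!r}".format(version))
    return tuple(int(part) for part in match.groups())


def matches_known_good(version):
    """Return True if the version's release numbers equal KNOWN_GOOD."""
    return release_tuple(version) == KNOWN_GOOD


print(matches_known_good("0.2.1"))
print(matches_known_good("0.2.3a0+d534785"))
```

In a real environment the string to pass in would be `torchvision.__version__`. If it does not match, installing the pinned release (for example with `pip install torchvision==0.2.1`) is the change suggested above.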

13012476909 commented 5 years ago

You are right.


yukang2017 commented 5 years ago

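Brother Jin, you're amazing! I ran into this problem with maskrcnn-benchmark and could not find a solution anywhere in their issues, but I found it here. Great work!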

HAILUWANG commented 5 years ago

Thank you very much

jhtao1860 commented 5 years ago

@zjhuang22 you are right