pytorch / xla

Enabling PyTorch on XLA Devices (e.g. Google TPU)
https://pytorch.org/xla

[torchbench] `detectron2_fcos_r_50_fpn` fails to run inference. #6833

Closed ysiraichi closed 6 months ago

ysiraichi commented 7 months ago

🐛 Bug

Running the upstreamed benchmarking scripts with the following command results in an unexpected error.

python xla/benchmarks/experiment_runner.py \
       --suite-name torchbench \
       --accelerator cuda \
       --xla PJRT \
       --dynamo openxla --dynamo None \
       --test eval \
       --repeat 8 --iterations-per-run 1 \
       --print-subprocess \
       --no-resume -k detectron2_fcos_r_50_fpn

Traceback (most recent call last):
  File "xla/benchmarks/experiment_runner.py", line 945, in <module>
    main()
  File "xla/benchmarks/experiment_runner.py", line 941, in main
    runner.run()
  File "xla/benchmarks/experiment_runner.py", line 61, in run
    self.run_single_config()
  File "xla/benchmarks/experiment_runner.py", line 256, in run_single_config
    metrics, last_output = self.run_once_and_gather_metrics(
  File "xla/benchmarks/experiment_runner.py", line 345, in run_once_and_gather_metrics
    output, _ = loop(iter_fn=self._default_iter_fn)
  File "xla/benchmarks/experiment_runner.py", line 302, in loop
    output, timing, trace = iter_fn(benchmark_experiment, benchmark_model,
  File "xla/benchmarks/experiment_runner.py", line 218, in _default_iter_fn
    output = benchmark_model.model_iter_fn(
  File "/home/ysiraichi/pytorch/torch/_dynamo/eval_frame.py", line 390, in _fn
    return fn(*args, **kwargs)
  File "/home/ysiraichi/pytorch/xla/benchmarks/benchmark_model.py", line 170, in eval
    pred = self.module(*inputs)
  File "/home/ysiraichi/pytorch/torch/nn/modules/module.py", line 1527, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ysiraichi/pytorch/torch/nn/modules/module.py", line 1536, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ysiraichi/.local/lib/python3.8/site-packages/detectron2/modeling/meta_arch/dense_detector.py", line 95, in forward
    images = self.preprocess_image(batched_inputs)
  File "/home/ysiraichi/.local/lib/python3.8/site-packages/detectron2/modeling/meta_arch/dense_detector.py", line 96, in torch_dynamo_resume_in_forward_at_95
    features = self.backbone(images.tensor)
  File "/home/ysiraichi/.local/lib/python3.8/site-packages/detectron2/modeling/meta_arch/dense_detector.py", line 98, in torch_dynamo_resume_in_forward_at_96
    predictions = self.head(features)
  File "/home/ysiraichi/pytorch/torch/nn/modules/module.py", line 1527, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ysiraichi/pytorch/torch/nn/modules/module.py", line 1536, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ysiraichi/.local/lib/python3.8/site-packages/detectron2/modeling/meta_arch/fcos.py", line 324, in forward
    logits.append(self.cls_score(self.cls_subnet(feature)))
  File "/home/ysiraichi/pytorch/torch/nn/modules/module.py", line 1527, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ysiraichi/pytorch/torch/nn/modules/module.py", line 1536, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ysiraichi/pytorch/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/home/ysiraichi/pytorch/torch/nn/modules/module.py", line 1527, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ysiraichi/pytorch/torch/nn/modules/module.py", line 1536, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ysiraichi/pytorch/torch/nn/modules/conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/ysiraichi/pytorch/torch/nn/modules/conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (float) and bias type (c10::Half) should be the same
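
For reference, the final error says that `F.conv2d` received a float32 input while the layer's bias was float16. The snippet below is only a minimal sketch of that kind of mismatch and of one way to locate it. It is not taken from the benchmark scripts or from detectron2: `find_dtype_mismatches` is a hypothetical helper, and the single `nn.Conv2d` is a stand-in for a conv layer in the FCOS head. It also does not claim to reproduce the actual root cause of this issue.

```python
import torch
import torch.nn as nn


def find_dtype_mismatches(module: nn.Module, expected: torch.dtype):
    """List (name, dtype) for floating-point params/buffers not in `expected` dtype.

    Hypothetical debugging helper, not part of torch or the benchmark scripts.
    """
    tensors = list(module.named_parameters()) + list(module.named_buffers())
    return [(name, t.dtype) for name, t in tensors
            if t.is_floating_point() and t.dtype != expected]


# Stand-in layer (assumption): one conv whose bias alone is in half precision.
conv = nn.Conv2d(3, 8, kernel_size=3)
conv.bias = nn.Parameter(conv.bias.detach().half())

x = torch.randn(1, 3, 8, 8)  # float32 input

before = find_dtype_mismatches(conv, x.dtype)
print("mismatched before cast:", before)

# The forward pass may raise a dtype-mismatch RuntimeError here; the exact
# message (or whether it raises at all) can depend on the convolution backend.
try:
    conv(x)
except RuntimeError as err:
    print("forward failed:", err)

# Casting the whole module to the input dtype removes the mismatch.
conv.float()
after = find_dtype_mismatches(conv, x.dtype)
out = conv(x)
print("mismatched after cast:", after, "| output dtype:", out.dtype)
```

Running the helper on the real model right before the failing call would show which parameters were left in half precision, assuming the input dtype is the intended one.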

Environment

cc @miladm @JackCaoG @vanbasten23 @zpcore @frgossen @golechwierowicz @cota

zpcore commented 7 months ago

Same cause as https://github.com/pytorch/xla/issues/6831; closing for now.

ysiraichi commented 7 months ago

This is still failing for me.