fegin opened this issue 2 months ago
We could do this as a stop-gap fix, since sizevars should always be integers:

```python
def codegen_python_sizevar(self, x: Expr) -> str:
    x_s = V.graph.sizevars.simplify(x)
    if not x_s.free_symbols:
        x_s = sympy.Integer(x_s)
    return pexpr(x_s)
```
But a proper fix is to find out where the floats are generated.
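For illustration, here is a minimal standalone sketch (using only sympy, outside of PyTorch) of why the cast in the stop-gap helps: a constant expression that simplifies to a sympy `Float` prints with a decimal point, whereas casting it to `sympy.Integer` makes it print as a valid integer size. The `x_s` value below is a hypothetical stand-in for the result of `V.graph.sizevars.simplify(x)`.

```python
import sympy

# Hypothetical stand-in for a sizevar expression that, after
# simplification, came out as a float instead of an integer.
x_s = sympy.Float(64.0)
assert x_s.is_Float  # would be printed with a decimal point -> invalid size

# The stop-gap: a constant expression (no free symbols) can be cast safely.
if not x_s.free_symbols:
    x_s = sympy.Integer(x_s)

assert x_s.is_Integer
print(x_s)  # prints "64", a valid List[int] element
```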
### 🐛 Describe the bug
When running the Dynamo hf_Whisper benchmark with CompiledDDP (Compiled Autograd + DDP Python reducer), Inductor generates code that cannot be run. The error is

```
inductor::_reinterpret_tensor() Expected a value of type 'List[int]' for argument 'size' but instead found type 'tuple'
```

-- the generated code contains floats, which `inductor::_reinterpret_tensor()` does not accept. The gradients of this model have dynamic shapes, which may be related to the error. We were unable to reproduce this error with a unit test or a smaller model. To reproduce the error, check out https://github.com/pytorch/pytorch/pull/121315 and run the following command:
```shell
python benchmarks/dynamo/torchbench.py --performance --cold-start-latency --training --backend inductor --disable-cudagraphs --device cuda --ddp --multiprocess --optimize-ddp-mode="python_reducer" --only hf_Whisper --compiled-autograd
```
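As a simplified illustration (not the actual op implementation), the schema failure above can be mimicked in plain Python: a size tuple containing a float fails a `List[int]`-style check even when the value is integral, which is why a float sizevar leaking into the generated size argument breaks the call. The `check_size_arg` helper below is hypothetical.

```python
# Hypothetical, simplified stand-in for the TorchScript-style 'List[int]'
# argument check; the real check lives in the op's schema validation.
def check_size_arg(size):
    if not all(type(s) is int for s in size):
        raise TypeError(
            "Expected a value of type 'List[int]' for argument 'size' "
            f"but found {[type(s).__name__ for s in size]}"
        )
    return list(size)

print(check_size_arg((4, 8)))      # [4, 8] -> accepted
try:
    check_size_arg((4, 8.0))       # a float sizevar leaked into the sizes
except TypeError as exc:
    print("rejected:", exc)
```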
### Error logs

### Minified repro

No response

### Versions
https://github.com/pytorch/pytorch/pull/121315
cc @ezyang @msaroufim @bdhirsh @anijain2305 @chauhang