intel / torch-xpu-ops


Different behavior in adaptive average pooling from CPU and CUDA when output_size == 1 #523

Open · fengyuan14 opened this issue 5 days ago

fengyuan14 commented 5 days ago

🚀 The feature, motivation and pitch

https://github.com/pytorch/pytorch/blob/76259ebfdd83389eeb5735e76f66fd2ad84a9671/aten/src/ATen/native/AdaptiveAveragePooling.cpp#L120 When output_size == 1, CPU and CUDA dispatch to a plain mean reduction, whereas we use the adaptive_avg_pool kernel. The background is that we originally preferred the oneDNN implementation. This issue is filed to evaluate whether that difference should be kept.
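
For illustration, a minimal sketch of the equivalence the CPU/CUDA fast path relies on (tensor shapes chosen arbitrarily):

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 3, 8, 8)

# CPU/CUDA special-case output_size == 1 as a mean over the spatial dims.
pooled = F.adaptive_avg_pool2d(x, output_size=1)
reduced = x.mean(dim=(-2, -1), keepdim=True)

print(pooled.shape)                     # torch.Size([2, 3, 1, 1])
print(torch.allclose(pooled, reduced))  # True
```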

The current implementation registers a wrapper variant of aten::adaptive_avg_pool2d that carries this logic for XPU. https://github.com/intel/torch-xpu-ops/pull/445/files#diff-c9c95053eb36049540db8fe17969d81b66fb3ec6ef753deaf0219eb7b13c9998R39-R67
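
A hypothetical Python-level sketch of the dispatch that wrapper performs; the real registration is done in C++ against aten::adaptive_avg_pool2d, so the function name and structure below are illustrative only:

```python
import torch
import torch.nn.functional as F

def adaptive_avg_pool2d_wrapper(x: torch.Tensor, output_size) -> torch.Tensor:
    # Normalize output_size to an (H, W) pair.
    if isinstance(output_size, int):
        output_size = (output_size, output_size)
    size = tuple(output_size)
    if size == (1, 1):
        # Match CPU/CUDA: reduce to a mean over the spatial dimensions.
        return x.mean(dim=(-2, -1), keepdim=True)
    # Fall through to the adaptive pooling kernel (oneDNN-backed on XPU).
    return F.adaptive_avg_pool2d(x, size)
```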

Alternatives

No response

Additional context

No response

fengyuan14 commented 5 days ago

Without this logic (mean for output_size == 1), a model in TorchBench crashes because adaptive_avg_pool2d lacks a deterministic implementation.
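
A minimal reproducer of that failure mode, assuming a CUDA device (the XPU behavior described here is assumed analogous): the mean path differentiates fine under torch.use_deterministic_algorithms(True), while the generic adaptive pooling backward raises.

```python
import torch
import torch.nn.functional as F

torch.use_deterministic_algorithms(True)

x = torch.randn(2, 3, 8, 8, device="cuda", requires_grad=True)

# output_size == 1 hits the mean-reduction special case on CUDA, and
# mean has a deterministic backward, so this succeeds.
F.adaptive_avg_pool2d(x, output_size=1).sum().backward()

# Any other output size hits the generic adaptive pooling kernel, whose
# CUDA backward has no deterministic implementation, so this raises.
try:
    F.adaptive_avg_pool2d(x, output_size=2).sum().backward()
except RuntimeError as err:
    print(err)
```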