Closed: ESI-SYD closed this issue 7 months ago
See child issues
The pinned commit of torchvision (47cd5ea8e21d7596a24907710411d6b4a43f628d https://github.com/Stonepia/pytorch/blob/dev/triton-test-3.0/.github/ci_commit_pins/vision.txt) cannot be built successfully with the latest ffmpeg because several deprecated features were removed, including the flags AV_CODEC_CAP_TRUNCATED, AV_CODEC_CAP_AUTO_THREADS, AV_CODEC_CAP_INTRA_ONLY, AV_CODEC_CAP_LOSSLESS, and AVFMT_FLAG_PRIV_OPT.
/home/jovyan/vision/torchvision/csrc/io/decoder/stream.cpp: In member function ‘int ffmpeg::Stream::openCodec(std::vector<ffmpeg::DecoderMetadata>*, int)’:
/home/jovyan/vision/torchvision/csrc/io/decoder/stream.cpp:68:42: error: ‘AV_CODEC_CAP_INTRA_ONLY’ was not declared in this scope; did you mean ‘AV_CODEC_PROP_INTRA_ONLY’?
68 | if (codecCtx_->codec->capabilities & AV_CODEC_CAP_INTRA_ONLY) {
| ^~~~~~~~~~~~~~~~~~~~~~~
| AV_CODEC_PROP_INTRA_ONLY
As a workaround, ffmpeg can be downgraded with:
conda install -c conda-forge 'ffmpeg<4.4'
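To check whether an environment already satisfies the 'ffmpeg<4.4' constraint before building torchvision, something like the sketch below could be used. The helper names are hypothetical, and it assumes the banner printed by `ffmpeg -version` starts with `ffmpeg version X.Y...`; git snapshot builds that report a revision instead of a release number are treated as unparseable.

```python
import re
import subprocess


def parse_ffmpeg_version(banner: str):
    """Return (major, minor) parsed from `ffmpeg -version` output, or None."""
    match = re.search(r"ffmpeg version n?(\d+)\.(\d+)", banner)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def satisfies_pin(version, upper=(4, 4)) -> bool:
    """True when the version is strictly below `upper`, mirroring 'ffmpeg<4.4'."""
    return version is not None and version < upper


def installed_ffmpeg_banner() -> str:
    """Run `ffmpeg -version`; return an empty string if ffmpeg is not on PATH."""
    try:
        result = subprocess.run(
            ["ffmpeg", "-version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return ""
    return result.stdout


if __name__ == "__main__":
    version = parse_ffmpeg_version(installed_ffmpeg_banner())
    print("ffmpeg version:", version, "| satisfies <4.4:", satisfies_pin(version))
```

A result of `None` means ffmpeg was not found or its version string was not recognized, so the check reports that the pin is not satisfied rather than guessing.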
dlrm passes with the setup below:
Stonepia/pytorch dev/triton-test-3.0 0f6d72ce16bd4b30402dcad97144d17cd7bc53ed
weishi-deng/benchmark 9371b9e13c826f3930e54346b4d619cb59182f68
intel/intel-xpu-backend-for-triton b6d3678483dbffa58f0470a46c0b512f223aabda
intel-extension-for-pytorch 2.1.10+git99b4297
torch 2.1.0a0+git0f6d72c
torchaudio 2.0.0a0+a8f4e97
torchtext 0.16.0a0+b0ebddc
torchvision 0.18.0a0+a52607e
To resolve infra_error: ImportError: libGL.so.1: cannot open shared object file: No such file or directory, run:
sudo apt install libgl1-mesa-glx
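A quick way to confirm that the libGL fix took effect, without rerunning the whole benchmark, is to try loading the library directly. This is only a sketch: the function name is hypothetical, and it assumes the library is looked up through the normal dynamic loader search path, as it is in the failing import.

```python
import ctypes


def shared_library_loads(name: str = "libGL.so.1") -> bool:
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # Same failure class as "cannot open shared object file".
        return False
    return True


if __name__ == "__main__":
    print("libGL.so.1 loadable:", shared_library_loads())
```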
To resolve the error: TypeError: can't convert xpu:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first., modify torchbenchmark/models/LearningToPaint/baseline/utils/util.py like:
-USE_CUDA = torch.cuda.is_available()
+USE_CUDA = torch.cuda.is_available() or torch.xpu.is_available()
With the above changes and the setup described in https://github.com/intel/intel-xpu-backend-for-triton/issues/438#issuecomment-1937102129, Background_Matting and LearningToPaint both pass.
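The USE_CUDA change in util.py assumes `torch.xpu` exists. With the torch 2.1 setup listed earlier, that attribute is typically only registered after `intel_extension_for_pytorch` is imported, so a more defensive variant might guard the lookup. The sketch below is an illustration, not the patch that was applied: `accelerator_available` and `make_stub` are hypothetical names, and stub objects stand in for the real `torch` module so the snippet runs without PyTorch installed.

```python
from types import SimpleNamespace


def accelerator_available(torch_module) -> bool:
    """True if CUDA is available, or if an XPU backend exists and is available."""
    cuda_ok = torch_module.cuda.is_available()
    xpu = getattr(torch_module, "xpu", None)  # absent on builds without XPU support
    xpu_ok = xpu is not None and xpu.is_available()
    return bool(cuda_ok or xpu_ok)


def make_stub(cuda: bool, xpu=None):
    """Build a stand-in for the torch module (for illustration only)."""
    stub = SimpleNamespace(cuda=SimpleNamespace(is_available=lambda: cuda))
    if xpu is not None:
        stub.xpu = SimpleNamespace(is_available=lambda: xpu)
    return stub


if __name__ == "__main__":
    print(accelerator_available(make_stub(cuda=False, xpu=True)))
    print(accelerator_available(make_stub(cuda=False)))
```

In util.py this would be used as `USE_CUDA = accelerator_available(torch)` after importing torch; the flag keeps its CUDA name but then covers XPU as well.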
nvidia_deeprecommender, pytorch_CycleGAN_and_pix2pix, torch_multimodal_clip and yolov3 pass with https://github.com/weishi-deng/benchmark/commit/02e383463fa954c49db2e8983e2c6441afc2ca5a.
@whitneywhtsang so far it looks like you have found only benchmark or environment problems (for this benchmark). Correct?
Correct, and there are no regressions found compared to my v2.1 run.
@vlad-penkin There are no regressions, can we close this issue?
Accuracy check results of torchbench models based on triton 3.0.0 (6 test scenarios in total)
Test mode: inference and training
Test datatype: amp_bf16, amp_fp16, float32
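The "6 test scenarios" figure is the product of the two test modes and the three datatypes. A minimal sketch of that enumeration follows; the variable names are illustrative and are not taken from the benchmark scripts.

```python
from itertools import product

TEST_MODES = ("inference", "training")
DTYPES = ("amp_bf16", "amp_fp16", "float32")

# Every (mode, dtype) pair is one accuracy-check scenario: 2 x 3 = 6.
SCENARIOS = list(product(TEST_MODES, DTYPES))

if __name__ == "__main__":
    for mode, dtype in SCENARIOS:
        print(f"{mode:<10} {dtype}")
    print("total scenarios:", len(SCENARIOS))
```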
This issue can be split into multiple work items
Failed model list
Reproduce: (replace with real dtype and model)
Version:
triton: https://github.com/intel/intel-xpu-backend-for-triton/commit/97ac4f91d149a3392d6e14f5d39aa4953fb6c56e