Open nyanmisaka opened 1 year ago
Note that the *_qsv decoders do not have this issue; only the d3d11va hwaccel is affected.
Kindly ping someone who might be knowledgeable in d3d11va hwaccel. @galinart @feiwan1 @tong1wu
Thanks in advance!
I will check on this. Thanks.
@nyanmisaka Could you please provide the patch with your change so that I can debug with my own? I have added D3D11_RESOURCE_MISC_SHARED but it does not seem to be enough. Currently the mapping can only handle the nv12 format. Thanks.
@tong1wu Thanks for looking into it! As per the comment in NEO, the P010 format has been supported for years, but FFmpeg hasn't enabled it yet in hwcontext_opencl.c.
Here's the patch: 0001-Enable-P010-format-in-d3d11-opencl-mapping.patch
And here's the other patch that enables QSV/D3D11 to OCL mapping: 0002-Add-support-for-QSV-D3D11-to-OpenCL-mapping.patch
./ffmpeg.exe -init_hw_device d3d11va=dx -init_hw_device qsv=qs@dx -init_hw_device opencl=ocl@dx `
-hwaccel_device qs -filter_hw_device qs `
-hwaccel qsv -hwaccel_output_format qsv `
-c:v av1_qsv -i "av1_clip.mp4" -an -sn `
-vf "hwmap=derive_device=opencl,format=opencl,hwdownload,format=p010" `
-c:v libx264 "ok.mp4"
For comparison this mapping has no issue. So I suspect there's something wrong with the d3d11va hwaccel.
Weird. I tried on TGL. D3d11dec->opencl->d3d11->qsv->download. It worked fine. Rawvideo->d3d11 upload->opencl->download also worked fine. D3d11dec->qsv->opencl->download had corruption.
It doesn't look like something is going wrong with d3d11va because it works for several combinations. It seems the corruption only happens when going d3d11dec->opencl and then downloading through OpenCL.
Most probably something went wrong in the driver when doing the synchronization.
@tong1wu
> It doesn't look like something is going wrong with d3d11va because it works for several combinations.
It seems my original issue is specific to Intel discrete GPUs, like the DG1/Xe Max and DG2/Arc. I used to have a TGL machine (i7-1165G7) and it worked fine at the time. Or at least the issue was not obvious, although the output was not bit-perfect (mismatched checksums).
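A minimal sketch of how such a bit-perfect comparison could be made on downloaded frames: hash only the visible bytes of each row, so that stride padding cannot cause a false mismatch. The helper name `plane_fnv1a` and the choice of FNV-1a are illustrative assumptions, not something FFmpeg or this thread uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Seed for the first plane (FNV-1a 32-bit offset basis). */
#define FNV1A_INIT 2166136261u

/* Hypothetical helper: FNV-1a over the visible bytes of one plane.
 * `linesize` is the stride in bytes and may be larger than `row_bytes`;
 * the padding bytes between rows are skipped on purpose. */
uint32_t plane_fnv1a(const uint8_t *data, int linesize,
                     int row_bytes, int rows, uint32_t hash)
{
    for (int y = 0; y < rows; y++) {
        const uint8_t *row = data + (size_t)y * linesize;
        for (int x = 0; x < row_bytes; x++) {
            hash ^= row[x];
            hash *= 16777619u;   /* FNV-1a 32-bit prime */
        }
    }
    return hash;
}
```

For a P010 frame with even dimensions, the luma plane would be hashed with `row_bytes = width * 2` and `rows = height`, and the result fed as the seed into a second call for the interleaved chroma plane with `row_bytes = width * 2` and `rows = height / 2`. On the command line, FFmpeg's framecrc and framemd5 muxers serve a similar purpose.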
I also think there is a synchronization issue in FFmpeg or the driver. As far as I know, both ID3D11Device and OpenCL are thread-safe, while ID3D11DeviceContext and ID3D11VideoContext are not.
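To illustrate what "not thread-safe" implies for callers, here is a minimal sketch of serializing every access to a shared context through lock callbacks, similar in spirit to the lock/unlock/lock_ctx fields on AVD3D11VADeviceContext. `DemoDevice` and the `demo_*` functions are hypothetical stand-ins, and a pthread mutex is assumed only to keep the sketch portable.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a device that carries its own lock callbacks. */
typedef struct DemoDevice {
    void (*lock)(void *lock_ctx);
    void (*unlock)(void *lock_ctx);
    void *lock_ctx;
    long submitted;   /* stands in for state owned by the immediate context */
} DemoDevice;

static void demo_lock(void *ctx)   { pthread_mutex_lock(ctx); }
static void demo_unlock(void *ctx) { pthread_mutex_unlock(ctx); }

/* Wire the callbacks to a caller-provided mutex. */
void demo_device_init(DemoDevice *dev, pthread_mutex_t *mutex)
{
    pthread_mutex_init(mutex, NULL);
    dev->lock      = demo_lock;
    dev->unlock    = demo_unlock;
    dev->lock_ctx  = mutex;
    dev->submitted = 0;
}

/* Every call that touches the non-thread-safe context takes the lock first. */
void demo_submit(DemoDevice *dev)
{
    dev->lock(dev->lock_ctx);
    dev->submitted++;
    dev->unlock(dev->lock_ctx);
}
```

Note that a lock like this only orders CPU-side calls into the context. It says nothing about when the GPU has finished writing a texture, which is the kind of synchronization suspected here.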
So my finding is that when decoding other HEVC 4k clips using d3d11va hwaccel with the following params you can also get the same tearing and artifacts.
-c:v hevc -threads 1 -thread_type -slice-frame
Currently AV1 hwaccel in ffmpeg does not support threading. The above command disables threading in HEVC hwaccel so it triggers the issue too.
From what I understand, if you specify -threads 1, there will be only 1 thread right? And from the code it seems -thread_type -slice-frame doesn't affect anything if threads is already set to 1. Just curious why we have this issue when the thread count is 1.
And I cannot reproduce this hevc issue on TGL. On DG2 it happens randomly, sometimes it's fine. But for the av1 issue I can reproduce it on TGL. I suspect they are different issues.
> From what I understand, if you specify -threads 1, there will be only 1 thread right?
Correct. It's my bad. -threads 1 implies -thread_type -slice-frame or -thread_type 0.
Either -threads 1 or -thread_type -frame can trigger the issue in the HEVC hwaccel, but -thread_type -slice still works fine.
On my side, both issues occur when frame threading (AV_CODEC_CAP_FRAME_THREADS) is disabled (HEVC) or not supported (AV1).
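The flag arithmetic behind those -thread_type values can be sketched as follows. The constants mirror FFmpeg's FF_THREAD_FRAME (1) and FF_THREAD_SLICE (2) as I understand them, and `demo_apply_thread_type` is a hypothetical, simplified parser that only models `+name` / `-name` tokens (an unprefixed name is treated like `+name`, which differs from FFmpeg's real option parser).

```c
#include <stddef.h>
#include <string.h>

/* Assumed to match FF_THREAD_FRAME / FF_THREAD_SLICE in libavcodec. */
enum { DEMO_THREAD_FRAME = 1, DEMO_THREAD_SLICE = 2 };

/* Hypothetical parser: applies tokens such as "-slice-frame" to `flags`. */
int demo_apply_thread_type(int flags, const char *spec)
{
    while (*spec) {
        int add = 1;
        if (*spec == '+' || *spec == '-')
            add = (*spec++ == '+');

        size_t len = strcspn(spec, "+-");   /* length of the flag name */
        int bit = 0;
        if (len == 5 && !strncmp(spec, "frame", 5))
            bit = DEMO_THREAD_FRAME;
        else if (len == 5 && !strncmp(spec, "slice", 5))
            bit = DEMO_THREAD_SLICE;

        flags = add ? (flags | bit) : (flags & ~bit);
        spec += len;
    }
    return flags;
}
```

Starting from the default of slice plus frame threading (3), `-slice-frame` leaves 0, `-frame` leaves only slice threading (2), and `-slice` leaves only frame threading (1). That matches the observation above: the corruption shows up whenever the frame bit is cleared.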
IMHO, if the corrupted frames show a pattern similar to the screenshot above, they should all be synchronization issues.
> And I cannot reproduce this hevc issue on TGL.
Indeed. It was hard for me to notice, but there is jittering on certain clips from time to time. Increasing the -threads value from 1 to 3 or more mitigates the issue.
> On DG2 it happens randomly, sometimes it's fine.
I never got a normal output using the above command on DG2.
I also found that this issue disappears when you manually limit the speed of the pipeline.
-vf realtime=speed=0.5,...
E.g. for the 60fps AV1 clip it limits the pipeline speed to ~30fps and the issue disappears, but it must be inserted after the decoder's d3d11/qsv output and before the hwmap=derive_device=opencl filter. This is the main reason I suspect a d3d11va hwaccel issue.
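As a rough model of why speed=0.5 halves the pipeline rate, the pacing can be sketched as stretching each frame's media time by 1/speed. This is a simplified illustration, not the realtime filter's actual code, and `demo_release_time_us` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical model: earliest wall-clock offset (in microseconds, relative
 * to the first frame) at which a frame with timestamp `pts` may pass. */
int64_t demo_release_time_us(int64_t pts, int tb_num, int tb_den, double speed)
{
    double seconds = (double)pts * tb_num / tb_den;   /* media time */
    return (int64_t)(seconds * 1e6 / speed);          /* stretched by 1/speed */
}
```

With a 1/60 time base and speed 0.5, consecutive frames are released about 33 ms apart instead of about 16.7 ms, which gives the decoder roughly twice as long to finish each frame before the texture is mapped.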
The OpenCL driver should have guaranteed the synchronization, as you discussed in the other issue.
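I did a small experiment. Just add the following code before clEnqueueAcquireD3D11ObjectsKHR():
/* Experiment: download the frame through D3D11 before OpenCL acquires it. */
AVFrame *tmp = av_frame_alloc();
if (!tmp)
    return AVERROR(ENOMEM);
tmp->format = AV_PIX_FMT_P010LE;
/* The downloaded data is discarded; only the implicit sync matters. */
err = av_hwframe_transfer_data(tmp, src, 0);
av_frame_free(&tmp);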
This downloads the data into a throwaway AVFrame, which forces D3D11 to provide the synchronization guarantee. With this change, the output of your AV1 command is correct.
I think the responsibility lies with OpenCL: it does not do the synchronization properly and ends up working on dirty memory.
That makes sense. It invokes ID3D11DeviceContext_CopySubresourceRegion(), so the D3D11 texture from the decoder gets synchronized internally by the D3D driver before it is passed to clEnqueueAcquireD3D11ObjectsKHR().
This can be a temporary workaround, but it still degrades performance and it's not the desired behavior.
So these should be the conclusions:
1) The *_qsv decoders synchronize correctly but the d3d11va hwaccel does not.
2) The Windows OpenCL driver does not synchronize D3D11 textures correctly.
Could you Intel folks help me forward this issue to the OpenCL team? It seems I've tried every channel for submitting issues to them, with no luck.
Thanks again!
Ok I'll try to forward it to OpenCL. Thanks.
According to the OpenCL team, the GitHub issue will be analyzed by the first available engineer. So I guess we need to wait a little and keep checking the status of https://github.com/intel/compute-runtime/issues/602.
Thanks for your update. I'll keep an eye on it.
I made some changes to your small experiment to speed it up a little (GPU->GPU copy). It confirms that ID3D11DeviceContext_CopySubresourceRegion() can sync the texture implicitly.
Do you happen to know whether D3D11 has a texture sync function similar to vaSyncSurface() in VA-API?
AVHWFramesContext *src_fc =
    (AVHWFramesContext*)src->hw_frames_ctx->data;
AVD3D11VADeviceContext *device_hwctx = src_fc->device_ctx->hwctx;
#if 1
    /* Experiment: GPU->GPU copy into a throwaway texture to force an implicit sync. */
    int srcIdx = (intptr_t)src->data[1];   /* array slice of the decoded frame */
    ID3D11Resource *srcTex = (ID3D11Resource *)(ID3D11Texture2D *)src->data[0];
    ID3D11Texture2D *tmpTex = NULL;
    D3D11_TEXTURE2D_DESC srcTexDesc;
    D3D11_TEXTURE2D_DESC tmpTexDesc = {
        .Width      = src_fc->width,
        .Height     = src_fc->height,
        .MipLevels  = 1,
        .SampleDesc = { .Count = 1 },
        .ArraySize  = 1,
        .Usage      = D3D11_USAGE_DEFAULT, //D3D11_USAGE_STAGING,
        //.CPUAccessFlags = D3D11_CPU_ACCESS_READ | D3D11_CPU_ACCESS_WRITE,
    };

    /* Match the pixel format of the decoder's texture. */
    ID3D11Texture2D_GetDesc((ID3D11Texture2D *)srcTex, &srcTexDesc);
    tmpTexDesc.Format = srcTexDesc.Format;

    device_hwctx->lock(device_hwctx->lock_ctx);

    HRESULT hr = ID3D11Device_CreateTexture2D(device_hwctx->device, &tmpTexDesc, NULL, &tmpTex);
    if (FAILED(hr)) {
        av_log(src_fc, AV_LOG_ERROR, "Could not create the tmp texture (%lx)\n", (long)hr);
        device_hwctx->unlock(device_hwctx->lock_ctx);
        return AVERROR_UNKNOWN;
    }

    /* The copy result is never read; the tmp texture is released right away. */
    ID3D11DeviceContext_CopySubresourceRegion(device_hwctx->device_context,
                                              (ID3D11Resource *)tmpTex, 0, 0, 0, 0,
                                              srcTex, srcIdx, NULL);
    ID3D11Texture2D_Release(tmpTex);

    device_hwctx->unlock(device_hwctx->lock_ctx);
#endif
Hello here! I have a use case that needs to use OpenCL kernels to process HW decoded frames to make up for some functions that VPP cannot do.
d3d11va hwaccel -> d3d11 tex -> hwmap -> opencl image -> *_opencl filter
However, in testing, the mapped image shows tearing, as can be seen in the video. This issue only happens on Intel GPUs; I can't reproduce it on an AMD GPU with the same KHR extension. I thought it was an OpenCL runtime issue, so I also filed an issue with detailed steps in NEO.
FFmpeg was patched with the D3D11_RESOURCE_MISC_SHARED flag to allow interop. For convenience you can also try our custom ffmpeg builds with DX11/QSV->OCL interop added.
And here's a sample video encoded in AV1 that can trigger this issue. av1_clip.zip
https://user-images.githubusercontent.com/14953024/230185014-8f45d9eb-1349-45c9-bb91-bbdad77d0183.mp4