intel / libvpl

Intel® Video Processing Library (Intel® VPL) API, dispatcher, and examples
https://intel.github.io/libvpl/
MIT License

VPL failing to decode samples from a valid sequence, invalid MFX_ERR_MORE_DATA for I frames #122

Closed chacha21 closed 1 month ago

chacha21 commented 4 months ago

VPL 2.10.1 (not a regression, it did not work with previous version either)

I want to use VPL to decode an H264 sequence embedded in an MP4 container (link below to data and sample code). I use Microsoft Media Foundation to query the raw encoded samples from the file, then I submit the samples to a properly initialized mfxSession. For the very first sample (which is a valid I frame), MFX_ERR_MORE_DATA is returned.

By pushing more and more samples, I can eventually get some decoded data, but this is not the expected behaviour.

I want the decoding session to provide the decoded data synchronously when all the required data for an I frame has been submitted.

TestMFTVPL.zip

Is this a VPL design concern ?

chacha21 commented 4 months ago

BTW, where do I download a recent vplswref64.dll? I can't tell where mine comes from, but it was not built by the current libvpl git project. [edit] Found here: https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html#inpage-nav-8-9

It does not solve the bug.

shepark commented 4 months ago

This vplswref64.dll was a component that ran on the CPU, and we don't support it anymore. We now support only the GPU runtime, which comes with the graphics driver. For that, please check the hello-decode sample.

chacha21 commented 4 months ago

This vplswref64.dll was a component that ran on the CPU, and we don't support it anymore. We now support only the GPU runtime, which comes with the graphics driver. For that, please check the hello-decode sample.

I don't have a development machine with Intel GFX, so I am bound to use "vplswref64.dll".

I don't think it is relevant to this thread. I mentioned it because the sample code I provided expects the VPL runtime DLLs to be deployed on the host machine; they are not part of the project. So for anyone who wants to test, this reference was needed at least to be able to run the code properly.

shepark commented 4 months ago

The original intention of the CPU runtime was to provide "reference" functionality, as you mentioned, but we discontinued it. The CPU runtime may have had trouble handling the input you feed it. So I recommend that you try on an Intel device.

chacha21 commented 4 months ago

The original intention of the CPU runtime was to provide "reference" functionality, as you mentioned, but we discontinued it. The CPU runtime may have had trouble handling the input you feed it. So I recommend that you try on an Intel device.

I'll try to find a host machine with compatible Intel HD graphics. But I'm curious to know if you can test with HW acceleration and see the bug yourself.

shepark commented 4 months ago

What is your goal? Do you want to decode frame by frame, or to decode a video stream, with this I-frame test being only an experiment with VPL?

chacha21 commented 4 months ago

What is your goal? Do you want to decode frame by frame, or to decode a video stream, with this I-frame test being only an experiment with VPL?

I am evaluating VPL as an alternative backend engine for encoding and decoding sequences. I already use Microsoft Media Foundation and CUDA NVEnc, and had an IMSDK implementation in the past.

My use case is:
- encoding: either a stream, or a sequence of images
- decoding: either a stream, or a sequence of images to be randomly accessed frame by frame (not a simple incremental playback)

For decoding, I am pretty familiar with NALUs, I can parse raw samples if needed, and I am already able to identify I, P and B frames in order to submit enough data for each frame, so I am really focusing on VPL as the last step of decoding.

Currently I am experimenting with VPL, but since I do not have compatible hardware, I thought that I could rely on the SW reference implementation to get the best tests, even without full performance.

shepark commented 4 months ago

Got it. Thank you for the detailed information. I will try your code quickly and see whether anything is missing.

shepark commented 4 months ago

Do you see "MFX_ERR_MORE_DATA" from this part?

  mfxBitstream bs = {0};
  bs.Data = rawFileContent.data();
  bs.MaxLength = static_cast<mfxU32>(rawFileContent.size());
  bs.DataLength = static_cast<mfxU32>(rawFileContent.size());
  decodeParams.mfx.CodecId = MFX_CODEC_AVC;
  decodeParams.IOPattern = MFX_IOPATTERN_IN_SYSTEM_MEMORY;
  printf("try MFXVideoDECODE_DecodeHeader...");
  status = MFXVideoDECODE_DecodeHeader(session, &bs, &decodeParams);
  printf("=>status = %d\r\n", status);

Then it won't work, because you are feeding an MP4 stream, not a video elementary stream. As you probably know, MP4 is a container, and you need to extract the raw video data from each packet. VPL does not support any container format.
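Editor's sketch: the point about extracting raw video data can be illustrated with the conversion that is commonly needed when H.264 samples come out of an MP4 container. Inside MP4, each NAL unit is usually stored with a big-endian length prefix, while an elementary stream uses 00 00 00 01 start codes (Annex B). The function name `avccToAnnexB` and the `lengthSize` parameter are hypothetical and not part of VPL or of the sample code in this thread. Whether the samples delivered by Media Foundation need this conversion at all is an assumption to verify on your own data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a VPL function): rewrite one MP4 sample made of
// length-prefixed NAL units into start-code-delimited NAL units.
// lengthSize is the prefix width in bytes (1, 2 or 4); in a real file it is
// read from the container's AVC configuration record.
// Returns false if the sample is malformed; 'out' may then hold partial data.
bool avccToAnnexB(const std::vector<std::uint8_t>& sample,
                  std::size_t lengthSize,
                  std::vector<std::uint8_t>& out) {
    static const std::uint8_t kStartCode[4] = {0, 0, 0, 1};
    out.clear();
    if (lengthSize != 1 && lengthSize != 2 && lengthSize != 4) return false;

    std::size_t pos = 0;
    while (pos < sample.size()) {
        // Read the big-endian NAL unit length.
        if (sample.size() - pos < lengthSize) return false;
        std::size_t nalSize = 0;
        for (std::size_t i = 0; i < lengthSize; ++i)
            nalSize = (nalSize << 8) | sample[pos + i];
        pos += lengthSize;

        // Reject empty or truncated NAL units.
        if (nalSize == 0 || nalSize > sample.size() - pos) return false;

        // Emit start code followed by the NAL unit payload.
        out.insert(out.end(), kStartCode, kStartCode + 4);
        out.insert(out.end(), sample.data() + pos, sample.data() + pos + nalSize);
        pos += nalSize;
        assert(pos <= sample.size());
    }
    return true;
}
```

In an MP4 file the SPS and PPS typically live in the configuration record rather than in the samples, so they would also have to be prepended (with start codes) before the first I frame for a header-parsing call to find them.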

chacha21 commented 4 months ago

Do you see "MFX_ERR_MORE_DATA" from this part?

No. I can't send a console log right now (AFK) but the MFX_ERR_MORE_DATA that bothers me is this one :

  if (decStatus == MFX_ERR_MORE_DATA)
    printf("unexpected MFX_ERR_MORE_DATA, this is a mfx wrong behaviour\r\n");

To be comprehensive, please note that

And finally :

shepark commented 4 months ago

Can you share the code you modified? For me it fails at the point I indicated and never reaches that line. It looks like you commented out some parts.

chacha21 commented 4 months ago

Can you share the code you modified?

I did not modify the code attached to the first post of this issue. The code shows different strategies to initialize things, and some errors are normal. I just put assert() for critical failures.

My console output is shown below: [image: Capture]

shepark commented 4 months ago

TestMFTVPL.zip

Please check this code. It's dirty, but I modified the code to load the GPU runtime and to save the I-frame output, and I added some comments. Please refer to "hello-decode" or "sample_decode" for a general implementation.

chacha21 commented 4 months ago

Ok, I see what you did. I will test on Monday, and if it works I will perform even more tests to check extensively and compare with what the doc claims or misleadingly suggests. Then only I will come back for feedback.

chacha21 commented 4 months ago

Ok, I have tested and there are many problems :

shepark commented 4 months ago

That's why I asked about the goal of your final app, not your experiment. I did show you how you can decode I frame only. If you want to decode full frames, please refer to hello-decode or sample_decode.

chacha21 commented 4 months ago

I did show you how you can decode I frame only.

My code can handle non-I frames. Thanks to the Media Foundation part, I can read and accumulate samples starting from the previous I frame up to the targeted P frame, and send all the bytes at once to VPL through the bitstream. Then I expect VPL to output a frame, since it must have enough data (but unfortunately, certainly because of internal buffering, MFX_ERR_MORE_DATA is returned).

The problem is not reading non-I frames; it is reading two different frames (at random positions): "flushing" seems impossible since the "null bs" trick is not usable.
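Editor's sketch: the accumulation strategy described above (start at the previous I frame, stop at the targeted frame) can be expressed as a small index computation. The names `SampleRange` and `samplesNeededFor` are hypothetical and do not come from the attached project. The sketch assumes samples are indexed in decode order and that a per-sample key-frame flag is already available from the demuxer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: inclusive range of sample indices to submit.
struct SampleRange {
    std::size_t first;
    std::size_t last;
    bool valid;
};

// isKeyFrame[i] is true when sample i is an I (key) frame.
// Walk backwards from the target to the closest key frame at or before it.
SampleRange samplesNeededFor(const std::vector<bool>& isKeyFrame,
                             std::size_t target) {
    if (target >= isKeyFrame.size()) return {0, 0, false};
    std::size_t i = target;
    while (true) {
        if (isKeyFrame[i]) {
            assert(i <= target);
            return {i, target, true};
        }
        if (i == 0) break;  // reached the start without finding a key frame
        --i;
    }
    return {0, 0, false};
}
```

For a stream flagged as I P P I P, a request for sample 2 yields the range 0 to 2, and a request for sample 4 yields 3 to 4. With B frames the target may depend on samples that follow it in presentation order, which this sketch does not attempt to handle.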

shepark commented 4 months ago

Have you read the hello-decode sample? "null bs" is not a trick; it's needed when you drain the remaining decoded frames. Once you're done reading the input stream, you should set bs to null and ask VPL to decode everything left in its buffer and return it. In your case, when it returns MFX_ERR_MORE_DATA, please call MFXVideoDECODE_DecodeFrameAsync() with a null bs until it returns MFX_ERR_MORE_DATA again. Let's say you feed "IPPPP" and want to get the third and last P frames. Then you call MFXVideoDECODE_DecodeFrameAsync() with the input stream (IPPPP). If VPL returns MFX_ERR_MORE_DATA, call MFXVideoDECODE_DecodeFrameAsync() with bs=NULL until it returns MFX_ERR_MORE_DATA again. I expect it to give you I, P, P, P, P.
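Editor's sketch: the control flow described above (feed once, then call with a null bitstream until "more data" comes back a second time) is shown here against a toy stand-in. `ToyDecoder`, `Status` and `decodeAndDrain` are hypothetical names that only imitate the buffering behaviour discussed in this thread; they are not the VPL API and say nothing about how the real runtime is implemented. A real loop built on MFXVideoDECODE_DecodeFrameAsync() would also have to manage surfaces and synchronize each output, which is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

enum class Status { Ok, MoreData };

// Toy model of a decoder that buffers everything it is given.
class ToyDecoder {
public:
    // bs != nullptr: swallow all frames in *bs and ask for more data.
    // bs == nullptr: drain mode, emit one buffered frame per call and
    // report MoreData once nothing is left.
    Status decodeFrame(std::vector<char>* bs, char& out) {
        if (bs != nullptr) {
            pending_.insert(pending_.end(), bs->begin(), bs->end());
            bs->clear();
            return Status::MoreData;
        }
        if (pending_.empty()) return Status::MoreData;
        out = pending_.front();
        pending_.pop_front();
        return Status::Ok;
    }

private:
    std::deque<char> pending_;
};

// Feed the whole input once; if the decoder asks for more data, switch to
// a null bitstream and collect frames until MoreData is returned again.
std::vector<char> decodeAndDrain(ToyDecoder& dec, std::vector<char> input) {
    const std::size_t submitted = input.size();
    std::vector<char> frames;
    char frame = 0;
    if (dec.decodeFrame(&input, frame) == Status::MoreData) {
        while (dec.decodeFrame(nullptr, frame) == Status::Ok)
            frames.push_back(frame);
    }
    assert(frames.size() <= submitted);
    return frames;
}
```

Feeding the five entries 'I', 'P', 'P', 'P', 'P' to this toy returns the same five entries in order, mirroring the "I, P, P, P, P" expectation stated above.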

chacha21 commented 4 months ago

Have you read hello-decode sample?

Sure, and I learnt nothing new

"null bs" is not a trick, it's needed when you drain remained decoded frames.

Apparently, you can only use it once at the end of the stream. Once it has been done to drain and get a frame (from I, or IP, or IPP, or IPPP...), subsequent calls to MFXVideoDECODE_DecodeFrameAsync() will return MFX_ERR_ABORTED and you cannot send a new bitstream to decode a new frame (I or IP or IPP...) at a totally different position.

shepark commented 4 months ago

That's right. Once you get MFX_ERR_MORE_DATA with a null bs, it means no data is left, and the next call will return a real error. So, your problem is.. you can't do this continuously but just once .. because decode process will be done once bs=null is given.

chacha21 commented 4 months ago

So, your problem is.. you can't do this continuously but just once .. because decode process will be done once bs=null is given.

Right. With the "bs=null" drain, you fixed the initial problem of this issue thread, that was "can't get a frame at all". But now that I can get a frame, I see that the next step is "I can't get a second frame", in a scenario where I don't read a stream sequentially, but let the user choose a random position in the sequence.

shepark commented 4 months ago

OK.. I don't really have the optimal solution right now, but why don't you try giving VPL enough data that it can return the Nth frame and avoid MFX_ERR_MORE_DATA? Meanwhile, I will check more.

shepark commented 4 months ago

Please check this as well. https://intel.github.io/libvpl/latest/programming_guide/VPL_prg_decoding.html#bitstream-repositioning

chacha21 commented 4 months ago

https://intel.github.io/libvpl/latest/programming_guide/VPL_prg_decoding.html#bitstream-repositioning

Interesting, so MFXVideoDECODE_Reset() should "officially" be the answer for stream repositioning (I suspected it would be inefficient, but that might be wrong). However, I mentioned from the beginning of this thread that in my sample code, MFXVideoDECODE_Reset() always returns an error, even with decodeParams manually filled with proper values. I guess I'll have to investigate a little more (perhaps with a debugger to step into the VPL source code) to get more clues about that.

shepark commented 4 months ago

@chacha21 Could you close this issue and open a new one if you have any issue with MFXVideoDECODE_Reset()?

chacha21 commented 4 months ago

@chacha21 Could you close this issue and open a new one if you have any issue with MFXVideoDECODE_Reset()?

I might or might not open a new issue, depending on the following considerations (which are not clear from the docs):

akwrobel commented 2 months ago

Closed due to no further issues or blocking feedback from submitter.

chacha21 commented 2 months ago

Aaand, there never was a clear answer from the VPL team. See the last message above : those are pending questions.

akwrobel commented 2 months ago

@chacha21 We did not see a specific concern to address related to the original question.
Can you please clarify what you are looking for specifically?

chacha21 commented 2 months ago

I am still looking for a way to perform stream repositioning.

akwrobel commented 2 months ago

Hi @chacha21 We consulted with the VPL GPU Runtime team on this, as the behavior comes from their source code.

They confirmed MFXVideoDECODE_Reset() failing is expected behavior in the following sequence:

And the flush step is not needed if you are jumping to a different location in the bitstream using MFXVideoDECODE_Reset()

Based on your comments in this thread their feedback is that if you are looking to jump to different locations: "this should be done using Reset() and then feeding enough data so that new sequence header is found and decoding may proceed."

This aligns with your scenario 2 above.
If you have more questions regarding the documentation and/or usage scenario of MFXVideoDECODE_Reset(), a more expedient route would be to file an issue directly with the VPL GPU Runtime team so they can respond directly: https://github.com/intel/vpl-gpu-rt/issues
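Editor's sketch: the advice to feed "enough data so that new sequence header is found" after a reset implies locating a sequence header in the bytes about to be submitted. For H.264 in start-code (Annex B) form, that header is the SPS NAL unit, whose nal_unit_type is 7. The function name `findFirstSps` is hypothetical, and the sketch assumes the buffer is already an Annex B byte stream; it does not call into VPL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: return the byte offset of the start code that
// introduces the first SPS NAL unit (nal_unit_type 7), or -1 if none.
long findFirstSps(const std::vector<std::uint8_t>& data) {
    for (std::size_t i = 0; i + 3 < data.size(); ++i) {
        // Look for the 3-byte start code prefix 00 00 01.
        if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1) {
            assert(i + 3 < data.size());
            // The low 5 bits of the NAL header byte hold nal_unit_type.
            if ((data[i + 3] & 0x1F) == 7) {
                // Include the extra leading zero of a 4-byte start code.
                if (i > 0 && data[i - 1] == 0) return static_cast<long>(i - 1);
                return static_cast<long>(i);
            }
        }
    }
    return -1;
}
```

A caller could use the returned offset as the first byte to submit after repositioning. If the function returns -1, the parameter sets would have to come from elsewhere, for example from the container's configuration data.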

akwrobel commented 1 month ago

This does not appear to be a VPL issue, and further questions should be directed to the VPL-GPU-RT team, per the comments above. Closing this issue.