VPL failing to decode samples from a valid sequence, invalid MFX_ERR_MORE_DATA for I frames

chacha21 commented 4 months ago

VPL 2.10.1 (not a regression; it did not work with previous versions either)

I want to use VPL to decode an H.264 sequence embedded in an MP4 container (link to data and sample code below). I use Microsoft Media Foundation to read the raw encoded samples from the file, then I submit the samples to a properly initialized mfxSession. For the very first sample (which is a valid I frame), MFX_ERR_MORE_DATA is returned.

By pushing more and more samples, I eventually get some decoded data, but this is not the expected behaviour.

I want the decoding session to provide the decoded data synchronously when all the required data for an I frame has been submitted.

TestMFTVPL.zip

Is this a VPL design concern?

chacha21 commented 4 months ago

BTW, where can I download a recent vplswref64.dll? I can't tell where mine comes from, but it was not built from the current libvpl git project.

[edit] Found here: https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html#inpage-nav-8-9

It does not solve the bug.

shepark commented 4 months ago

This vplswref64.dll was the CPU runtime, and we don't support it anymore. We only support the GPU runtime, which comes with the graphics driver. For that, please check the hello-decode sample.

chacha21 commented 4 months ago

I don't have a development machine with Intel GFX, so I am bound to use "vplswref64.dll".

I don't think it is relevant to this thread. I mentioned it because the sample code I provided expects the VPL runtime DLLs to be deployed on the host machine; they are not part of the project. Anyone who wants to test needs this reference to be able to run the code properly.

shepark commented 4 months ago

The original intention of the CPU runtime was to provide "reference" functionality, as you mentioned, but we discontinued it. The CPU runtime might have had an issue dealing with the input you feed it. So I recommend that you try on an Intel device.

chacha21 commented 4 months ago

I'll try to find a host machine with compatible Intel HD graphics. But I'm curious to know if you can test with HW acceleration and see the bug yourself.

shepark commented 4 months ago

What is your goal? Do you want to decode frame by frame? Or do you want to decode a video stream, and this I-frame test is just an experiment with VPL?

chacha21 commented 4 months ago

I am evaluating VPL as an alternative backend engine for encoding and decoding sequences. I already use Microsoft Media Foundation and CUDA NVEnc, and had an IMSDK implementation in the past.

My use case is:
- encoding: either stream, or sequence of images
- decoding: either stream, or sequence of images to be randomly accessed frame by frame (not a simple incremental playback).

For decoding, I am pretty familiar with NALUs, I can parse raw samples if needed, and I am already able to determine I, P and B frame in order to submit enough data for each frame, so I really focus on vpl as the last step of decoding.
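For readers following along, the kind of NALU inspection described here can be sketched as below. This is a minimal sketch, assuming the samples are already in Annex B form (NAL units separated by 00 00 01 start codes); `NalUnit`, `findNalUnits` and `containsIdr` are hypothetical names that do not come from the attached code. It only reads nal_unit_type (5 = IDR slice, 7 = SPS, 8 = PPS); telling P slices from B slices would additionally require parsing slice_type in the slice header, which is not shown.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One NAL unit found in an Annex B buffer (hypothetical helper type).
struct NalUnit {
    std::size_t offset;  // index of the NAL header byte, just after the start code
    std::uint8_t type;   // nal_unit_type: the low 5 bits of the header byte
};

// Scan for 00 00 01 start codes. A 4-byte 00 00 00 01 start code ends with
// the same three bytes, so it is found by the same test.
std::vector<NalUnit> findNalUnits(const std::vector<std::uint8_t>& data) {
    std::vector<NalUnit> units;
    for (std::size_t i = 0; i + 3 < data.size(); ++i) {
        if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1) {
            NalUnit unit = {i + 3, static_cast<std::uint8_t>(data[i + 3] & 0x1F)};
            units.push_back(unit);
            i += 2;  // skip the rest of the start code
        }
    }
    return units;
}

// True when the buffer carries an IDR slice (nal_unit_type 5).
bool containsIdr(const std::vector<NalUnit>& units) {
    for (std::size_t i = 0; i < units.size(); ++i) {
        if (units[i].type == 5) return true;
    }
    return false;
}
```

Scanning for start codes is safe inside the payload because H.264 emulation prevention guarantees that 00 00 01 never appears within a NAL unit.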

Currently I am experimenting with VPL, but since I do not have compatible hardware, I thought that I could rely on the SW reference implementation to get the best tests, even without full performance.

shepark commented 4 months ago

Got it. Thank you for the detailed information. I will try your code quickly and see whether anything is missing.

shepark commented 4 months ago

Do you see "MFX_ERR_MORE_DATA" from this part?

  mfxBitstream bs = {0};
  bs.Data = rawFileContent.data();
  bs.MaxLength = static_cast<mfxU32>(rawFileContent.size());
  bs.DataLength = static_cast<mfxU32>(rawFileContent.size());
  decodeParams.mfx.CodecId = MFX_CODEC_AVC;
  decodeParams.IOPattern = MFX_IOPATTERN_IN_SYSTEM_MEMORY;
  printf("try MFXVideoDECODE_DecodeHeader...");
  status = MFXVideoDECODE_DecodeHeader(session, &bs, &decodeParams);
  printf("=>status = %d\r\n", status);

Then it won't work, because you are feeding an MP4 stream, not a video elementary stream. As you probably know, MP4 is a container, and you need to extract the raw video data from each packet. VPL does not support any type of container.
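To illustrate what "extracting the raw video data" can involve: inside an MP4 file, H.264 samples are stored as length-prefixed NAL units rather than with Annex B start codes. The sketch below rewrites one such sample into start-code form. It is a minimal sketch under assumptions: `lengthPrefixedToAnnexB` is a hypothetical helper, the 4-byte length size is only a common default (the real value comes from the track's avcC configuration), and whether Media Foundation already delivers Annex B samples in this setup is not established in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert one length-prefixed H.264 sample to Annex B (start-code) form.
// Returns false if the sample is truncated or malformed.
bool lengthPrefixedToAnnexB(const std::vector<std::uint8_t>& sample,
                            std::vector<std::uint8_t>& out,
                            std::size_t lengthSize = 4) {
    assert(lengthSize >= 1 && lengthSize <= 4);
    static const std::uint8_t startCode[] = {0, 0, 0, 1};
    out.clear();
    std::size_t pos = 0;
    while (pos < sample.size()) {
        if (sample.size() - pos < lengthSize) return false;  // truncated length field
        std::size_t nalSize = 0;
        for (std::size_t i = 0; i < lengthSize; ++i) {
            nalSize = (nalSize << 8) | sample[pos + i];  // big-endian length
        }
        pos += lengthSize;
        if (nalSize > sample.size() - pos) return false;  // truncated payload
        out.insert(out.end(), startCode, startCode + 4);
        out.insert(out.end(), sample.begin() + pos, sample.begin() + pos + nalSize);
        pos += nalSize;
    }
    return true;
}
```

Note that in MP4 the SPS and PPS usually live in the track configuration rather than in the samples, so a decoder that expects an elementary stream may also need them prepended before the first frame.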

chacha21 commented 4 months ago

Do you see "MFX_ERR_MORE_DATA" from this part?

No. I can't send a console log right now (AFK) but the MFX_ERR_MORE_DATA that bothers me is this one :

  if (decStatus == MFX_ERR_MORE_DATA)
    printf("unexpected MFX_ERR_MORE_DATA, this is a mfx wrong behaviour\r\n");

To be comprehensive, please note that:

- only the "try by manually filling params known in advance" strategy works to fill the decodeParams
- MFXVideoDECODE_Reset() issues an error that I cannot explain, but it does not seem to be a blocker

And finally:

- at first sight we could say that the mfx decoder is badly configured or cannot handle that codec
- but that is a wrong assumption: if you keep pushing samples, you will get correctly decoded images; you just don't get them when enough data is provided, but a little after, as if a Flush() was pending and not done at the right moment

shepark commented 4 months ago

Can you share the code you modified? It fails where I pointed out and can't get that far. It looks like you commented out some parts.

chacha21 commented 4 months ago

Can you share the code you modified?

I did not modify the code attached to the first post of this issue. The code shows different strategies to initialize things, and some errors are normal. I just put assert() for critical failures.

My console output is shown below: [image: Capture]

shepark commented 4 months ago

TestMFTVPL.zip

Please check this code. It's dirty, but I modified it to load the GPU runtime and to save the I-frame output, and I added some comments. Please refer to "hello-decode" or "sample_decode" for a general implementation.

chacha21 commented 4 months ago

Ok, I see what you did. I will test on Monday, and if it works I will perform even more tests to check extensively and compare with what the doc claims or misleadingly suggests. Then only I will come back for feedback.

chacha21 commented 4 months ago

Ok, I have tested and there are many problems:

- Indeed, I can get 1 frame thanks to the "pseudo sync" trick where you use a null bs.
- For some reason, I get a memory fault when trying to read the UV data. This is very suspicious, so for the moment I only read Y to get grayscale. (Yes, I know that the UV data is h/2 and either w or w/2 according to a flat or 2-channels interpretation.)
- After reading 1 frame, the next calls to DecodeFrameAsync() return MFX_ERR_ABORTED. According to the VPL doc, sending a null bitstream is supposed to be done at the end of the stream, not at the end of a frame. I think that's why.
- Thus I cannot get more frames.
- Calling MFXVideoDECODE_Reset() between frames would be inefficient but could help. Actually it just does not work (it always returns an unexpected and unexplained error).
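The plane layout mentioned in the second point can be made concrete. The following is a minimal sketch, assuming an NV12 frame whose rows are separated by a pitch that may be larger than the width; `Nv12View` and `packNv12` are hypothetical names. In VPL terms the fields would correspond to mfxFrameData::Y, mfxFrameData::UV and mfxFrameData::Pitch, but the sketch does not call VPL and assumes the pointers are already valid for reading. It does not diagnose the memory fault reported above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical read-only view of an NV12 frame.
struct Nv12View {
    const std::uint8_t* y;   // luma plane, `height` rows
    const std::uint8_t* uv;  // interleaved U/V plane, `height / 2` rows
    std::size_t pitch;       // bytes between the starts of consecutive rows
    std::size_t width;       // visible width in pixels
    std::size_t height;      // visible height in pixels
};

// Copy a pitched NV12 frame into a tightly packed buffer of width*height*3/2 bytes.
std::vector<std::uint8_t> packNv12(const Nv12View& f) {
    assert(f.width % 2 == 0 && f.height % 2 == 0);
    assert(f.pitch >= f.width);
    std::vector<std::uint8_t> out;
    out.reserve(f.width * f.height * 3 / 2);
    // Luma: `height` rows of `width` bytes each.
    for (std::size_t row = 0; row < f.height; ++row) {
        const std::uint8_t* src = f.y + row * f.pitch;
        out.insert(out.end(), src, src + f.width);
    }
    // Chroma: `height / 2` rows; each row holds width/2 (U,V) pairs, so `width` bytes.
    for (std::size_t row = 0; row < f.height / 2; ++row) {
        const std::uint8_t* src = f.uv + row * f.pitch;
        out.insert(out.end(), src, src + f.width);
    }
    return out;
}
```

Reading `pitch` bytes per row, or `height` rows from the chroma plane, would run past the end of the plane, which is one common source of faults with this format.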

shepark commented 4 months ago

That's why I asked about the goal of your final app, not your experiment. I did show you how you can decode I frame only. If you want to decode full frames, please refer to hello-decode or sample_decode.

chacha21 commented 4 months ago

I did show you how you can decode I frame only.

My code can handle non-I frames. Thanks to the Media Foundation part, I can read and accumulate samples starting from the previous I frame up to the targeted P frame, and send all the bytes at once to VPL through the bitstream. Then I expect VPL to output a frame, since it must have enough data (but unfortunately, most likely because of internal buffering, MFX_ERR_MORE_DATA is returned).
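The accumulation strategy described here amounts to finding the nearest keyframe at or before the target and submitting everything from there up to the target. A minimal sketch, assuming a per-sample keyframe flag is available from the demuxer and that decode order equals presentation order (no B-frame reordering); `SampleRange` and `samplesNeededFor` are hypothetical names:

```cpp
#include <cstddef>
#include <vector>

// Inclusive range of sample indices to submit to the decoder.
struct SampleRange {
    std::size_t first;
    std::size_t last;
};

// Walk back from `target` to the closest keyframe at or before it.
// Returns false when the target is out of range or no keyframe precedes it.
bool samplesNeededFor(const std::vector<bool>& isKeyframe,
                      std::size_t target,
                      SampleRange& range) {
    if (target >= isKeyframe.size()) return false;
    std::size_t first = target;
    while (!isKeyframe[first]) {
        if (first == 0) return false;  // no keyframe at or before the target
        --first;
    }
    range.first = first;
    range.last = target;
    return true;
}
```

Strictly speaking, only an IDR frame guarantees that nothing earlier is referenced, so a flag that marks plain I frames may not be sufficient for every stream.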

The problem is not to read non-I frames, it is to read two different frames (at random positions) : "flushing" seems impossible since the "null bs" trick is not usable.

shepark commented 4 months ago

Have you read the hello-decode sample? "null bs" is not a trick; it's needed when you drain the remaining decoded frames. Once you're done reading the input stream, you should set bs to null and ask VPL to decode everything left in the buffer and return it.

In your case, when it returns MFX_ERR_MORE_DATA, please call MFXVideoDECODE_DecodeFrameAsync() with a null bs until it returns MFX_ERR_MORE_DATA again. Let's say you feed "IPPPP" and want to get the third and last P frames. You call MFXVideoDECODE_DecodeFrameAsync() with the input stream (IPPPP). If VPL returns MFX_ERR_MORE_DATA, call MFXVideoDECODE_DecodeFrameAsync() with bs=NULL until it returns MFX_ERR_MORE_DATA again. I expect it to give you I, P, P, P, P.
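The control flow described above (feed, then drain with a null bitstream until MFX_ERR_MORE_DATA comes back a second time) can be shown with a toy model. This is not the VPL API: `ToyDecoder`, `Status` and `feedThenDrain` are hypothetical, frames are plain integers, and the toy always buffers its input, which is a simplification of what a real decoder does. It only mirrors the status sequence reported in this thread, including the error returned once the drain has completed.

```cpp
#include <deque>
#include <vector>

enum class Status { Ok, MoreData, Aborted };

// Toy stand-in for a decoder; passing nullptr plays the role of bs == NULL.
class ToyDecoder {
public:
    Status decode(const std::vector<int>* frames, int& out) {
        if (finished_) return Status::Aborted;  // drain already completed
        if (frames != nullptr) {
            pending_.insert(pending_.end(), frames->begin(), frames->end());
            return Status::MoreData;  // toy simplification: input is always buffered
        }
        if (pending_.empty()) {
            finished_ = true;  // nothing left: the drain is over
            return Status::MoreData;
        }
        out = pending_.front();
        pending_.pop_front();
        return Status::Ok;
    }

private:
    std::deque<int> pending_;
    bool finished_ = false;
};

// Feed one chunk, then drain with a null input until MoreData is returned again.
std::vector<int> feedThenDrain(ToyDecoder& dec, const std::vector<int>& input) {
    std::vector<int> decoded;
    int frame = 0;
    if (dec.decode(&input, frame) != Status::MoreData) return decoded;
    while (dec.decode(nullptr, frame) == Status::Ok) {
        decoded.push_back(frame);
    }
    return decoded;
}
```

In the toy, feeding five frames (standing for IPPPP) and draining yields all five in order, and any later call reports an error, which is the limitation discussed in the following comments.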

chacha21 commented 4 months ago

Have you read the hello-decode sample?

Sure, and I learnt nothing new

"null bs" is not a trick; it's needed when you drain the remaining decoded frames.

Apparently, you can only use it once at the end of the stream. Once it has been done to drain and get a frame (from I, or IP, or IPP, or IPPP...), subsequent calls to MFXVideoDECODE_DecodeFrameAsync() will return MFX_ERR_ABORTED and you cannot send a new bitstream to decode a new frame (I or IP or IPP...) at a totally different position.

shepark commented 4 months ago

That's right. Once you get MFX_ERR_MORE_DATA with bs null, it means no more data left and it will return real error in next call. So, your problem is.. you can't do this continuously but just once .. because decode process will be done once bs=null is given.

chacha21 commented 4 months ago

So, your problem is.. you can't do this continuously but just once .. because decode process will be done once bs=null is given.

Right. With the "bs=null" drain, you fixed the initial problem of this issue thread, that was "can't get a frame at all". But now that I can get a frame, I see that the next step is "I can't get a second frame", in a scenario where I don't read a stream sequentially, but let the user choose a random position in the sequence.

shepark commented 4 months ago

OK. I don't really have the optimal solution right now, but why don't you try giving VPL enough data that it can return the Nth frame and avoid MFX_ERR_MORE_DATA? Meanwhile, I will check more.

shepark commented 4 months ago

Please check this as well. https://intel.github.io/libvpl/latest/programming_guide/VPL_prg_decoding.html#bitstream-repositioning

chacha21 commented 4 months ago

https://intel.github.io/libvpl/latest/programming_guide/VPL_prg_decoding.html#bitstream-repositioning

Interesting, so MFXVideoDECODE_Reset() should "officially" be the answer for stream repositioning (I suspected it would be inefficient, but that might be wrong). However, I mentioned from the beginning of this thread that in my sample code, MFXVideoDECODE_Reset() always returns an error, even with decodeParams manually filled with proper values. I guess I'll have to investigate a little more (perhaps with a debugger to step into the VPL source code) to get more clues about that.

shepark commented 4 months ago

@chacha21 Could you close this issue and open a new one if you have any issue with MFXVideoDECODE_Reset()?

chacha21 commented 4 months ago

I may or may not open a new issue, depending on the following considerations (which the docs do not make clear):

- Scenario 1: bs=null (in order to flush) followed by MFXVideoDECODE_Reset() is the correct way to decode frames at random positions. In that case, I do have a problem with MFXVideoDECODE_Reset() and can open a new issue.
- Scenario 2: bs=null (in order to flush) followed by MFXVideoDECODE_Reset() is expected to fail, since the bs is considered aborted and thus MFXVideoDECODE_Reset() can't help. In that case I cannot claim to observe a bug (even if I can't make it work, it wouldn't be a MFXVideoDECODE_Reset() problem).
- Scenario 3: MFXVideoDECODE_Reset() can be used to flush before repositioning (very unlikely: the doc does not really say that). In that case we are still adding information to the current issue.

akwrobel commented 2 months ago

Closed due to no further issues or blocking feedback from submitter.

chacha21 commented 2 months ago

Aaand, there never was a clear answer from the VPL team. See the last message above : those are pending questions.

akwrobel commented 2 months ago

@chacha21 We did not see a specific concern to address related to the original question.
Can you please clarify what you are looking for specifically?

chacha21 commented 2 months ago

I am still looking for a way to perform stream repositioning.

akwrobel commented 2 months ago

Hi @chacha21 We consulted with the VPL GPU Runtime team on this, as the behavior comes from their source code.

They confirmed MFXVideoDECODE_Reset() failing is expected behavior in the following sequence:

1. MFXVideoDECODE_Init()
2. Send bitstream until we get the first frame
3. Flush the stream to remove any pending unprocessed data
4. Call MFXVideoDECODE_Reset(), as we want to continue decoding from a different location in the bitstream.

And the flush step is not needed if you are jumping to a different location in the bitstream using MFXVideoDECODE_Reset()

Based on your comments in this thread their feedback is that if you are looking to jump to different locations: "this should be done using Reset() and then feeding enough data so that new sequence header is found and decoding may proceed."

This aligns with your scenario 2 above.
If you have more questions regarding the documentation and/or usage scenario of MFXVideoDECODE_Reset(), a more expedient route would be to file an issue directly with the VPL GPU Runtime team so they can respond directly: https://github.com/intel/vpl-gpu-rt/issues
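The call order recommended above (Reset without a preceding flush, then feed data starting where a sequence header can be found) can be sketched independently of the real API. This is only an illustration of ordering: `DecoderHooks` and `seekAndDecode` are hypothetical, the two hooks stand in for wrappers an application would write around MFXVideoDECODE_Reset() and MFXVideoDECODE_DecodeFrameAsync(), and whether the target frame is actually output without further input is not settled in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical application-side wrappers around the decoder calls.
struct DecoderHooks {
    std::function<bool()> reset;                    // would wrap MFXVideoDECODE_Reset()
    std::function<bool(std::size_t)> submitSample;  // would feed sample i; true if a frame came out
};

// Reposition: Reset first (no bs == NULL drain beforehand), then feed every
// sample from the keyframe up to the target. Returns whether the last
// submission produced a frame.
bool seekAndDecode(const DecoderHooks& dec,
                   std::size_t keyframeIndex,
                   std::size_t targetIndex) {
    assert(keyframeIndex <= targetIndex);
    if (!dec.reset()) return false;
    bool gotFrame = false;
    for (std::size_t i = keyframeIndex; i <= targetIndex; ++i) {
        gotFrame = dec.submitSample(i);
    }
    return gotFrame;
}
```

Keeping the decoder behind two small hooks makes the ordering easy to check with a fake before wiring in the real calls.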

akwrobel commented 1 month ago

This does not appear to be a VPL issue, and further questions should be directed to the VPL-GPU-RT team, per the comments above. Closing this issue.

intel / libvpl

VPL failing to decode samples from a valid sequence, invalid MFX_ERR_MORE_DATA for I frames #122