strukturag / libheif

libheif is an HEIF and AVIF file format decoder and encoder.

Use hevc_cuvid instead of software decoding if available #1322

Open Neoclassic opened 1 month ago

Neoclassic commented 1 month ago

Feature request to use hardware decoding if available for hevc formats.

Please let me know if this is already possible. I already have an ffmpeg dist with the NVIDIA HEVC decoder.

That works fine for some HEIC-to-PNG conversions, but for images with tiles it fails to produce the full image area as a PNG.

bradh commented 1 month ago

You can try with the branch if you like: https://github.com/strukturag/libheif/pull/1296

However, it should be possible to build the main branch against your ffmpeg dist and use that. The ffmpeg decoder is not enabled by default, but you can enable it at build time.

The bigger issue is "what is happening when it fails" for your samples. Do you have a specific example that consistently shows the problem? Can you share that sample?

Neoclassic commented 1 month ago

@bradh

Case 1: Fails completely https://github.com/Neoclassic/heic_test_cases/blob/main/exif.heic

testfiles % ffprobe -hide_banner -i $loc                           
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'test_folder_conversion_png_1/test/exif.heic':
  Metadata:
    major_brand     : mif1
    minor_version   : 0
    compatible_brands: mif1
  Duration: N/A, start: 0.000000, bitrate: N/A
  Stream #0:0[0x4e22]: Video: hevc (Main) (hvc1 / 0x31637668), yuv420p(tv), 1280x720, 1 fps, 1 tbr, 1 tbn (default)
      Metadata:
        title           : HEVC Image
 testfiles % 
testfiles % ffmpeg -hide_banner -loglevel error -i $loc -map 0 %d.png
[hevc @ 0x13d805500] Invalid NAL unit size (0 > 176).
[hevc @ 0x13d805500] Error splitting the input into NAL units.
[vist#0:0/hevc @ 0x13c605f70] [dec:hevc @ 0x13d804900] Decoding error: Invalid data found when processing input
[vist#0:0/hevc @ 0x13c605f70] [dec:hevc @ 0x13d804900] Decode error rate 1 exceeds maximum 0.666667
[vist#0:0/hevc @ 0x13c605f70] [dec:hevc @ 0x13d804900] Task finished with error code: -1145393733 (Error number -1145393733 occurred)
[vist#0:0/hevc @ 0x13c605f70] [dec:hevc @ 0x13d804900] Terminating thread with return code -1145393733 (Error number -1145393733 occurred)

Case 2: https://github.com/Neoclassic/heic_test_cases/blob/main/LiveOff.HEIC

Type is : hevc (Main Still Picture)

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'LiveOff.HEIC':
  Metadata:
    major_brand     : heic
    minor_version   : 0
    compatible_brands: mif1heic
  Duration: N/A, start: 0.000000, bitrate: N/A
  Stream group #0:0[0x31]: Tile Grid: hevc (Main Still Picture) (hvc1 / 0x31637668), yuvj420p(pc), 4032x3024 (default)
  Stream #0:48[0x32]: Video: hevc (Main Still Picture) (hvc1 / 0x31637668), yuvj420p(pc), 320x240, 1 fps, 1 tbr, 1 tbn
      Side data:
        ICC Profile

So it also has an embedded thumbnail, Stream #0:48[0x32], of size 320x240, but the requirement is to convert the full image to PNG.

One option is to extract the tiles and then montage them into the full image.
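As a rough illustration of what "montage" means here, the sketch below pastes decoded tiles into an output canvas. It assumes row-major tile order and that tiles overhanging the right and bottom edges are clipped, which is my understanding of how HEIF grid items are composed (for example, a canvas of 4032x3024 covered by a grid whose tiles add up to slightly more). The function name `paste_tile` and the single 8-bit plane layout are hypothetical and are not part of libheif or ffmpeg.

```c
#include <stdint.h>
#include <string.h>

/* Copy one decoded tile into the output canvas, clipping at the canvas edge.
 * Handles a single 8-bit plane; a real image would repeat this per plane.
 * Assumption: tiles are numbered in row-major order across grid_cols columns. */
static void paste_tile(uint8_t *canvas, int canvas_w, int canvas_h,
                       const uint8_t *tile, int tile_w, int tile_h,
                       int tile_index, int grid_cols)
{
    int x0 = (tile_index % grid_cols) * tile_w;  /* left edge of the tile in the canvas */
    int y0 = (tile_index / grid_cols) * tile_h;  /* top edge of the tile in the canvas */

    if (x0 >= canvas_w || y0 >= canvas_h) {
        return;  /* tile lies entirely outside the cropped canvas */
    }

    /* Clip tiles that overhang the right or bottom edge. */
    int copy_w = tile_w;
    int copy_h = tile_h;
    if (x0 + copy_w > canvas_w) copy_w = canvas_w - x0;
    if (y0 + copy_h > canvas_h) copy_h = canvas_h - y0;

    for (int row = 0; row < copy_h; row++) {
        memcpy(canvas + (size_t)(y0 + row) * (size_t)canvas_w + (size_t)x0,
               tile + (size_t)row * (size_t)tile_w,
               (size_t)copy_w);
    }
}
```

Calling `paste_tile` once per decoded tile, with `tile_index` running from 0 to rows x columns - 1, fills the whole canvas. libheif performs this composition itself when it decodes a grid image, so a manual montage should only be needed when working from raw ffmpeg tile output.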

bradh commented 1 month ago

Have you tested with libheif and the ffmpeg decoder plugin?

Neoclassic commented 1 month ago

@bradh Yes, surely going to try. There is some learning curve for me here.

bradh commented 1 month ago

I decoded them OK with libheif (not HW accelerated though). Also, the iPhone image has personal information in it, which you might want to strip out.

Neoclassic commented 1 month ago

Thanks bradh. I cannot use software decoding because of legal challenges related to the format, but I will try your PR.

Neoclassic commented 1 month ago

@bradh

I built libheif using https://github.com/bradh/libheif/tree/nvdev_merge_2

But I am getting this error:

/usr/local/bin/heif-dec /heifcases/LiveOff.HEIC out.png

[istream] request_range 0 - 1024
[istream] request_range 24 - 3946
[istream] request_range 15119 - 17157
File contains 1 image
[istream] request_range 24447 - 46783
[istream] request_range 17157 - 24447
[istream] request_range 46783 - 78025
[istream] request_range 78025 - 103598
[hevc_cuvid @ 0x7220a0024640] Invalid pkt_timebase, passing timestamps as-is.
[hevc_cuvid @ 0x7220a801d0c0] Invalid pkt_timebase, passing timestamps as-is.
[hevc_cuvid @ 0x722094025f80] Invalid pkt_timebase, passing timestamps as-is.
[hevc_cuvid @ 0x72209c028bc0] Invalid pkt_timebase, passing timestamps as-is.
Could not decode image: 0: Decoder plugin generated an error: Unspecified: avcodec_receive_frame returned EAGAIN or ERROR_EOF

Neoclassic commented 1 month ago

Same problem with the main branch as well, built with:

cmake .. -DCMAKE_INSTALL_RPATH=/libheifbin -DWITH_DAV1D=OFF -DWITH_GDK_PIXBUF=OFF -DWITH_RAV1E=OFF -DWITH_SvtEnc=OFF -DWITH_FFMPEG_DECODER=ON

Neoclassic commented 1 month ago

@bradh Just in case it is something known.

Neoclassic commented 3 weeks ago

Some more logs:


[hevc_cuvid @ 0x5555558110c0] Format nv12 chosen by get_format().
[hevc_cuvid @ 0x5555558110c0] Loaded lib: libnvcuvid.so.1
[hevc_cuvid @ 0x5555558110c0] Loaded sym: cuvidGetDecoderCaps
[hevc_cuvid @ 0x5555558110c0] Loaded sym: cuvidCreateDecoder
[hevc_cuvid @ 0x5555558110c0] Loaded sym: cuvidDestroyDecoder
[hevc_cuvid @ 0x5555558110c0] Loaded sym: cuvidDecodePicture
[hevc_cuvid @ 0x5555558110c0] Loaded sym: cuvidGetDecodeStatus
[hevc_cuvid @ 0x5555558110c0] Loaded sym: cuvidReconfigureDecoder
[hevc_cuvid @ 0x5555558110c0] Loaded sym: cuvidMapVideoFrame64
[hevc_cuvid @ 0x5555558110c0] Loaded sym: cuvidUnmapVideoFrame64
[hevc_cuvid @ 0x5555558110c0] Loaded sym: cuvidCtxLockCreate
[hevc_cuvid @ 0x5555558110c0] Loaded sym: cuvidCtxLockDestroy
[hevc_cuvid @ 0x5555558110c0] Loaded sym: cuvidCtxLock
[hevc_cuvid @ 0x5555558110c0] Loaded sym: cuvidCtxUnlock
[hevc_cuvid @ 0x5555558110c0] Loaded sym: cuvidCreateVideoSource
[hevc_cuvid @ 0x5555558110c0] Loaded sym: cuvidCreateVideoSourceW
[hevc_cuvid @ 0x5555558110c0] Loaded sym: cuvidDestroyVideoSource
[hevc_cuvid @ 0x5555558110c0] Loaded sym: cuvidSetVideoSourceState
[hevc_cuvid @ 0x5555558110c0] Loaded sym: cuvidGetVideoSourceState
[hevc_cuvid @ 0x5555558110c0] Loaded sym: cuvidGetSourceVideoFormat
[hevc_cuvid @ 0x5555558110c0] Loaded sym: cuvidGetSourceAudioFormat
[hevc_cuvid @ 0x5555558110c0] Loaded sym: cuvidCreateVideoParser
[hevc_cuvid @ 0x5555558110c0] Loaded sym: cuvidParseVideoData
[hevc_cuvid @ 0x5555558110c0] Loaded sym: cuvidDestroyVideoParser
[AVHWDeviceContext @ 0x555555813980] Loaded lib: libcuda.so.1
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuInit
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuDriverGetVersion
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuDeviceGetCount
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuDeviceGet
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuDeviceGetAttribute
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuDeviceGetName
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuDeviceComputeCapability
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuCtxCreate_v2
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuCtxGetCurrent
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuCtxSetLimit
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuCtxPushCurrent_v2
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuCtxPopCurrent_v2
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuCtxDestroy_v2
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuMemAlloc_v2
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuMemAllocPitch_v2
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuMemAllocManaged
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuMemsetD8Async
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuMemFree_v2
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuMemcpy
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuMemcpyAsync
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuMemcpy2D_v2
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuMemcpy2DAsync_v2
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuMemcpyHtoD_v2
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuMemcpyHtoDAsync_v2
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuMemcpyDtoH_v2
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuMemcpyDtoHAsync_v2
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuMemcpyDtoD_v2
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuMemcpyDtoDAsync_v2
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuGetErrorName
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuGetErrorString
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuCtxGetDevice
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuDevicePrimaryCtxRetain
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuDevicePrimaryCtxRelease
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuDevicePrimaryCtxSetFlags
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuDevicePrimaryCtxGetState
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuDevicePrimaryCtxReset
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuStreamCreate
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuStreamQuery
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuStreamSynchronize
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuStreamDestroy_v2
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuStreamAddCallback
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuStreamWaitEvent
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuEventCreate
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuEventDestroy_v2
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuEventSynchronize
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuEventQuery
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuEventRecord
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuLaunchKernel
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuLinkCreate
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuLinkAddData
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuLinkComplete
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuLinkDestroy
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuModuleLoadData
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuModuleUnload
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuModuleGetFunction
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuModuleGetGlobal
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuTexObjectCreate
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuTexObjectDestroy
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuGLGetDevices_v2
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuGraphicsGLRegisterImage
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuGraphicsUnregisterResource
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuGraphicsMapResources
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuGraphicsUnmapResources
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuGraphicsSubResourceGetMappedArray
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuGraphicsResourceGetMappedPointer_v2
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuDeviceGetUuid
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuDeviceGetUuid_v2
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuDeviceGetLuid
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuDeviceGetByPCIBusId
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuDeviceGetPCIBusId
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuImportExternalMemory
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuDestroyExternalMemory
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuExternalMemoryGetMappedBuffer
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuExternalMemoryGetMappedMipmappedArray
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuMipmappedArrayGetLevel
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuMipmappedArrayDestroy
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuImportExternalSemaphore
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuDestroyExternalSemaphore
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuSignalExternalSemaphoresAsync
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuWaitExternalSemaphoresAsync
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuArrayCreate_v2
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuArray3DCreate_v2
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuArrayDestroy
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuEGLStreamProducerConnect
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuEGLStreamProducerDisconnect
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuEGLStreamConsumerDisconnect
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuEGLStreamProducerPresentFrame
[AVHWDeviceContext @ 0x555555813980] Loaded sym: cuEGLStreamProducerReturnFrame
[AVHWDeviceContext @ 0x555555813980] Calling cu->cuInit(0)
[AVHWDeviceContext @ 0x555555813980] Calling cu->cuDeviceGet(&hwctx->internal->cuda_device, device_idx)
[AVHWDeviceContext @ 0x555555813980] Calling cu->cuCtxCreate(&hwctx->cuda_ctx, desired_flags, hwctx->internal->cuda_device)
[AVHWDeviceContext @ 0x555555813980] Calling cu->cuCtxPopCurrent(&dummy)
[hevc_cuvid @ 0x5555558110c0] Calling ctx->cudl->cuCtxPushCurrent(cuda_ctx)
[hevc_cuvid @ 0x5555558110c0] Calling ctx->cvdl->cuvidGetDecoderCaps(&ctx->caps8)
[hevc_cuvid @ 0x5555558110c0] Calling ctx->cvdl->cuvidGetDecoderCaps(&ctx->caps10)
[hevc_cuvid @ 0x5555558110c0] Calling ctx->cvdl->cuvidGetDecoderCaps(&ctx->caps12)
[hevc_cuvid @ 0x5555558110c0] CUVID capabilities for hevc_cuvid:
[hevc_cuvid @ 0x5555558110c0] 8 bit: supported: 1, min_width: 144, max_width: 8192, min_height: 144, max_height: 8192
[hevc_cuvid @ 0x5555558110c0] 10 bit: supported: 1, min_width: 144, max_width: 8192, min_height: 144, max_height: 8192
[hevc_cuvid @ 0x5555558110c0] 12 bit: supported: 1, min_width: 144, max_width: 8192, min_height: 144, max_height: 8192
[hevc_cuvid @ 0x5555558110c0] Calling ctx->cvdl->cuvidCreateVideoParser(&ctx->cuparser, &ctx->cuparseinfo)
[hevc_cuvid @ 0x5555558110c0] Calling ctx->cudl->cuCtxPopCurrent(&dummy)
[hevc_cuvid @ 0x5555558110c0] Invalid pkt_timebase, passing timestamps as-is.
[hevc_cuvid @ 0x5555558110c0] nal_unit_type: 32(VPS), nuh_layer_id: 0, temporal_id: 0
[hevc_cuvid @ 0x5555558110c0] nal_unit_type: 33(SPS), nuh_layer_id: 0, temporal_id: 0
[hevc_cuvid @ 0x5555558110c0] nal_unit_type: 34(PPS), nuh_layer_id: 0, temporal_id: 0
[hevc_cuvid @ 0x5555558110c0] nal_unit_type: 19(IDR_W_RADL), nuh_layer_id: 0, temporal_id: 0
[hevc_cuvid @ 0x5555558110c0] Decoding VPS
[hevc_cuvid @ 0x5555558110c0] Main profile bitstream
[hevc_cuvid @ 0x5555558110c0] Decoding SPS
[hevc_cuvid @ 0x5555558110c0] Main profile bitstream
[hevc_cuvid @ 0x5555558110c0] Decoding PPS
[hevc_cuvid @ 0x5555558110c0] cuvid_output_frame
[hevc_cuvid @ 0x5555558110c0] cuvid_decode_packet
[hevc_cuvid @ 0x5555558110c0] Calling ctx->cudl->cuCtxPushCurrent(cuda_ctx)
[hevc_cuvid @ 0x5555558110c0] Calling ctx->cvdl->cuvidParseVideoData(ctx->cuparser, &cupkt)
[hevc_cuvid @ 0x5555558110c0] Calling ctx->cudl->cuCtxPopCurrent(&dummy)
[hevc_cuvid @ 0x5555558110c0] Calling ctx->cudl->cuCtxPushCurrent(cuda_ctx)
[hevc_cuvid @ 0x5555558110c0] Calling ctx->cudl->cuCtxPopCurrent(&dummy)
[hevc_cuvid @ 0x5555558110c0] cuvid_output_frame
[AVHWDeviceContext @ 0x555555813980] Calling cu->cuCtxDestroy(hwctx->p.cuda_ctx)
Could not decode image: Decoder plugin generated an error: Unspecified: avcodec_receive_frame returned EAGAIN or ERROR_EOF

Neoclassic commented 3 weeks ago

@bradh It's working with your fork "nvdev_merge_2" for the 6x8-tile HEIC image, but it is far too slow.

The problem looks like CUDA being initialized for every tile. It could become much faster if that can be avoided somehow.
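To make the "initialize once" idea concrete, here is a minimal sketch of a lazily created context shared by all tile decoders. It uses `pthread_once` so that parallel tile threads cannot race on the setup. Everything in it is hypothetical: `gpu_context`, `create_gpu_context`, and `shared_gpu_context` are illustrative stand-ins, not names from libheif, the fork, or the CUDA API, and whether the real decoder can safely share one context across threads would need checking (the hevc_cuvid log above does show the context being pushed and popped around each call, which hints that sharing is workable).

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for per-process GPU state that is expensive to create. */
typedef struct {
    int device_index;
    int initialised;
} gpu_context;

static gpu_context g_ctx;
static pthread_once_t g_ctx_once = PTHREAD_ONCE_INIT;
static int g_init_calls = 0;  /* instrumentation for this sketch only */

/* Placeholder for the expensive setup (driver init, device selection,
 * context creation). A real implementation would call the GPU API here. */
static void create_gpu_context(void)
{
    g_init_calls++;
    g_ctx.device_index = 0;
    g_ctx.initialised = 1;
}

/* Every tile decoder calls this; only the first caller pays for the setup,
 * and concurrent callers block until that setup has finished. */
static gpu_context *shared_gpu_context(void)
{
    pthread_once(&g_ctx_once, create_gpu_context);
    return &g_ctx;
}
```

With this shape, a 48-tile image would run the expensive setup once instead of 48 times. Teardown is left out of the sketch; a real implementation would also need to release the context once, for example when the decoder plugin is unloaded.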

time ./examples/heif-dec ~/ffmpeg_test/testfiles/LiveOff.HEIC out.png
[istream] request_range 0 - 1024
[istream] request_range 24 - 3946
[istream] request_range 15119 - 17157
File contains 1 image
[istream] request_range 17157 - 24447
[istream] request_range 24447 - 46783
GPU in use: Tesla T4
[istream] request_range 78025 - 103598
GPU in use: Tesla T4
[istream] request_range 46783 - 78025
GPU in use: Tesla T4
GPU in use: Tesla T4
[istream] request_range 103598 - 126917
[istream] request_range 126917 - 146390
GPU in use: Tesla T4
GPU in use: Tesla T4
[istream] request_range 146390 - 162298
[istream] request_range 162298 - 174934
GPU in use: Tesla T4
GPU in use: Tesla T4
[istream] request_range 174934 - 178765
GPU in use: Tesla T4
[istream] request_range 178765 - 209539
[istream] request_range 209539 - 256166
GPU in use: Tesla T4
GPU in use: Tesla T4
[istream] request_range 256166 - 297666
GPU in use: Tesla T4
[istream] request_range 297666 - 333953
GPU in use: Tesla T4
[istream] request_range 333953 - 350529
[istream] request_range 362678 - 375496
GPU in use: Tesla T4
[istream] request_range 350529 - 362678
GPU in use: Tesla T4
GPU in use: Tesla T4
[istream] request_range 375496 - 380664
GPU in use: Tesla T4
[istream] request_range 380664 - 407373
GPU in use: Tesla T4
[istream] request_range 407373 - 435430
[istream] request_range 435430 - 475399
GPU in use: Tesla T4
[istream] request_range 475399 - 500341
GPU in use: Tesla T4
GPU in use: Tesla T4
[istream] request_range 500341 - 521156
GPU in use: Tesla T4
[istream] request_range 521156 - 534960
GPU in use: Tesla T4
[istream] request_range 534960 - 548606
[istream] request_range 548606 - 554956
GPU in use: Tesla T4
GPU in use: Tesla T4
[istream] request_range 554956 - 579136
GPU in use: Tesla T4
[istream] request_range 579136 - 612623
GPU in use: Tesla T4
[istream] request_range 612623 - 639585
GPU in use: Tesla T4
[istream] request_range 639585 - 672260
GPU in use: Tesla T4
[istream] request_range 672260 - 692049
GPU in use: Tesla T4
[istream] request_range 692049 - 707221
GPU in use: Tesla T4
[istream] request_range 707221 - 718533
GPU in use: Tesla T4
[istream] request_range 718533 - 736781
[istream] request_range 736781 - 764323
GPU in use: Tesla T4
GPU in use: Tesla T4
[istream] request_range 764323 - 787729
GPU in use: Tesla T4
[istream] request_range 787729 - 809625
GPU in use: Tesla T4
[istream] request_range 809625 - 832233
GPU in use: Tesla T4
[istream] request_range 832233 - 852712
GPU in use: Tesla T4
[istream] request_range 852712 - 872658
GPU in use: Tesla T4
[istream] request_range 872658 - 886348
GPU in use: Tesla T4
[istream] request_range 886348 - 904055
GPU in use: Tesla T4
[istream] request_range 904055 - 922677
GPU in use: Tesla T4
[istream] request_range 922677 - 930679
GPU in use: Tesla T4
[istream] request_range 930679 - 937882
[istream] request_range 937882 - 946358
GPU in use: Tesla T4
GPU in use: Tesla T4
[istream] request_range 946358 - 971564
GPU in use: Tesla T4
[istream] request_range 971564 - 990224
GPU in use: Tesla T4
[istream] request_range 990224 - 1003954
GPU in use: Tesla T4
Written to out.png

real 0m17.878s
user 0m5.555s
sys 0m25.650s

Neoclassic commented 3 weeks ago

`static int nvdec_does_support_format(enum heif_compression_format format)`: do we really need to run this check every time? Can the result be cached? I hardcoded the method to return 120, and the time is now down to 8 seconds.
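One way the caching could look, as a sketch only: remember the priority per format after the first query and return the remembered value afterwards. The enum `sketch_format`, the functions `probe_priority` and `cached_priority`, and the cache layout are all hypothetical; the real `nvdec_does_support_format` and `heif_compression_format` are not reproduced here. The value 120 is taken from the hardcoding experiment described above. Atomics are used because tiles appear to be decoded on several threads; two threads may both run the probe the first time, which is wasteful but harmless since they store the same value.

```c
#include <stdatomic.h>

/* Hypothetical format list standing in for the library's own enum. */
enum sketch_format {
    SKETCH_FORMAT_HEVC = 0,
    SKETCH_FORMAT_AV1 = 1,
    SKETCH_FORMAT_COUNT
};

#define PRIORITY_UNKNOWN (-1)

static atomic_int g_probe_calls = 0;  /* instrumentation for this sketch only */
static atomic_int g_cached[SKETCH_FORMAT_COUNT] = {
    PRIORITY_UNKNOWN, PRIORITY_UNKNOWN
};

/* Placeholder for the slow capability query against the GPU driver. */
static int probe_priority(enum sketch_format format)
{
    atomic_fetch_add(&g_probe_calls, 1);
    return format == SKETCH_FORMAT_HEVC ? 120 : 0;
}

/* Return the plugin priority for a format, probing at most once per format
 * in the common case. Unknown formats report 0 (not supported). */
static int cached_priority(enum sketch_format format)
{
    if ((int)format < 0 || (int)format >= SKETCH_FORMAT_COUNT) {
        return 0;
    }
    int priority = atomic_load(&g_cached[format]);
    if (priority == PRIORITY_UNKNOWN) {
        priority = probe_priority(format);
        atomic_store(&g_cached[format], priority);
    }
    return priority;
}
```

The trade-off is that a cached answer goes stale if the GPU becomes unavailable while the process is running, so the decode path would still need to handle a failure at decode time.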