motioneye-project / motioneyeos

A Video Surveillance OS For Single-board Computers

Request: hardware accelerated decoding of H264 streams from network cameras #1228

Open dhruva-pr opened 7 years ago

dhruva-pr commented 7 years ago

Trying to connect a network camera with an H264 input stream. Everything works well, but CPU usage is high; anything beyond two cameras stretches the CPU. Is there a way to use a hardware decoder for the input stream? The MJPEG stream is far easier on the CPU, but not all IP cameras support MJPEG streams.

Thanks

jasaw commented 6 years ago

I don't have an IP camera with an H264 stream, so I can't test it myself, but I think it can be implemented quite easily on a Raspberry Pi.

We already have ffmpeg compiled with h264_mmal. To use it to decode H264, we could try adding something like `rtsp_data->codec_context->hwaccel = ff_find_hwaccel(rtsp_data->codec_context->codec->id, rtsp_data->codec_context->pix_fmt);` to the `netcam_rtsp_open_codec` function in `netcam_rtsp.c`, probably after the `avcodec_parameters_to_context` call.

Another way, as a quick hack, is to move `h264_mmal` up the codecs list in ffmpeg's `allcodecs.c` `register_all` function, i.e. move `REGISTER_DECODER(H264_MMAL, h264_mmal);` above `REGISTER_DECODER(H264, h264);`

Again, I haven't tested any of these methods, so they may not work at all.

jasaw commented 6 years ago

@dhruva-pr I did the quick hack method I suggested earlier, but I don't have an IP camera to test with. Are you able to install my forked experimental version to test?

dhruva-pr commented 6 years ago

Thanks @jasaw, it works and shows a significant decrease in CPU usage as long as the input resolution is the same as the output resolution. However, when using multiple IP cameras with H264 streams there seems to be a stability issue. Unsure if it is GPU availability. Cameras are constantly getting disconnected and only recover after a system reboot. Will run it for a day and post findings.

jasaw commented 6 years ago

It may have run out of GPU memory. What's your configuration? Are you encoding the video stream back into H264 on the same device? Which Raspberry Pi model are you using? You could try increasing the GPU memory to see if it helps. If you can SSH into the device, you could try running the gpu_stats script that I've added to my fork. It basically keeps polling the GPU for its available memory.

jasaw commented 6 years ago

@dhruva-pr Any update on your progress on hardware accelerated H264 decoding? I'm curious to find out if hwaccel H264 decoding works. :-)

dhruva-pr commented 6 years ago

Hardware decoding does work and shows a significant reduction in CPU usage. Yet to test reliability with 4 H264 streams (need to test with higher GPU memory). Should try over the weekend and will post findings.

gururise commented 6 years ago

Any updates on hwaccel H264 decoding? I'm trying to determine whether it's better to use an Odroid-XU4 (presumably w/o hwaccel) or a Raspberry Pi 2 (with hwaccel) for handling 4-5 external cams.

jasaw commented 6 years ago

@gururise H264 hwaccel decoding has not been implemented on MotionEyeOS. You are better off getting an Odroid-XU4.

xu4user commented 6 years ago

How do I implement this? What file or files need to be modified?

> To use it to decode H264, we could try adding something like `rtsp_data->codec_context->hwaccel = ff_find_hwaccel(rtsp_data->codec_context->codec->id, rtsp_data->codec_context->pix_fmt);` to the `netcam_rtsp_open_codec` function in `netcam_rtsp.c`, probably after the `avcodec_parameters_to_context` call.
>
> Another way as a quick hack is to move `h264_mmal` up the codecs list in ffmpeg's `allcodecs.c` `register_all` function, i.e. move `REGISTER_DECODER(H264_MMAL, h264_mmal);` above `REGISTER_DECODER(H264, h264);`

jasaw commented 6 years ago

@xu4user Keep in mind that this is a hack that I have not tested. You are free to test it if you want. Here's what you need to do:

  1. Git clone the motioneyeos repository if you haven't already done so. You'll need to compile your own motioneyeos image for this.
  2. Download this patch file: https://github.com/jasaw/motioneyeos/blob/mods/package/ffmpeg/0002-prefer-h264-mmal-decoder.patch and place it in the motioneyeos/package/ffmpeg directory.
  3. Build motioneyeos image. See wiki pages on how to build.
  4. Flash the image onto your SD card and test.

shahzadgodil2 commented 5 years ago

> I don't have an IP camera with H264 stream, so can't test it myself, but I think it can be implemented quite easily on a Raspberry Pi.
>
> We already have ffmpeg compiled with h264_mmal. To use it to decode H264, we could try adding something like `rtsp_data->codec_context->hwaccel = ff_find_hwaccel(rtsp_data->codec_context->codec->id, rtsp_data->codec_context->pix_fmt);` to the `netcam_rtsp_open_codec` function in `netcam_rtsp.c`, probably after the `avcodec_parameters_to_context` call.
>
> Another way as a quick hack is to move `h264_mmal` up the codecs list in ffmpeg's `allcodecs.c` `register_all` function, i.e. move `REGISTER_DECODER(H264_MMAL, h264_mmal);` above `REGISTER_DECODER(H264, h264);`
>
> Again, I haven't tested any of these methods, so they may not work at all.

Where exactly does this file, allcodecs.c, exist?

jasaw commented 5 years ago

@shahzadgodil2 It's in the ffmpeg package.

shahzadgodil2 commented 5 years ago

Where exactly is the ffmpeg package?

jasaw commented 5 years ago

@shahzadgodil2 Are you building motionEyeOS from source? If you are, then you can try the prefer-h264_mmal hack by downloading this patch file https://github.com/jasaw/motioneyeos/blob/mods/package/ffmpeg/0002-prefer-h264-mmal-decoder.patch and placing it in the motioneyeos/package/ffmpeg directory, then building the motionEyeOS image. If you are running Raspbian, you'll need to uninstall the default ffmpeg package, download the ffmpeg source, apply the prefer-h264-mmal patch, compile, and install.

shahzadgodil2 commented 5 years ago

Will building motionEyeOS work on Windows? I have a Windows computer.

I have downloaded a zip of this whole repository: https://github.com/ccrisan/motioneyeos

And I copied all the files to a USB drive, with "0002-prefer-h264-mmal-decoder.patch" in the folder E:\motioneyeos-master\package\ffmpeg

And then created an image file from the USB using Win32DiskImager

And then created the SD card using that newly created image

> @shahzadgodil2 Are you building motionEyeOS from source? If you are, then you can try the prefer h264_mmal hack by downloading this patch file https://github.com/jasaw/motioneyeos/blob/mods/package/ffmpeg/0002-prefer-h264-mmal-decoder.patch and placing it in motioneyeos/package/ffmpeg directory, then you build the motionEyeOS image.

Please confirm

shahzadgodil2 commented 5 years ago

I am using the motionEyeOS image, not building from source: https://github.com/ccrisan/motioneyeos/releases/download/20181209/motioneyeos-raspberrypi3-20181209.img.xz

shahzadgodil2 commented 5 years ago

I have created an image file; my SD card is 16 GB.

Now I just found that your other repository has hardware decoding implemented: https://github.com/jasaw/motioneyeos/releases

My data is not important and I can reflash it. Will installing this build resolve hardware decoding?

Also what are the steps for enabling hardware decoding?

jasaw commented 5 years ago

My releases have hardware decoding enabled, but be warned that it is highly experimental. I haven't fully tested it. I have also made some modifications to the build that are tailored to my own requirements. Read the release notes if you want to know what you are actually installing. With all that said, if you still want to go ahead, you can try release 20180602. Don't use the RTSP FNC version as that does not have hardware decoding.

To answer your earlier questions:

shahzadgodil2 commented 5 years ago

Hello,

I am getting 15 fps on a Raspberry Pi 3 Model B+ with this new build.

Resolution: 1920x1080, Frame rate: 25 fps

GPU stats:

throttled=0x0, malloc=8M, reloc=206M
throttled=0x0, malloc=8M, reloc=206M

jasaw commented 5 years ago

@shahzadgodil2 How many frames per second did you get previously? Did you notice any difference in CPU load? What's your configuration? Are you encoding the video on the same RPi as well?

shahzadgodil2 commented 5 years ago

I was getting 8 to 12 fps with the old version.

Configuration:

Resolution: 1920x1080, Frame rate: 25 fps, Video quality: 100%. Just one camera: a Xiaomi Dafang camera with the Dafang hack.

I am also getting 25 fps at 1920x1080 in other software.

What exactly is RPi?

Thanks

jasaw commented 5 years ago

@shahzadgodil2 Sorry, I don't quite follow. In which other software are you getting the 25 fps?

RPi is short for Raspberry Pi.

shahzadgodil2 commented 5 years ago

I am getting 25 fps with the Shinobi software on my Raspberry Pi 3 B+.

I am getting 15 fps with motionEye, your new software, on the same Raspberry Pi 3 B+.

jasaw commented 5 years ago

The below is copied directly from the Shinobi guide:

Let's take a look at the video processing pipeline in Shinobi:

H264 encoded stream (CPU) +--> h264_mmal decoding (GPU) --> possibly motion detection (CPU)
                          +--> storage (CPU)

The GPU is only tasked with decoding the H264 video stream from the IP camera, so you get higher throughput. The disadvantage is that you can't add text or a timestamp to the recorded video. This is exactly the same behaviour as motion's passthrough-recording feature (motion is the underlying software in motionEyeOS).


This is what we have in motionEyeOS:

H264 encoded stream (CPU) --> h264_mmal decoding (GPU) --> motion detection (CPU) --> draw text in video stream (CPU) --> h264_omx encoding (GPU) --> storage (CPU)

The GPU has to decode then encode the video stream. That's double the amount of raw video going between GPU and CPU, so we hit the memory bandwidth limit, hence lower throughput. The advantage of this setup is that the video can be edited in any way we like, e.g. adding a timestamp, or drawing a red box around the detected motion area.


Both methods have their pros and cons. Which one to choose is up to your requirements.

jasaw commented 5 years ago

Motion now supports selecting a decoder for netcam: https://github.com/ccrisan/motioneye/issues/1519
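For anyone landing here later, that means the decoder can be chosen per camera in motion's configuration, along these lines (option name per recent motion releases; verify against your version's documentation, and the camera URL is a placeholder):

```
# motion.conf / camera config fragment (sketch, not tested here)
netcam_url rtsp://192.168.1.10:554/stream
netcam_decoder h264_mmal
```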