raspberrypi / firmware

This repository contains pre-compiled binaries of the current Raspberry Pi kernel and modules, userspace libraries, and bootloader/GPU firmware.

raspimjpeg broken by Dec30-2016 commit on older Pis #728

Closed roberttidey closed 7 years ago

roberttidey commented 7 years ago

raspimjpeg, a camera interface using MMAL, has a problem on older Pis (Zero, A, B, early Pi2) when rpi-update is run with any commit from 30 Dec 2016 or later. It is fine on a Pi3 even with the latest version.

In normal use it receives callbacks delivering a sequence of images used for an MJPEG stream. On the affected Pis this works for a while, but the callbacks then stop after a period ranging from a few seconds to a few minutes.

GPU memory is set to 128MB.

6by9 commented 7 years ago

A bit more detail is required. What bitrate? What resolution (although that shouldn't matter)? Can you confirm that you are using MJPEG as the codec?

A check against the 30th Dec release https://github.com/Hexxeh/rpi-firmware/commit/87bf11faa6ccefb7969821bff8cbc44f72be7414 does point to https://github.com/raspberrypi/firmware/issues/698. That should have near zero effect for you if you are using MJPEG, but H264 would be combining the SPS and PPS into a single CODECCONFIG buffer. You can disable the option again via MMAL_PARAMETER_MINIMISE_FRAGMENTATION. If that does fix the issue then we can investigate further. Can you reproduce using raspivid and MJPEG encoding at an appropriate resolution/bitrate? (raspivid -cod MJPEG)
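
For example, something along the lines of raspivid -t 0 -w 1920 -h 1080 -b 17000000 -cod MJPEG -n -o test.mjpeg (adjust the resolution and bitrate to match what raspimjpeg is using) would be a reasonable test.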

roberttidey commented 7 years ago

raspimjpeg creates 2 video ports and 1 image port on the camera. In addition it creates 2 JPEG encoders and 1 H264 encoder.

One video port is used all the time to get frames, which are JPEG-encoded to provide the data for MJPEG previewing (typically 512x288). The second video port is used for video recording (H264) when a real recording is enabled, and vector data is also extracted during callbacks on this port (typically 1920x1080). The image port is used for grabbing still images on demand, which are then JPEG-encoded (typically 2592x1944).

The default H264 bitrate is 17000000 but is user selectable. When the fault was observed it was set to the default.

I could try setting MMAL_PARAMETER_MINIMISE_FRAGMENTATION. I'm not sure what structure it should be set with. Can you give an example?

roberttidey commented 7 years ago

I am guessing it should be something like status = mmal_port_parameter_set_boolean(camera->output[0], MMAL_PARAMETER_MINIMISE_FRAGMENTATION, 0);

but I'm not sure which camera ports I should apply it to, or whether it should be the camera control port.

JamesH65 commented 7 years ago

I just ran raspivid for some time outputting MJPEG at 640x480; the result was an 800MB file with no issues seen. That only has one JPEG encoder running though.

Does the issue occur when you are using the still grab, or does it still fail even when not using the second encoder?

popcornmix commented 7 years ago

@roberttidey Can you give command line parameters for raspimjpeg that show the problem?

6by9 commented 7 years ago

It is a video_encode component parameter, not camera.

status = mmal_port_parameter_set_boolean(video_encoder->output[0], MMAL_PARAMETER_MINIMISE_FRAGMENTATION, MMAL_FALSE);
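
A slightly fuller sketch, assuming video_encoder is your MMAL_COMPONENT_T * for the video_encode component (mmal_port_parameter_set_boolean comes from interface/mmal/util/mmal_util_params.h):

/* Sketch: disable the minimise-fragmentation behaviour on the H264 encoder output port */
MMAL_STATUS_T status = mmal_port_parameter_set_boolean(video_encoder->output[0],
                                                       MMAL_PARAMETER_MINIMISE_FRAGMENTATION,
                                                       MMAL_FALSE);
if (status != MMAL_SUCCESS)
   fprintf(stderr, "Failed to clear MMAL_PARAMETER_MINIMISE_FRAGMENTATION\n");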

6by9 commented 7 years ago

Can you also confirm that I understand you correctly? You have one video_encode component (doing H264), and are using image_encode for the jpeg encoding?

popcornmix commented 7 years ago

I've tried the commands here: https://github.com/silvanmelchior/RPi_Cam_Web_Interface/issues/230#issue-124659038

and raspimjpeg is writing a file to /dev/shm/mjpeg/cam.jpg on a Pi2. Does that work for you, @roberttidey?

roberttidey commented 7 years ago

@popcornmix Yes, that's right for seeing the process running. It continuously updates cam.jpg normally; this comes from a video callback linked to the JPEG encoder. When a recording is done, a second video callback is linked to the H264 encoder to record a video.

It runs OK for me on a Pi3 and is probably OK on later Pi2s with the updated processor. Where we see the problem is on earlier Pi CPUs like Zeros, Bs and, I suspect, early Pi2s. It does actually run for a bit on those but stops after a period.

6by9: Thanks for clarification on parameter setting. I can try that.

popcornmix commented 7 years ago

I'm testing on an old Pi2 (BCM2836 processor). I'll try on a Pi 1.

roberttidey commented 7 years ago

Thanks. I am retesting on a Pi Zero. I notice the kernel was bumped to 4.4.44 yesterday, so I'll try with that first.

popcornmix commented 7 years ago

Been running for a few minutes on Pi1. /dev/shm/mjpeg/cam.jpg is still updating. @roberttidey do you have any non-default settings in config.txt?

roberttidey commented 7 years ago

Just using defaults.

So far my test on 4.4.44 has been going for 4 minutes without a problem, with motion detect turned on, which also activates getting the vector data back.

I first checked this out after a user with 4 cameras (early Pi2) reported a problem after an rpi-update a few days ago. I couldn't replicate it on a Pi3, but my Pi Zero cameras did show the problem and allowed me to see that it was broken by the Dec 30 update. Reverting the update also cleared the user's problem.

I'll let this test run a bit longer, then try reverting to a version that showed the problem.

popcornmix commented 7 years ago

I think it is unlikely that 4.4.44 has fixed anything. It sounds like a firmware issue. There were two main changes in the Dec 30 firmware, related to MMAL and turbo mode.

I'd expect the MMAL change to affect all Pis, whereas the turbo change may have been more likely to affect only certain Pis (e.g. Pi2 and Pi3 have different core frequencies in turbo mode).

Pi1 is still running after 15 minutes.

roberttidey commented 7 years ago

Hmmm. I can't reproduce at the moment with an earlier firmware which was consistently failing before.

Only difference I can think of is that this Pi Zero test is on an outside camera with an effectively black screen at the moment (night) so that might be affecting things.

I will test again tomorrow when daylight returns and update.

popcornmix commented 7 years ago

My Pi1 is still producing new images and has been running for about an hour. Let me know if you discover what makes it fail quickly.

6by9 commented 7 years ago

If it is genuinely the 30th Dec firmware that causes the issue, then it is most likely to be around handling of header bytes somewhere.

I had queried whether a video_encode component was using the MJPEG codec, or whether it is solely an image_encode component doing JPEG and concatenating the output frames. The firmware MJPEG codec != JPEG codec, so that narrows down the relevant place to look.

One thought is that the MINIMISE_FRAGMENTATION option was never used with motion vectors (mainly as that was a Gordon special on the Pi branch). That may be relevant, although reading through the VC code it all looks sane. None of that should be affected by which spec of Pi this is running on as the VC4 side is identical on all of them. I would be more concerned if something is using the video_encode MJPEG codec, as that is known to have issues, and the speed with which the ARM can return buffers may well have a bearing on whether it is happy or not. (However MJPEG shouldn't be troubled by the minimise fragmentation change, so that muddies the water somewhat).

roberttidey commented 7 years ago

The problem has now re-appeared without me changing anything. The callbacks are stopping after about 2 minutes of working; a watchdog kicks in and restarts it. So I think it must need some significant content to encode to trigger the problem, as last night with a black screen I did not see a failure.

The version I had reverted to was 4.4.42

What is needed to trigger the problem is to have the video stream and H264 encoder active as well as the normal preview sequence of images. This happens when either the pre-trigger capture buffer is set up or motion detection is active. When the pre-trigger buffer is active, video is recorded all the time to a RAM circular buffer so that recordings can include a few seconds of data from before the real recording is triggered. The config value in /etc/raspimjpeg (video_buffer 2000) sets the buffer to be about 2 seconds long and triggers the problem for me.

I have built a version of raspimjpeg which sets the fragmentation option to false, so I'll give that a try.

JamesH65 commented 7 years ago

Non-black image = more JPEG data. I wonder if there is a race condition being caused by hitting some processing bandwidth limit on the lower spec Pis. If related to the size of the data, perhaps a buffer is simply failing to be in the right place at the right time. Given all that's going on in this use case, there is a lot of data being thrown around.

roberttidey commented 7 years ago

So far it has been running on 4.4.42 with MINIMISE_FRAGMENTATION set false for about 30 minutes without a problem. Without that setting it was failing about every 2 minutes.

On the CPU load side raspimjpeg only uses about 5% CPU on a Pi Zero (about 1% on a Pi3). Although there is quite a lot going on, most of the heavy lifting is on the GPU I think.

roberttidey commented 7 years ago

I have added a config setting (minimise_frag defaulting to 0) to raspimjpeg. This will allow me to turn this setting back on easily.

I did a rpi-update to latest and it seems to be running fine with MINIMISE_FRAGMENTATION off on a Pi Zero.

6by9 commented 7 years ago

The only way to get an answer on this is to log when buffers come back, with the length and flags fields. From what you've said it's only the video encoder output port that is of interest.
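
A rough sketch of what I mean, assuming the usual MMAL headers plus <stdio.h> and <time.h>; the function and variable names here are illustrative rather than raspimjpeg's actual ones:

static FILE *log_fp;   /* illustrative: a log file opened earlier, e.g. on /dev/shm */

static void h264_encoder_output_callback(MMAL_PORT_T *port, MMAL_BUFFER_HEADER_T *buffer)
{
   struct timespec ts;
   clock_gettime(CLOCK_MONOTONIC, &ts);
   /* Log seconds, nanoseconds, length and flags for every buffer returned on this port */
   fprintf(log_fp, "%ld %ld h264 %u %u\n",
           (long)ts.tv_sec, ts.tv_nsec, (unsigned)buffer->length, (unsigned)buffer->flags);

   /* ...existing raspimjpeg handling here: process the data, release the buffer and
      re-send one from the pool as before... */
}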

roberttidey commented 7 years ago

OK, I'll give that a go.

roberttidey commented 7 years ago

I have added optional logging to a RAM file on each H264 callback. I switched minimise fragmentation back on and caught a crash. Attached: mmallogfile2.zip

The attached file is the log. Each line has the format: seconds, nanoseconds, h264, buffer->length, buffer->flags.

The crash happened around line 2754, as can be seen from the seconds counter, which freezes at that point. The raspimjpeg process auto-restarts itself a few seconds later.

6by9 commented 7 years ago

Thanks for the logs. It appears you have an intra-I-period of 12 frames.

Line 2754 is the end of a correct run of 11 P-frames, so something is failing on the next I. That's probably enough of a hint for now. My brain still hasn't come up with a reason why this would only affect the slower Pi models though.

Something very strange going on at the very end of that log too. Again intra-period of 12, but then at the end we get 17 P-frames after the I (split between 846306967 and 850178926).

A general comment for a performance gain, increasing the buffer size to 128kB would allow all encoded frames to fit inside a single buffer, rather than the I-frames being fragmented over 2 buffers every time. It just saves an extra round trip to the GPU and back. Allocating more than one buffer on the output port would also allow the data to flow more freely, particularly as you're creating at least 2 output buffers (one vectors, one data) for each frame.
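
As a minimal sketch of that (variable names illustrative; it follows what raspivid does and sets these after mmal_port_format_commit, before creating the pool, though see the discussion below about where buffer_size actually needs to be set):

MMAL_PORT_T *encoder_output = video_encoder->output[0];

encoder_output->buffer_size = 128 * 1024;          /* large enough for a whole encoded I-frame */
if (encoder_output->buffer_size < encoder_output->buffer_size_min)
   encoder_output->buffer_size = encoder_output->buffer_size_min;

encoder_output->buffer_num = 3;                     /* more than one buffer in flight */
if (encoder_output->buffer_num < encoder_output->buffer_num_min)
   encoder_output->buffer_num = encoder_output->buffer_num_min;

MMAL_POOL_T *pool = mmal_port_pool_create(encoder_output,
                                          encoder_output->buffer_num,
                                          encoder_output->buffer_size);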

roberttidey commented 7 years ago

Previously the buffer size had been set to 60000. I increased that to 131072.

I ran again with MINIMISE_FRAGMENTATION on (Pi Zero). It ran for much longer this time and I thought at first that maybe this was another 'cure'. But it did crash out after 10 minutes.

Log is attached. Problem is now at line 28873.

The callback events do seem smoother as you suggested, but I don't see any lengths greater than 65536. Is setting it higher than this doing any good? Attached: mmallogfile4.zip

6by9 commented 7 years ago

If the lengths are still set <=65536 then something is wrong in the setup.

I'd need to look, but I suspect that buffer_size hasn't been passed down to the component. I thought setting it just before the pool is created or port enabled was sufficient, but your comment implies it needs to be set and then mmal_port_format_commit called. I may get a chance to confirm tonight. (I may be getting confused with buffer_num which the component really doesn't care about too much, it just keeps a linked list of whatever is passed in).

roberttidey commented 7 years ago

I think my change to set the size wasn't right. I am having another go.

roberttidey commented 7 years ago

I now have it running with the size set correctly and am seeing some returns around 75000. I have set it with fragmentation back on and will post a log if it crashes.

6by9 commented 7 years ago

Thank you. It is quite possible that there is a lurking issue with the minimise fragmentation option if the output frame is bigger than the buffer size.

As history, the original reason for the code was that the codec has an internal circular buffer and it produced a pair of buffers at the wrap point. Android can't handle fragmented H264 frames at all, so that would cause it to fall in a big heap. MINIMISE_FRAGMENTATION glued the two parts from either side of the wrap point into a single output buffer. As it couldn't handle any fragmentation it meant that we ran with 768kB (IIRC) buffers to avoid 1080P frames exceeding the buffer size even under situations where rate control got totally caught out and used an inappropriate QP. It was never used in anger in a situation where we were expecting one encoded frame to be split but wanting to ensure all except the last buffer were full even if they hit the wrap point. I do have recollection of testing that case though, but it may not have been

Having read that second log in more detail, again it has stalled after a correct intra-period of P-frames, so it is again on producing the I. I just wonder if the issue is if the I wraps the end of the buffer, and something is checking flags incorrectly (=, rather than treating as a bitmask).

My other minor observation is that there is a motion vectors frame with each I-frame. That looks wrong as I'm fairly certain the CME (which produces those motion vectors) isn't run on key frames. I have a recollection of looking at that, but it may have been that the results are just always returned as 0 for those frames, not a 0 length buffer. You may be able to confirm whether my memory is defective on that based on your processing of the data, rather than me trawling through code and creating a new test case.

roberttidey commented 7 years ago

So far it has been running for over 90 minutes without a problem, so it looks like setting the buffer_size does avoid the problem. I have made it configurable and set it to 131072. Previously it had been set to output[0]->buffer_size_recommended (65536).

It is strange it doesn't show on the newer Pis; maybe some timing criticality introduced by the fragmentation.

I attach parts of 2 log files (problem free) for reference on the buffer size change. mmallogfile6 was done with 1296x972 and mmal8 with 1920x1080. The latter shows buffers around 95000.

mmallogfiles.zip

6by9 commented 7 years ago

The only thing I can think of is that there is something to do with the extra fraction of a second it takes to return the buffer to the GPU. There's a several-MB internal buffer on the output of the codec that should smooth out any of that though. And generally your logs don't show any significant jitter in the timestamps - there's the expected frame time between pairs, so most of the time it's idle waiting for the codec to finish.

I'll try to pick through the code with a fine tooth comb to see if there is a dodgy path, but it may take me a day or so to find the time.

roberttidey commented 7 years ago

On the motion vectors question: raspimjpeg sets MMAL_PARAMETER_VIDEO_ENCODE_INLINE_VECTORS on the encoder output then checks for flag MMAL_BUFFER_HEADER_FLAG_CODECSIDEINFO in the callback to determine what to process.
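
In rough terms (the port and function names here stand in for the actual ones in the code):

/* Enable inline motion vector output on the H264 encoder output port */
mmal_port_parameter_set_boolean(encoder_output, MMAL_PARAMETER_VIDEO_ENCODE_INLINE_VECTORS, MMAL_TRUE);

/* In the callback: side-info buffers carry the vectors, everything else is H264 data */
if (buffer->flags & MMAL_BUFFER_HEADER_FLAG_CODECSIDEINFO)
   process_vectors(buffer);
else
   process_h264(buffer);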

Maybe I am misreading the log but it looks to me like there is no flag 4 frame after the flag 12 frame. Isn't that what you expect?

At the moment there is no real checking on the raspimjpeg side for the vector data consistency but I haven't seen any anomalies there.

Edit: I think I am misreading the log. It looks to me like the 132s are the vectors, with a consistent buffer size. So the 12 is indeed followed by a vector.

6by9 commented 7 years ago

4 = MMAL_BUFFER_HEADER_FLAG_FRAME_END
8 = MMAL_BUFFER_HEADER_FLAG_KEYFRAME
128 = MMAL_BUFFER_HEADER_FLAG_CODECSIDEINFO

So:

8 = MMAL_BUFFER_HEADER_FLAG_KEYFRAME (ie non-final part of I-frame)
12 = MMAL_BUFFER_HEADER_FLAG_KEYFRAME | MMAL_BUFFER_HEADER_FLAG_FRAME_END (end of an I-frame)
4 = MMAL_BUFFER_HEADER_FLAG_FRAME_END (end of a non-I-frame)
132 = MMAL_BUFFER_HEADER_FLAG_CODECSIDEINFO | MMAL_BUFFER_HEADER_FLAG_FRAME_END (motion vectors)

Most of the I-frames are coming in at >64kB, hence you get the full buffer without FRAME_END, and then a partial buffer with FRAME_END.
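
In code, the safe way to classify these is with bitmask tests rather than comparing the whole flags word for equality; a sketch (the handler names are illustrative):

if (buffer->flags & MMAL_BUFFER_HEADER_FLAG_CODECSIDEINFO)        /* 132: motion vectors */
   handle_motion_vectors(buffer);
else if ((buffer->flags & MMAL_BUFFER_HEADER_FLAG_KEYFRAME) &&
         !(buffer->flags & MMAL_BUFFER_HEADER_FLAG_FRAME_END))    /* 8: I-frame continues in the next buffer */
   append_partial_frame(buffer);
else if (buffer->flags & MMAL_BUFFER_HEADER_FLAG_FRAME_END)       /* 4 or 12: final buffer of a frame */
   finish_frame(buffer);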

roberttidey commented 7 years ago

Thanks. So with the bigger buffer there are no more 8 (partial I-frame) buffers. I also saved a bunch of vector data, which raspimjpeg can do. Every 60th one was full of zeros; I think that is every fifth I-frame. The data was also changing on each call.

6by9 commented 7 years ago

I've done a bit of digging into this - it looks like the logic is flawed and the fragmentation code is holding on to the buffer even though it is full. I get it failing pretty quickly on a CM3 with raspivid -t 0 -w 1920 -h 1080 -ih -g 12 -b 17000000 -n -o file.h264, so I don't quite see why you've only been seeing it on slower Pis. Perhaps I'm just being "lucky".

I need to work through all the combinations of data passing through the encoder to make sure all potential paths do something sensible, but hopefully can sort a fix today.

6by9 commented 7 years ago

OK, I think that I nearly have a fix.

The issue was linked to the wrap point in the FIFO, but only if the frame had more than buffer_size bytes before the wrap point, and then some after as well. The bit before the wrap wasn't flagged as completing the frame (because it didn't), but that was the only trigger for returning the buffer. If you had more than a buffer's worth of data before the wrap then it couldn't clear it and didn't return it, resulting in a stall.

I've got one last bit to fix (currently the second fragment of the buffer gets a duplicate of the timestamp instead of TIME_UNKNOWN), but it should be in the next rpi-update (probably early next week as there was one yesterday). Apologies that this got enabled without the level of testing that it probably should have had.

roberttidey commented 7 years ago

Great stuff. Thanks for digging into this. As soon as I see it available via rpi-update I'll give it a go on a Pi Zero, as I can now control the fragmentation setting without changing code.

6by9 commented 7 years ago

Sorry, bad news. I've just seen some odd results with the timestamps so need to investigate further before the fix can be merged.

6by9 commented 7 years ago

Issue hopefully resolved and now merged internally. It should be in the next rpi-update release.

popcornmix commented 7 years ago

@6by9's fix should be in latest rpi-update firmware if you want to test.

roberttidey commented 7 years ago

Great. Thanks for this. I'll update a Pi Zero and test tomorrow (14th) with minimise fragmentation on and the default buffer size.

roberttidey commented 7 years ago

Updated a Pi Zero with rpi-update to 4.4.48. Set the camera going with minimise fragmentation on and the default H264 buffer size of 65536. Previously this config caused a lock-up within about 10 minutes.

It's been running under this config for 100 minutes. So I think the update fixed the problem.

I'll now default to minimise fragmentation on and up the H264 buffer size to 131072, as this did make the callbacks smoother.

Thanks again for the responses and for investigating and resolving this.