techyian / MMALSharp

C# wrapper to Broadcom's MMAL with an API to the Raspberry Pi camera.
MIT License
195 stars 33 forks source link

Image capture delivers empty result and finally goes to endless loop #136

Closed Spontifixus closed 4 years ago

Spontifixus commented 4 years ago

Hi,

first of all thank you for this great library! I have encountered a problem actually - and it might well be that I have misunderstood something here or used the library in a wrong way!

The scenario: I try to capture an image every 5 seconds and store this to an Azure Blob storage. In a first step I went with the time lapse feature, but ditched that, because I could not find a way to resize the image before sending it to the capture handler. For this approach I created a dedicated AzureStorageCaptureHandler that was based on the MemoryStreamCaptureHandlerwhile implementing the IFileStreamCaptureHandler interface.

So I went and implemented my own mechanism. You can find a very stripped down implementation in the Spontifixus/mmalsharp-demo repository. The setup is derived from the "resizer component" example. In pseudo code it looks this way:

ConfigureCamera
For i = 0 To 100
    memoryStreamHandler = new MemoryStreamCaptureHandler()
    CaptureImage(memoryStreamHandler)
    StoreImage(memoryStreamHandler)
    memoryStreamHandler.Dispose()
Next

Expected behavior: I expect the code to store 100 images, without capturing empty images, and without going into an endless loop.

Actual behavior The application produces empty images every few captures and finally goes into an endless loop because a buffer cannot be released. Here's a log of the test run: debug.log, with the following interesting parts:

Spontifixus commented 4 years ago

I have had a short look at the sources - and a reason for this endless loop could be the call to the Dispose-method at the end of the MMALBufferImpl.Release()-method. I could imagine that during the disposal somehow a release is triggered once again. I don't have the means to debug this on my Raspi unfortunately...

techyian commented 4 years ago

Hi,

Sorry in the delay getting back to you. I initially had a look into this using Mono and wasn't able to reproduce it. I then moved onto .NET Core 3.1 and immediately I could reproduce the issue. It seems quite intermittent though even on .NET Core 3.1. I suspect it's going to be a difficult one to figure out.

I was wondering whether you could please set MMALCameraConfig.Debug = true; at the beginning of your application and see if you can reproduce it - the verbose logs may help track down what's happening here as they include buffer metadata. If you struggle to reproduce with that set, it may be a timing issue and I may need to look into re-adding the thread locks in the native callback methods.

Spontifixus commented 4 years ago

With the debug output enabled reproducing the issue was a lot harder than without. I had to run the sample app several times, but finally it "worked". Please see the debug.log for the failing image capture.

techyian commented 4 years ago

I've just committed a change which I hope will fix this. The change prevents a buffer being sent to an output port if we have reached the end of stream. I've not been able to reproduce the issue again with this change in place, could you have a test and see if it fixes it for you please? You will need to clone the dev branch.

Spontifixus commented 4 years ago

That seems to fix the defect, indeed. Yet occasionally (though by far not as often as before) I still see failed captures (i.e. the resulting stream being empty).

techyian commented 4 years ago

Ok well it seems like we're making progress here so that's good. With regards to the stream being empty, have you observed the logs whilst the verbose logging is enabled to see if the buffer header has any metadata to suggest why the stream may be empty? Are you able to reproduce the stream being empty with your example application you linked to?

Spontifixus commented 4 years ago

Hi, thanks again for your support! Here's a log file with at least one failed capture. You can find the relevant log entries at lines 3535ff: debug.log

There's a difference between successful and failed captures. Here's a successful capture:

   Buffer Header 
  ---------------- 
  Length: 59082 
  Presentation Timestamp: 299252705 
  Flags: 
  MMAL_BUFFER_HEADER_FLAG_FRAME_END 
  ---------------- 

And this is the header of the failed capture:

   Buffer Header 
  ---------------- 
  Length: 0 
  Presentation Timestamp: 0 
  Flags: 
  MMAL_BUFFER_HEADER_FLAG_EOS 
  MMAL_BUFFER_HEADER_FLAG_COMPLETEFRAME 
  ---------------- 

I can see that the length is zero and the flags signal something different - but I lack the knowledge about both the hardware interface and the JPG-encoding to make any sense of it...

techyian commented 4 years ago

That's certainly a bit bizarre. If you're using the ImageEncoderComponent connected to the camera's still port then MMAL_BUFFER_HEADER_FLAG_EOS should not be seen in the buffer's metadata, MMAL_BUFFER_HEADER_FLAG_FRAME_END signifies EOS in this instance. Can you confirm that you can reproduce this behaviour with the example application you've got here? Unless you're doing anything strange with your setup I doubt it's this library causing that behaviour, especially considering the image taken after is successful.

Spontifixus commented 4 years ago

The behavior was reproduced using that exact code - except that I modified the logging a bit to be able to find the failed capture more easily.

Usually this behavior poses not much of a problem, due to the fact that I capture the next image immediately if one capture went wrong.

In the real world this code does not run in a for-loop though but rather in a while(isCaptureActive)-loop, and that stopped working after a few hundred captured images. I don't have detailed information as of now, but I will update this issue should I come to the conclusion that the reason is at MMALSharp or maybe on even lower parts of the system.

techyian commented 4 years ago

Ok that's fine, I'll leave the ticket open and will wait to hear back from you. I also noticed that the image sizes you're producing are very small (around 50kb), what resolution are you setting? I'd be interested to know whether a larger resolution also causes this problem.

Spontifixus commented 4 years ago

I shoot the images with 1024 x 768 px, with the quality set to 90. The reason for the small image size is that all images are captured at night, and without a proper configuration (ISO and shutter speed) the images are completely black. Once I have the configuration up and running I will do another night's shooting and find out whether there is a problem or not here.

Spontifixus commented 4 years ago

Here's update of the performance front: In last nights run, The camera shot 1154 images without a glitch. Every twelve images I reset the pipeline, captured an image with default settings to measure the lightness of the image and the reconfigured the pipeline again.

After that 1154th image, things went south pretty fast. Shortly before I shut down the software accidentally (by pressing CTRL-C to copy console output...) it had captured 4272 images successfully, 503 attempts failed. The failure rate increased throughout that time in a way that for the first minutes the reset every 12 images went well. After a few more minutes failures happened so often that the ìmageCount % 12 == 0` condition was never reached again, and thus the pipeline was never reset.

I restarted the whole application (with calling MMALCamera.Instance.Cleanup() of course) and things started as badly as they ended. So probably the problem does not lie in the MMALSharp library...

I'll try to create a small app using my implementation tonight , so you can run the same code on your system if you want...

techyian commented 4 years ago

Thanks, that would help. I'd be interested to see if there's a memory issue going on somewhere which might be causing this, possibly on the GPU?

For checking GPU memory usage you could use:

sudo vcgencmd get_mem reloc for the relocatable heap usage or sudo vcgencmd get_mem malloc for the malloc heap.

You can of course check RAM usage too by using free.

How much memory have you assigned to the GPU? I'm not sure what Pi version you have, but typically I assign 256 MB to the GPU just to be on the safe side.

Spontifixus commented 4 years ago

Regarding the memory: I ran free -m -t regularly and there seems to be no memory problem with the main memory. Regarding the GPU memory: I have no idea yet (but will find out) how to assign memory to the GPU. I operate a Raspi 3B with a V2.1 NoIR camera module.

I will check the GPU memory usage during the next run tonight. Also I considered implementing hardware accelerated resizing (as described in the wiki) - as that could maybe help, too, though I do not really have a performance problem time-wise with the resizer component.

techyian commented 4 years ago

You can change the amount of memory allocated to the GPU via sudo raspi-config. Once you're in the tool navigate to:

Advanced Options -> Memory Split

In there you can assign a value in MB and then you'll need to do a restart. By default 64MB is assigned to the GPU and sometimes you can require a little bit more when using the camera module.

Spontifixus commented 4 years ago

I have committed a new version to the master-branch of the demo repository. It logs on Info-level, mmal-logging can be enabled using the parameter -mmallog.

The software is right now running on my raspi, the GPU memory status is as follows:

Before program start

$ sudo vcgencmd get_mem malloc 
malloc=7M
$ sudo vcgencmd get_mem reloc
reloc=100M`

After the program started

$ sudo vcgencmd get_mem malloc
malloc=4M
$ sudo vcgencmd get_mem reloc
reloc=62M

While running, the memory consumption remains constant for now...

Spontifixus commented 4 years ago

Next results: So the program ran another night, and the symptoms were the same. Failure rate increased, and even a restart of the application did not solve the problem. The fresh instance of the application instantly had a high failure rate. The memory usage remained the same as the evening before. I will try today and the next night with 256 MB of GPU memory. I will keep you posted...

UPDATE: Whoa. This time the failure rate was high right from the start. Almost every second image failed. Memory readings were reloc=228M and malloc=7M before run, and reloc=192M and malloc=4M during the test run.

This seems to be nothing I can investigate during my workday though...

techyian commented 4 years ago

I've cloned your code and I could immediately reproduce the issues you're describing. For some reason the buffer header is returning EOS which then causes the async trigger to be fired. What I have tried doing is removing the MMALResizerComponent from your pipeline and instead used MMALCameraConfig.StillResolution = new Resolution(1024, 768);, this means the camera itself will output the desired resolution which from a performance point of view will be better anyway because we're not adding another component in to do the resizing. Since doing that, I've not had any failed messages in the console.

Please could you try that when you get some time and report back?

techyian commented 4 years ago

Update: I'm at 875 images. 0 failures so far. Will keep it running for another few hours and see if any issues arise.

Update 2: I'm now at 2246 images. Still 0 failures. I'm quite confident at this stage that the issue is likely the Resizer Component. As to why it's behaving like that I'm not entirely sure. Typically the Resizer Component would be coupled with the Splitter and attached to the camera's video port. It doesn't really make sense to be using it in this way when you can just tell the camera itself to output a specific resolution.

Spontifixus commented 4 years ago

That looks good! I changed my implementation as well, and started it a few minutes ago. I will give an update tomorrow.

Regarding the "root-cause": I used the resizer component as described in this article. I have two ideas here:

I have no experience (yet) with the video port - so if I update the example that might go wrong... Also I am not sure on where to present the hint on resizing images. Maybe at the top of the Resizer paragraph. Do you want me to add it?

techyian commented 4 years ago

Yes I agree that example is fairly useless considering you can just set the resolution against the camera directly. By all means if you want to add some guidance to that section that would be very helpful. I'll also put together an example which features both the Resizer and Splitter components to show how the two can compliment eachother.

Spontifixus commented 4 years ago

The app has now captured several thousand images without a flaw.

Except - I hate to extend this issue over and over again - capturing one image with a shutter speed of 2 seconds takes anything from 8 to 17 seconds right now. But I guess visitors of my site will simply have to cope with this. It is dark after all. And why would anyone have a look at bunnies in the night... ;)

techyian commented 4 years ago

Yes, not much I can do about that I'm afraid as it's the camera hardware which causes such a long wait time when the shutter speed is increased, you'll also see similar wait times with raspistill. I'm glad your code is working though and apologies for the confusion with the resizer component example in the wiki.

Spontifixus commented 4 years ago

Well - adding a hint to the wiki seems to be a complex thing.

I will leave that be for the moment a have look into this next week again. Thanks again for your patience and your help!

Spontifixus commented 4 years ago

Just to give you an update: I have no idea how I can propose a change to the Wiki. Can you help?

techyian commented 4 years ago

Hi Markus,

Sorry I have no idea how to do that, it's not something I've ever looked into. I thought there was usually a button which allowed you to make edits to wiki pages and do a pull request in? I don't mind making the changes required - I really appreciate making me aware of the bugs you found.

techyian commented 4 years ago

Just to update you on this I've made some changes to the Resizer example which should hopefully clarify how this component would typically be used. Please also note my comments around using a ramdisk mount for performance reasons. If there's anything that's still unclear please let me know but I think at this stage we're good to close this ticket off.