jomjol / AI-on-the-edge-device

Easy to use device for connecting "old" measuring units (water, power, gas, ...) to the digital world
https://jomjol.github.io/AI-on-the-edge-device-docs/
5.91k stars 646 forks source link

Feature: reduce sd card usage #73

Closed jomjol closed 3 years ago

jomjol commented 3 years ago

Reduction of sd card usage

One major topic is the high amount of sd card read / write access, mainly driven by the image processing with intermediate storage on sd-card. The branch https://github.com/jomjol/AI-on-the-edge-device/tree/rolling-reduce-sd-use tackels this problem by moving the whole image processing to the (PS)RAM. This implies major changes in the code. The basic version is functioning, but there are major memory leaks, which result in a high frequency of restarts. If anybody is interested in supporting me there - you are welcome!

This issue is for discussing and bringing this idea forward. Discussions in German and English are welcome!

hike6688 commented 3 years ago

Hi Jomol I did very extended logging to see what happens before I get a crash on my gasmeter-counter. I use the old rolling version from Sep 25 with a lot of logging extensions. You see, I have very much read/write access to the sd-card. I limit the logfile to 150 KB to reduce the time and load to read with the fileserver. Yesterday I got about 14 * 150 KB log file. I have added checking return values to avoid using null pointers (rgb_image) in ClassFlowAlignment::doFlow alignAndCut. For the days in December I got crashes only in ClassFlowAlignment::doFlow alignAndCut when loading /sdcard/img_tmp/rot.jpg with CImageBasis::CImageBasis . This happens after a lot of rounds (> 109) A free internal heap size lower than 26500 bytes seems to be critical, critical maybe also the value of the largest block.

2020-12-22_06-13-26: task_autodoFlow --------- next round nr: 110 2020-12-22_06-13-26: task_autodoFlow before flow last heap size: 26600 2020-12-22_06-13-26: task_autodoFlow before flow curr heap size: 26600 2020-12-22_06-13-26: task_autodoFlow before flow heap diff: 0 2020-12-22_06-13-26: task_autodoFlow before flow SPI Heap: 3207328 Internal Min Heap free: 11652 larg. Block: 14068 Heap: 26600 2020-12-22_06-13-26: FlowControll.doFlow - ClassFlowMakeImage 2020-12-22_06-13-26: FlowControll before doFlow :ClassFlowMakeImage SPI Heap: 3207328 Internal Min Heap free: 11652 larg. Block: 14068 Heap: 26328 2020-12-22_06-13-26: CCamera::CaptureToFile start 2020-12-22_06-13-31: CCamera::CaptureToFile try to save camera to: /sdcard/img_tmp/raw.jpg Filetype: JPG params: w 640, h 480, size 41600 waitflash 5000 2020-12-22_06-13-31: CCamera::CaptureToFile save camera to: /sdcard/img_tmp/raw.jpg done 2020-12-22_06-13-31: ClassFlowMakeImage::takePictureWithFlashraw picture saved 2020-12-22_06-13-31: ClassFlowMakeImage::doFlow takePictureWithFlash done 2020-12-22_06-13-31: FlowControll.doFlow - ClassFlowMakeImage done 2020-12-22_06-13-31: FlowControll doFlow SPI Heap: 3207328 Internal Min Heap free: 11652 larg. Block: 14068 Heap: 26380 2020-12-22_06-13-31: FlowControll after doFlow :ClassFlowMakeImage SPI Heap: 3207328 Internal Min Heap free: 11652 larg. Block: 14068 Heap: 26292 2020-12-22_06-13-31: FlowControll.doFlow - ClassFlowAlignment 2020-12-22_06-13-31: FlowControll before doFlow :ClassFlowAlignment SPI Heap: 3207328 Internal Min Heap free: 11652 larg. Block: 14068 Heap: 26328 2020-12-22_06-13-31: ClassFlowAlignment::doFlow begin 2020-12-22_06-13-31: ClassFlowAlignment::doFlow process filedate: 2020-12-22 06:13:30 /sdcard/img_tmp/raw.jpg 2020-12-22_06-13-33: ClassFlowAlignment::doFlow no rotate, copy /sdcard/img_tmp/raw.jpg -> /sdcard/img_tmp/rot.jpg done 2020-12-22_06-13-33: ClassFlowAlignment::doFlow alignAndCut begin 2020-12-22_06-13-33: Image Load failed:/sdcard/img_tmp/rot.jpg FreeHeapSize before: 3221856 after: 3221768 2020-12-22_06-13-33: Image Load failed:/sdcard/img_tmp/rot.jpg SPIFreeHeapSize before: 3207328 after: 3207328 diff: 0

or

2020-12-23_08-55-24: task_autodoFlow --------- next round nr: 135 2020-12-23_08-55-24: task_autodoFlow before flow last heap size: 26420 2020-12-23_08-55-24: task_autodoFlow before flow curr heap size: 26420 2020-12-23_08-55-24: task_autodoFlow before flow heap diff: 0 2020-12-23_08-55-24: task_autodoFlow before flow SPI Min free Heap: 731192 SPI Heap: 3207328 Internal Min Heap free: 14856 larg. Block: 13384 Heap: 26420 2020-12-23_08-55-24: ClassFlowControll::doFlow execute ClassFlowMakeImage 2020-12-23_08-55-24: ClassFlowControll::doFlow before doFlow: ClassFlowMakeImage SPI Min free Heap: 731192 SPI Heap: 3207328 Internal Min Heap free: 14856 larg. Block: 13600 Heap: 26008 2020-12-23_08-55-24: CCamera::CaptureToFile start 2020-12-23_08-55-30: CCamera::CaptureToFile try to save camera to: /sdcard/img_tmp/raw.jpg Filetype: JPG params: w 640, h 480, size 42400 waitflash 5000 2020-12-23_08-55-30: CCamera::CaptureToFile save camera to: /sdcard/img_tmp/raw.jpg done 2020-12-23_08-55-30: ClassFlowMakeImage::takePictureWithFlashraw picture saved 2020-12-23_08-55-30: ClassFlowMakeImage::doFlow takePictureWithFlash done 2020-12-23_08-55-30: ClassFlowControll::doFlow execute ClassFlowMakeImage done 2020-12-23_08-55-30: FlowControll doFlow SPI Min free Heap: 731192 SPI Heap: 3207328 Internal Min Heap free: 14856 larg. Block: 13600 Heap: 26144 2020-12-23_08-55-30: ClassFlowControll::doFlow after doFlow :ClassFlowMakeImage SPI Min free Heap: 731192 SPI Heap: 3207328 Internal Min Heap free: 14856 larg. Block: 13600 Heap: 25952 2020-12-23_08-55-30: ClassFlowControll::doFlow execute ClassFlowAlignment 2020-12-23_08-55-30: ClassFlowControll::doFlow before doFlow: ClassFlowAlignment SPI Min free Heap: 731192 SPI Heap: 3207328 Internal Min Heap free: 14856 larg. Block: 13600 Heap: 26008 2020-12-23_08-55-30: ClassFlowAlignment::doFlow begin 2020-12-23_08-55-30: ClassFlowAlignment::doFlow process filedate: 2020-12-23 08:55:30 /sdcard/img_tmp/raw.jpg 2020-12-23_08-55-31: ClassFlowAlignment::doFlow no initial rotate, copy /sdcard/img_tmp/raw.jpg -> /sdcard/img_tmp/rot.jpg done 2020-12-23_08-55-31: ClassFlowAlignment::doFlow alignAndCut begin with: /sdcard/img_tmp/rot.jpg 2020-12-23_08-55-31: CImageBasis::CImageBasis try to create from /sdcard/img_tmp/rot.jpg 2020-12-23_08-55-32: CImageBasis::CImageBasis Image Load failed:/sdcard/img_tmp/rot.jpg FreeHeapSize before: 3221552 after: 3221344 2020-12-23_08-55-32: CImageBasis::CImageBasis Image Load failed:/sdcard/img_tmp/rot.jpg SPIFreeHeapSize before: 3207328 after: 3207328 diff: 0 2020-12-23_08-55-32: 2020-12-23_08-55-32: ====== Main started =======

I now log the amount of spi-heap creating a new CImageBasis::CImageBasis is using

hike6688 commented 3 years ago

I have tried the rolling-reduce-sd-use branch. Internal heap seams to go down rapidly I will watch spi and internal heap for more detail after each round.

jomjol commented 3 years ago

@hike6688, yes I see the same problem. In the new version I changed a lot of the image handling. For the major part of the images I reserve a fixed memory right at the beginning and reuse it (mainly in PSRAM). Therefore I also see a rapid decrease at the beginning, but then stay rather constant. The problem is, that I have not enough memory to reserve all. The need RAM during one round is majorly:

  1. main picture with 640x480x3=922k
  2. one working image for intermediate calculation of rotation, translation, ... (image alignment)
  3. Images of the ROIs
  4. working memory for tflite - 200k
  5. loading of the tflite --> up to 1500k for large models

Especially the last one is very annoying, as tflite requires to first load the file completely into the memory to process it. Therefore I need to release minimum the memory for the working image. All other images I can keep.

First experience with the new version is: reboot after abouot 6-8 hours. Somewhere seems to be a memory leak, but I don't know where yet. Nevertheless with this you can reduce the access to the sd-card to only

Lets search for the leak and than it will be overall much more stable with respect to SD card problems.

Thanks for your support in advance. 2 Images and the loading of the tflite does n

hike6688 commented 3 years ago

I made a new branch for testing with your reference picture instead of taking a new picture on my test device. Therfore I hacked ClassFlowMakeImage::takePictureWithFlash and put an additional parameter Testdevice = true in the MakeImage section. I have not tested now if I can reduce the initial fd-buffer for the camera. I think it is set for UXGA in camera_config line 75 in ClassControllCamera.cpp Will check it tommorrow.

jomjol commented 3 years ago

I have recorded the "Min Free Heap" over night:

grafik

Background:

I would assume, that every flow looks the same. Up to now I have no idea, what is triggering the increase in memory demand?

For comparision, I have a test device running:

grafik

Main difference:

The reboots at the beginning is coming from playing around. The lower free memory in the stable state is due to the fact, that there most of the image memory is allocated static at the beginning.

hike6688 commented 3 years ago

Yesterday evening I created a new branch rolling-reduce... -on-testdevice. Up to now it has 162 rounds without reboot. There was no other web-server action.

2020-12-24_12-19-10: ClassFlowControll::doFlow - finished : SPI Heap: 2143704 larg. Blk free: 1222104 Min SPI free: 1203644 Internal Heap: 25172 larg. Blk free: 21144 Min Heap free: 9648

After using the fileserver:

2020-12-24_12-38-34: ClassFlowControll::doFlow - finished : SPI Heap: 2143704 larg. Blk free: 1222104 Min SPI free: 1203644 Internal Heap: 25172 larg. Blk free: 21144 Min Heap free: 7796

This is ok, since only Min Heap free: 7796 has changed.

In my old version I consequently logged all stages of the flow. So the last problem I found was the CImageBasis::CImageBasis Image Load failed , but I did not find out why stbi load failed.

I have now the main page running, calling handler_wasserzaehler regulary every 5 minutes. I assume this is done in the java script.

2020-12-24_12-59-26: handler_wasserzaehler 2020-12-24_12-59-26: CImageBasis::CImageBasis copyfrom start 2020-12-24_12-59-26: CImageBasis::CImageBasis copyfrom w= 640 h= 480 bpp= 3 channels= 3 2020-12-24_12-59-26: CImageBasis::CImageBasis copyfrom needs heap 921600 available: 1222104 2020-12-24_12-59-27: CImageBasis::CImageBasis copyfrom finished 2020-12-24_12-59-28: CImageBasis::~CImageBasis start 2020-12-24_12-59-28: CImageBasis::~CImageBasis before free spi free 1222104 internal free 17772 2020-12-24_12-59-28: CImageBasis::~CImageBasis after free spi free 2143704 diff 921600 internal free 17772 diff 0 2020-12-24_12-59-28: CImageBasis::~CImageBasis done

some minutes later I have the internal heap going down

2020-12-24_13-17-25: ClassFlowControll::doFlow - finished : SPI Heap: 2143704 larg. Blk free: 1222104 Min SPI free: 1203644 Internal Heap: 21848 larg. Blk free: 19428 Min Heap free: 4240

2020-12-24_13-22-14: ClassFlowControll::doFlow - finished : SPI Heap: 2143704 larg. Blk free: 1222104 Min SPI free: 1203644 Internal Heap: 21800 larg. Blk free: 19236 Min Heap free: 4240

2020-12-24_13-27-05: ClassFlowControll::doFlow - finished : SPI Heap: 2143704 larg. Blk free: 1222104 Min SPI free: 1203644 Internal Heap: 21344 larg. Blk free: 18032 Min Heap free: 4240

I assume that I will get a crash if handler_wasserzaehler is called at a time when a flow is running. Waiting for that.

How did you get the data for the nice tables?

jomjol commented 3 years ago

The tables are handmade with Excel and some automated text cutting to get the value of the Min Heap out of the Logfile.

hike6688 commented 3 years ago

Yesterday evening I forgot to push my changes to rolling-reduce-sd-on-testdevice, done now I now look on changes on SPI-Heap and internal heap. Now the Log files increases , ( now about 800 Kb for today ) , not shure some of the low internal heap comes from that.

hike6688 commented 3 years ago

@jomjol please have look on the extened logging protocoll from 24 and 25 Dec.

start is:

2020-12-24_21-42-37: ============================== Main Started ======================================= 2020-12-24_21-42-41: Time zone set to CET-1CEST,M3.5.0,M10.5.0/3 2020-12-24_21-42-41: Server_tflite::task_autodoFlow Spi at start 3207308 after init 2143712 diff 1063596 2020-12-24_21-42-41: task_autodoFlow - next round 1 2020-12-24_21-42-41: ClassFlowControll::doFlow - Start: SPI Heap: 2143712 larg. Blk free: 2143712 Min SPI free: 2143712 Internal Heap: 27328 larg. Blk free: 25584 Min Heap free: 23564

min free heap is fast decreasing with main web page calling handler_wasserzaehler within flow.

The first call to handler_wasserzaehler with min heap 1260 was at 2020-12-24_22-12-26 within round 7.

2020-12-24_22-12-32: ClassFlowControll::doFlow - finished : SPI Heap: 2143704 larg. Blk free: 1222104 Min SPI free: 295452 Internal Heap: 21140 larg. Blk free: 17464 Min Heap free: 1260 2020-12-24_22-12-32: CPU Temperature: 52.2 2020-12-24_22-12-32: Server_tflite::task_autodoFlow Spi at end 2143704 after init 2143704 diff 0 diff to task start 1063604 2020-12-24_22-12-33: task_autodoFlow - round done 7

Now all calls to handler_wasserzaehler happens outside a flow

Then Internal Min Heap free: 1260 is stable until now

the last call to handler_wasserzaehler was at 2020-12-24_23-47-26

2020-12-24_23-47-26: handler_wasserzaehler start 2020-12-24_23-47-26: handler_wasserzaehler at start SPI Heap: 2143704 larg. Blk free: 1222104 Min SPI free: 295452 Internal Heap: 5724 larg. Blk free: 1780 Min Heap free: 1260 2020-12-24_23-47-27: handler_wasserzaehler at end SPI Heap: 2143704 larg. Blk free: 1222104 Min SPI free: 295452 Internal Heap: 4160 larg. Blk free: 1780 Min Heap free: 1260 2020-12-24_23-47-27: handler_wasserzaehler done 2020-12-24_23-47-27: handler_wasserzaehler start 2020-12-24_23-47-27: handler_wasserzaehler at start SPI Heap: 2143704 larg. Blk free: 1222104 Min SPI free: 295452 Internal Heap: 9476 larg. Blk free: 7004 Min Heap free: 1260 2020-12-24_23-47-27: handler_wasserzaehler at end SPI Heap: 2143704 larg. Blk free: 1222104 Min SPI free: 295452 Internal Heap: 7908 larg. Blk free: 5596 Min Heap free: 1260 2020-12-24_23-47-27: handler_wasserzaehler done 2020-12-24_23-48-48: task_autodoFlow - next round 27

left Internal Min Heap free: 1260 up to now

2020-12-25_11-08-38: ClassFlowControll::doFlow - finished : SPI Heap: 2143704 larg. Blk free: 1222104 Min SPI free: 295452 Internal Heap: 14456 larg. Blk free: 10228 Min Heap free: 1240 2020-12-25_11-08-38: CPU Temperature: 53.3 2020-12-25_11-08-38: Server_tflite::task_autodoFlow Spi at end 2143704 after init 2143704 diff 0 diff to task start 1063604 2020-12-25_11-08-39: task_autodoFlow - round done 167

log_2020-12-25.txt log_2020-12-24.txt

I am now in round 175, calls to /fileserver only reduced the internal min heap to 1240

2020-12-25_11-48-49: download_get_handlerat start SPI Heap: 2143704 larg. Blk free: 1222104 Min SPI free: 295452 Internal Heap: 14128 larg. Blk free: 10228 Min Heap free: 1240

regards hike

jomjol commented 3 years ago

I have just uploaded an update, that has an improved and extended logging:

The function looks like this: LogFile.WriteHeapInfo("CONTEXT"); This will create a line in the Logfile, with data separated by tabulator, making it easier to analyse later on. It is running now and I'm waiting for some reboots to happen.

hike6688 commented 3 years ago

Yesterday I changed in camera_config .frame_size = FRAMESIZE_VGA,

2020-12-25_23-43-47: Server_tflite::task_autodoFlow - next round 1 2020-12-25_23-43-47: ClassFlowControll::doFlow - Start: SPI Heap: 2950112 larg. Blk free: 2950112 Min SPI free: 2950112 Internal Heap: 57692 larg. Blk free: 40048 Min Heap free: 54352

Camera buffer is now 150K instead of 900K on Spi-Flash. Internal Heap starts with 57692 B instead of 27176 b Testsystem run up to now with 240 rounds

Logfile opened: /sdcard/log/message/log_2020-12-26.txt content: Server_tflite::task_autodoFlow Spi at end 2950104 at start round 2950104 diff 0 diff to task start 1063604 Logfile opened: /sdcard/log/message/log_2020-12-26.txt content: Server_tflite::task_autodoFlow internal at end 56696 at start round 56728 diff 32 diff to task start 4636

Internal Heap is slowly decreasing with 32 B each round.

My version now occupies about 98,7 % of flash. Are there any parts of unused code I can switch off from compiling?

jomjol commented 3 years ago

Good idea, I will limit the resolution to VGA - you do not need more. Regarding space, there are some functions in the camera server that are not really needed. I will have a look for this. I'm at about 97.1% - including the extended logging. What else did you change?

hike6688 commented 3 years ago
  1. Lot of changes to watch internal and spi heap at start and end of a function , added a name support to CImageBasis to see what is created and delted in the log. see rolling-reduce-sd-use-on-testdevice-objname.
  2. Watch the flow status in sysinfo .

First results: CImageBasis does not leave holes on spi , but it seems that setting a name to a CImagebasis object in a method that is declared virtual does not work.

Calling img_tmp_virtual_handler with uri /img_tmp/alg_roi.jpg seems to eat 448 from internal heap each call. This value is reproducable with repaeted calls outside a running flow.

2020-12-26_14-46-28: server_main::img_tmp_virtual_handler start 2020-12-26_14-46-28: server_main::img_tmp_virtual_handler uri: /img_tmp/alg_roi.jpg 2020-12-26_14-46-29: server_main::img_tmp_virtual_handler SPI Heap: 2950104 larg. Blk free: 2028504 Min SPI free: 2010044 Internal Heap: 53904 larg. Blk free: 40048 Min Heap free: 38648 2020-12-26_14-46-29: ClassFlowControll::GetJPGStream start 2020-12-26_14-46-29: ClassFlowControll::GetJPGStream at start SPI Heap: 2950104 larg. Blk free: 2028504 Min SPI free: 2010044 Internal Heap: 53792 larg. Blk free: 40048 Min Heap free: 38648 2020-12-26_14-46-29: CImageBasis::CImageBasis copyfrom start 2020-12-26_14-46-29: CImageBasis::CImageBasis copyfrom w= 640 h= 480 bpp= 3 channels= 3 2020-12-26_14-46-29: CImageBasis::CImageBasis copyfrom needs heap 921600 available: 2028504 2020-12-26_14-46-29: CImageBasis::CImageBasis copyfrom finished 2020-12-26_14-46-29: CImageBasis::setObjectName ClassFlowControll::GetJPGStream _imgzw 2020-12-26_14-46-31: ClassFlowControll::GetJPGStream before delete _send SPI Heap: 2028504 larg. Blk free: 2028504 Min SPI free: 2010044 Internal Heap: 50060 larg. Blk free: 36960 Min Heap free: 38648 2020-12-26_14-46-31: CImageBasis::~CImageBasis start delete ClassFlowControll::GetJPGStream _imgzw 2020-12-26_14-46-31: CImageBasis::~CImageBasis before free spi free 2028504 internal free 53248 2020-12-26_14-46-31: CImageBasis::~CImageBasis after free spi free 2950104 diff 921600 internal free 53248 diff 0 2020-12-26_14-46-31: CImageBasis::~CImageBasis delete ClassFlowControll::GetJPGStream _imgzw done 2020-12-26_14-46-31: ClassFlowControll::GetJPGStream before delete _send SPI Heap: 2950104 larg. Blk free: 2028504 Min SPI free: 2010044 Internal Heap: 53324 larg. Blk free: 40048 Min Heap free: 38648 2020-12-26_14-46-31: ClassFlowControll::GetJPGStream done 2020-12-26_14-46-31: server_main::img_tmp_virtual_handler SPI Heap: 2950104 larg. Blk free: 2028504 Min SPI free: 2010044 Internal Heap: 53456 larg. Blk free: 40048 Min Heap free: 38648 2020-12-26_14-46-31: server_main::img_tmp_virtual_handler int. heap diff 448 2020-12-26_14-46-31: server_main::img_tmp_virtual_handler done GetJPG alg_roi.jpg

calling main_handler with uri "/" seems to eat 2408 B from internal , but sometimes its more or less

2020-12-26_15-12-17: hello_main_handler uri: / 2020-12-26_15-12-17: hello_main_handler at start SPI Heap: 2950104 larg. Blk free: 2028504 Min SPI free: 2010044 Internal Heap: 54164 larg. Blk free: 40048 Min Heap free: 38604 2020-12-26_15-12-17: hello_main_handler at end SPI Heap: 2950104 larg. Blk free: 2028504 Min SPI free: 2010044 Internal Heap: 52552 larg. Blk free: 35416 Min Heap free: 38604 2020-12-26_15-12-17: hello_main_handler done 2020-12-26_15-12-17: hello_main_handler uri: /wasserzaehler_roi.html 2020-12-26_15-12-17: hello_main_handler at start SPI Heap: 2950104 larg. Blk free: 2028504 Min SPI free: 2010044 Internal Heap: 54164 larg. Blk free: 40048 Min Heap free: 38604 2020-12-26_15-12-17: hello_main_handler at end SPI Heap: 2950104 larg. Blk free: 2028504 Min SPI free: 2010044 Internal Heap: 47860 larg. Blk free: 35416 Min Heap free: 38604 2020-12-26_15-12-17: hello_main_handler done 2020-12-26_15-12-17: hello_main_handler uri: /jquery-3.5.1.min.js 2020-12-26_15-12-17: hello_main_handler at start SPI Heap: 2950104 larg. Blk free: 2028504 Min SPI free: 2010044 Internal Heap: 51024 larg. Blk free: 36960 Min Heap free: 38604 2020-12-26_15-12-17: hello_main_handler at end SPI Heap: 2950104 larg. Blk free: 2028504 Min SPI free: 2010044 Internal Heap: 50984 larg. Blk free: 36960 Min Heap free: 38604 2020-12-26_15-12-17: hello_main_handler done 2020-12-26_15-13-21: hello_main_handler uri: / 2020-12-26_15-13-21: hello_main_handler at start SPI Heap: 2950104 larg. Blk free: 2028504 Min SPI free: 2010044 Internal Heap: 54164 larg. Blk free: 40048 Min Heap free: 38604 2020-12-26_15-13-21: hello_main_handler at end SPI Heap: 2950104 larg. Blk free: 2028504 Min SPI free: 2010044 Internal Heap: 47860 larg. Blk free: 35416 Min Heap free: 38604 2020-12-26_15-13-21: hello_main_handler done 2020-12-26_15-13-21: hello_main_handler uri: /wasserzaehler_roi.html 2020-12-26_15-13-21: hello_main_handler at start SPI Heap: 2950104 larg. Blk free: 2028504 Min SPI free: 2010044 Internal Heap: 51028 larg. Blk free: 36960 Min Heap free: 38604 2020-12-26_15-13-21: hello_main_handler at end SPI Heap: 2950104 larg. Blk free: 2028504 Min SPI free: 2010044 Internal Heap: 47860 larg. Blk free: 35416 Min Heap free: 38604 2020-12-26_15-13-21: hello_main_handler done 2020-12-26_15-13-21: hello_main_handler uri: /jquery-3.5.1.min.js 2020-12-26_15-13-21: hello_main_handler at start SPI Heap: 2950104 larg. Blk free: 2028504 Min SPI free: 2010044 Internal Heap: 50544 larg. Blk free: 36960 Min Heap free: 38604 2020-12-26_15-13-21: hello_main_handler at end SPI Heap: 2950104 larg. Blk free: 2028504 Min SPI free: 2010044 Internal Heap: 52072 larg. Blk free: 38504 Min Heap free: 38604 2020-12-26_15-13-21: hello_main_handler done 2020-12-26_15-13-26: server_main::img_tmp_virtual_handler start 2020-12-26_15-13-26: server_main::img_tmp_virtual_handler uri: /img_tmp/alg_roi.jpg 2020-12-26_15-13-26: server_main::img_tmp_virtual_handler SPI Heap: 2950104 larg. Blk free: 2028504 Min SPI free: 2010044 Internal Heap: 50008 larg. Blk free: 36736 Min Heap free: 38536 2020-12-26_15-13-26: ClassFlowControll::GetJPGStream start 2020-12-26_15-13-26: ClassFlowControll::GetJPGStream at start SPI Heap: 2950104 larg. Blk free: 2028504 Min SPI free: 2010044 Internal Heap: 49896 larg. Blk free: 36736 Min Heap free: 38536 2020-12-26_15-13-26: CImageBasis::CImageBasis copyfrom start 2020-12-26_15-13-26: CImageBasis::CImageBasis copyfrom w= 640 h= 480 bpp= 3 channels= 3 2020-12-26_15-13-26: CImageBasis::CImageBasis copyfrom needs heap 921600 available: 2028504 2020-12-26_15-13-26: CImageBasis::CImageBasis copyfrom finished 2020-12-26_15-13-26: CImageBasis::setObjectName ClassFlowControll::GetJPGStream _imgzw 2020-12-26_15-13-28: ClassFlowControll::GetJPGStream before delete _send SPI Heap: 2028504 larg. Blk free: 2028504 Min SPI free: 2010044 Internal Heap: 46172 larg. Blk free: 32104 Min Heap free: 38536 2020-12-26_15-13-28: CImageBasis::~CImageBasis start delete ClassFlowControll::GetJPGStream _imgzw 2020-12-26_15-13-28: CImageBasis::~CImageBasis before free spi free 2028504 internal free 47784 2020-12-26_15-13-28: CImageBasis::~CImageBasis after free spi free 2950104 diff 921600 internal free 47784 diff 0 2020-12-26_15-13-28: CImageBasis::~CImageBasis delete ClassFlowControll::GetJPGStream _imgzw done 2020-12-26_15-13-28: ClassFlowControll::GetJPGStream before delete _send SPI Heap: 2950104 larg. Blk free: 2028504 Min SPI free: 2010044 Internal Heap: 47864 larg. Blk free: 35192 Min Heap free: 38536 2020-12-26_15-13-28: ClassFlowControll::GetJPGStream done 2020-12-26_15-13-28: server_main::img_tmp_virtual_handler SPI Heap: 2950104 larg. Blk free: 2028504 Min SPI free: 2010044 Internal Heap: 49560 larg. Blk free: 36736 Min Heap free: 38536 2020-12-26_15-13-28: server_main::img_tmp_virtual_handler ----- int. heap diff 448 before: 50008 after: 49560 2020-12-26_15-13-28: server_main::img_tmp_virtual_handler done GetJPG alg_roi.jpg 2020-12-26_15-13-28: handler_wasserzaehler start 2020-12-26_15-13-28: handler_wasserzaehler at start SPI Heap: 2950104 larg. Blk free: 2028504 Min SPI free: 2010044 Internal Heap: 50160 larg. Blk free: 36736 Min Heap free: 38536 2020-12-26_15-13-28: handler_wasserzaehler handler_wasserzaehler uri: /wasserzaehler.html?rawvalue=true 2020-12-26_15-13-28: handler_wasserzaehler before getReadOut SPI Heap: 2950104 larg. Blk free: 2028504 Min SPI free: 2010044 Internal Heap: 50160 larg. Blk free: 36736 Min Heap free: 38536 2020-12-26_15-13-28: handler_wasserzaehler after getReadOut SPI Heap: 2950104 larg. Blk free: 2028504 Min SPI free: 2010044 Internal Heap: 47028 larg. Blk free: 33648 Min Heap free: 38536 2020-12-26_15-13-28: handler_wasserzaehler at end SPI Heap: 2950104 larg. Blk free: 2028504 Min SPI free: 2010044 Internal Heap: 47028 larg. Blk free: 33648 Min Heap free: 38536 2020-12-26_15-13-28: handler_wasserzaehler done 2020-12-26_15-13-28: handler_wasserzaehler start 2020-12-26_15-13-28: handler_wasserzaehler at start SPI Heap: 2950104 larg. Blk free: 2028504 Min SPI free: 2010044 Internal Heap: 49128 larg. Blk free: 33648 Min Heap free: 38536 2020-12-26_15-13-28: handler_wasserzaehler handler_wasserzaehler uri: /wasserzaehler.html 2020-12-26_15-13-28: handler_wasserzaehler before getReadOut SPI Heap: 2950104 larg. Blk free: 2028504 Min SPI free: 2010044 Internal Heap: 49128 larg. Blk free: 33648 Min Heap free: 38536 2020-12-26_15-13-28: handler_wasserzaehler after getReadOut SPI Heap: 2950104 larg. Blk free: 2028504 Min SPI free: 2010044 Internal Heap: 47556 larg. Blk free: 32104 Min Heap free: 38536 2020-12-26_15-13-28: handler_wasserzaehler at end SPI Heap: 2950104 larg. Blk free: 2028504 Min SPI free: 2010044 Internal Heap: 47520 larg. Blk free: 32104 Min Heap free: 38536 2020-12-26_15-13-28: handler_wasserzaehler done 2020-12-26_15-13-28: handler_prevalue start 2020-12-26_15-13-28: handler_prevalue at start SPI Heap: 2950104 larg. Blk free: 2028504 Min SPI free: 2010044 Internal Heap: 51248 larg. Blk free: 36736 Min Heap free: 38536 2020-12-26_15-13-28: handler_prevalue handler_wasserzaehler uri: /setPreValue.html 2020-12-26_15-13-29: handler_prevalue at end SPI Heap: 2950104 larg. Blk free: 2028504 Min SPI free: 2010044 Internal Heap: 47992 larg. Blk free: 35192 Min Heap free: 38536 2020-12-26_15-13-29: handler_prevalue done 2020-12-26_15-13-36: download_get_handler start 2020-12-26_15-13-36: download_get_handler at start SPI Heap: 2950104 larg. Blk free: 2028504 Min SPI free: 2010044 Internal Heap: 51756 larg. Blk free: 37320 Min Heap free: 38536 2020-12-26_15-13-36: download_get_handler uri: /fileserver/log/message/log_2020-12-26.txt

diff 54164 - 51756 = 2408

jomjol commented 3 years ago

I limited the camera to VGA and the free heap increased a lot. Never the less I still have frequent reboots in the part of taking images:

2020-12-27_01-43-23:    FlowControll.doFlow - ClassFlowMQTT     Heap:   2994996 Min Free:   2040476 larg. Block:    2950112 SPI Heap:   2950112 NOT_SPI Heap:   44884   Internal Heap:  44884   Internal Min Heap free: 30428
2020-12-27_01-43-23: sent publish successful in MQTTPublish, msg_id=56477, wasserzaehler/zaehlerstand, 49.1391
2020-12-27_01-43-23: sent publish successful in MQTTPublish, msg_id=62008, wasserzaehler/error, 
2020-12-27_01-43-23:    ClassFlowAnalog::doFlow     Heap:   2995056 Min Free:   2040476 larg. Block:    2950112 SPI Heap:   2950112 NOT_SPI Heap:   44944   Internal Heap:  44944   Internal Min Heap free: 30428
2020-12-27_01-43-23: task_autodoFlow - round done
2020-12-27_01-43-23: CPU Temperature: 59.4
2020-12-27_01-47-26: task_autodoFlow - next round - Round #6
2020-12-27_01-47-26:    ClassFlowAnalog::doFlow - Start     Heap:   2995044 Min Free:   2040476 larg. Block:    2950112 SPI Heap:   2950112 NOT_SPI Heap:   44932   Internal Heap:  44932   Internal Min Heap free: 30428
2020-12-27_01-47-26:    FlowControll.doFlow - ClassFlowMakeImage        Heap:   2994932 Min Free:   2040476 larg. Block:    2950112 SPI Heap:   2950112 NOT_SPI Heap:   44820   Internal Heap:  44820   Internal Min Heap free: 30428
2020-12-27_01-47-44: Time zone set to CET-1CEST,M3.5.0,M10.5.0/3
2020-12-27_01-47-44: =============================================================================================
2020-12-27_01-47-44: =================================== Main Started ============================================
2020-12-27_01-47-44: =============================================================================================
2020-12-27_01-47-48: Time zone set to CET-1CEST,M3.5.0,M10.5.0/3
2020-12-27_01-47-48: task_autodoFlow - next round - Round #1
2020-12-27_01-47-48:    ClassFlowAnalog::doFlow - Start     Heap:   2995640 Min Free:   2992004 larg. Block:    2950112 SPI Heap:   2950112 NOT_SPI Heap:   45528   Internal Heap:  45528   Internal Min Heap free: 41892
2020-12-27_01-47-48:    FlowControll.doFlow - ClassFlowMakeImage        Heap:   2993956 Min Free:   2991940 larg. Block:    2950112 SPI Heap:   2950112 NOT_SPI Heap:   43844   Internal Heap:  43844   Internal Min Heap free: 41828
2020-12-27_01-48-07: Time zone set to CET-1CEST,M3.5.0,M10.5.0/3
2020-12-27_01-48-07: =============================================================================================
2020-12-27_01-48-07: =================================== Main Started ============================================
2020-12-27_01-48-07: =============================================================================================
2020-12-27_01-48-10: Time zone set to CET-1CEST,M3.5.0,M10.5.0/3
2020-12-27_01-48-11: task_autodoFlow - next round - Round #1
2020-12-27_01-48-11:    `ClassFlowAnalog::doFlow - Start        Heap:   2995656 Min Free:   2992532 larg. Block:    2950112 SPI Heap:   2950112 NOT_SPI Heap:   45544   Internal Heap:  45544   Internal Min Heap free: 42420
2020-12-27_01-48-11:    FlowControll.doFlow - ClassFlowMakeImage        Heap:   2993984 Min Free:   2992236 larg. Block:    2950112 SPI Heap:   2950112 NOT_SPI Heap:   43872   Internal Heap:  43872   Internal Min Heap free: 42124

I need to figure out, what is different to your implementation or I have a defect on the camera.

jomjol commented 3 years ago

@hike6688: what exact version are you running now in that stable mode? I want to compare the code to find the root cause for my reboots.

hike6688 commented 3 years ago

Have a look on CCamera. You use esp_camera_fb_return in CCamera::CaptureToFile and CCamera::CaptureToHTTP but not in CCamera::CaptureToBasisImage. Maybe you have to watch the result of frame2bmp(fb, &buf, &buf_len) or check what exactly fb delivers.

I have changed getESPHeapInfo() to get more info about internal and spi heap and I log more calls to the functions in use see ClassFlowMakeImage::doFlow(string zwtime), but I cannot take a new picture as only the production device is on my gascounter.

The results of recognition are rather bad with the old software, so I run a pulsecounter on an ESP8266 in parallel, that counts the pulses from a reed contact. But this results differ a little bit from real values , so have I take a picture every hour to see the differences. I implemented this (esp_err_t handler_capture_with_light_and_save_to_file(httpd_req_t *req)) in the last version of the test-branch.

currently is test with the branch called rolling-reduce..objname,

this is the output from last night.

log_2020_12_26_21_53.txt log_2020_12_27_13_55.txt

Protocolls are from my testdevice without taking a new picture every round, just reading the reference picture. See ClassFlowMakeImage::takePictureWithFlash I do not use mqtt. ` void ClassFlowMakeImage::takePictureWithFlash(int flashdauer) { std::string aMsgHead = "ClassFlowMakeImage::takePictureWithFlash "; LogFile.WriteToFile(aMsgHead + "start"); char buf[80]; if (isTestDevice){ long spiHeapBeforeDelete = getSPIHeapSize(); //delete rawImage ; we can not delete rawImage, cause it is referenced in ClassFlowAlignment //long spiHeapAfterDelete = getSPIHeapSize(); //rawImage = new CImageBasis("/sdcard/config/reference.jpg"); // we try it in this way sprintf (buf,"rawImage: w= %d h= %d, bpp= %d channels= %d",rawImage->width,rawImage->height,rawImage->bpp, rawImage->channels); LogFile.WriteToFile(aMsgHead + string(buf)); free(rawImage->rgb_image); long spiHeapAfterFree = getSPIHeapSize(); int width, height , bpp; int channels = 3; rawImage->rgb_image = stbi_load("/sdcard/config/reference.jpg", &width, &height, &bpp, channels); sprintf (buf,"stbi_load: w= %d h= %d, bpp= %d ",width,height,bpp); LogFile.WriteToFile(aMsgHead + string(buf)); long spiHeapAfterLoad = getSPIHeapSize(); sprintf (buf,"spi Heap before delete %ld after delete %ld after create %ld ",spiHeapBeforeDelete, spiHeapAfterFree, spiHeapAfterLoad ); LogFile.WriteToFile(aMsgHead + string(buf));

} else {
    Camera.CaptureToBasisImage(rawImage, flashdauer);
    if (SaveAllFiles) rawImage->SaveToFile(namerawimage);
}
LogFile.WriteToFile(aMsgHead + "finished");

} ` and from config.ini with small change to read TESTDEVICE property.

[MakeImage] ;LogImageLocation = /log/source ;LogfileRetentionInDays = 15 WaitBeforeTakingPicture = 5 ImageQuality = 5 ImageSize = VGA Testdevice = true .... [MQTT] ;Uri = mqtt://IP-ADRESS:1883 Topic = wasserzaehler/zaehlerstand TopicError = wasserzaehler/error ClientID = wasser ;user = USERNAME ;password = PASSWORD

jomjol commented 3 years ago

I could trace it down to the point in the Camera picture taking to the code frame2bmp(...) This is not delivering a bmp as result in an not assigned pointr (NULL). It is not a question of memory, there is exactly the same amount available as in a good run before. Maybe I'm hunting a technical defect of my camera?

I tend to publish this code in the rolling. What is your opinion?

hike6688 commented 3 years ago

What about the missing? esp_camera_fb_return in CCamera::CaptureToBasisImage? If you have more that one round, the camera has been working before. Do you have this since you reduced fb to VGA?

jomjol commented 3 years ago

Yes, I have added this - but it does not help :-(. I'll now made a loop to retry, if frame2bmp fails for up to 5 times. Let's see what the logs tell me. Problem is, I cannot produce this error on my test device so far, so debugging is very slow and not making fun.

hike6688 commented 3 years ago

Somewhere was a hint that sometimes the camera may have a problem with VCC and the spikes from the ESP. Mit testdevice runs behind a very potent regulated labor device with 3 Amps and I see that taking a picture increases from 80 mA to 160 mA . Do you have a long cable to your device?

jomjol commented 3 years ago

This is a good hint - thanks! Indeed I do have a long cable, and with this I power two ESP devices! Maybe if both have a common power peak it fails. I have now implemented a loop, that waits for 3 seconds and than even gathers the fb again freshly. It helped now for two times, lets see if this is confirmed over night. If that is the case, I think we have finished the next rolling version.

jomjol commented 3 years ago

I have it now running on my testsystem, the only difference is, that there is no MQTT running and the picture is nonsense. Never the less, no restart at all. This is powered directly from my USB-port of my computer. Question is, why do I have it now only on the new system with reduced SD-card access. One explanation could be, that now at that point of failure, a transformation from jpg to bmp is done with the ESP32-cam library (frame2bmp), parallel to the image taking and the LED-Light still on. Before, only the jpg was written to SD-card and later on, after the camera and the LED was switched of the transformation to BMP was done. This in combination with a weak power supply could cause the problem.

jomjol commented 3 years ago

Problem still valid on the productive device, but on my test device error does not occure. So I will prepare to make this the new rolling. Any further experience from your device or is it stable?

jomjol commented 3 years ago

@hike6688 Commited to rolling

MMeinhardt1 commented 3 years ago

@jomjol I have seen you pushed a new rolling version. also you have updated the bootloader.bin is it nessecary to update this or is the firmware.bin enough.

hike6688 commented 3 years ago

I run a test for some hours on another camera watching my watermeter. Camera was taking pictures every round,. Did about 255 rounds without reboot.

SPI and internal heap seems to be stable if no access from the webserver is done. I add an complete and a commented log.

full_protocoll_2020_12_28_18_00_water_k.txt commented_protocoll_2020_12_28_18_00_water_k.txt

jomjol commented 3 years ago

@hike6688 firmware is sufficient. The new version is the one with the reduced SD-card writting. I still have the reboot problem on the active device, but not on my test device. I can only exchange them next week. So hopefully it is a problem of my power supply, which would be reasonable.

hike6688 commented 3 years ago

@JomJol I am testing the new rolling version on my watermeter with a small enhancement to getESPInfo() this night.

jomjol commented 3 years ago

@hike6688 How is the testing?

hike6688 commented 3 years ago

Seems to be stable for min 117 rounds

Protocoll_2020_12_30_10_42.txt

I have problems to use the configuration page if a flow is running, Randomless the values I set are updated or not. This does not happen if the flow waits for the next round. I have added information to sysinfo to see when the next flow will execute. Its also usefull for a ota update, should also not run in parallell with a flow.

Code is on rolling_extended_info

I did more testing and found a small issue in the recognition display page. The analog pictures are not shown.

It's caused by missing code in ClassFlowAnalog::GetHTMLInfo()

std::vector<HTMLInfo> ClassFlowAnalog::GetHTMLInfo() { std::vector<HTMLInfo> result;

for (int i = 0; i < ROI.size(); ++i)
{
    HTMLInfo *zw = new HTMLInfo;
    /*
    zw->filename = ROI[i]->name + ".jpg";
    zw->val = ROI[i]->result;
    result.push_back(zw);
    */
    // see ClassFlowDigit::GetHTMLInfo
    zw->filename = ROI[i]->name + ".bmp";
    zw->filename_org = ROI[i]->name + ".jpg";
    zw->val = ROI[i]->result;
    zw->image = ROI[i]->image;
    zw->image_org = ROI[i]->image_org;
    result.push_back(zw);
}

return result;

}

Now it works

Do not care for the value of Analog3 . My watermeter has a mirror on this gauge and produces wrong values

analog3

jomjol commented 3 years ago

@hike6688 EUREKA - I found the reason and the solution. Now I also have a stable running system:

2020-12-31_08-15-56: task_autodoFlow - round done
2020-12-31_08-15-56: CPU Temperature: 57.2
2020-12-31_08-19-55: task_autodoFlow - next round - Round #120

Memory Usage over night (120 rounds): grafik

Solution: Reduction of the camera working speed:

// XCLK 20MHz or 10MHz for OV2640 double FPS (Experimental)
// .xclk_freq_hz = 20000000,             // Orginalwert
.xclk_freq_hz = 5000000,               // Test, um die Bildfehler los zu werden !!!!

Found a hint in the internet: the transfer of the camera picture to the PSRAM is limited by the PSRAM speed. If too much data needs to be transfered, this fails and the algorithm to convert it into internal BMP fails, as the data is corrupted. This also explains, why it fails only on some systems. The image is stored as jpg, so it has varying size, depending on the camera picture details. My active device seems to be at the limit, sometimes its okay, sometimes not. My testing device is laying on a table delivering only unsharp and therefore small jpg images. Your device could also deliver smaller jpg sizes, just not exeeding the memory transfer limit. I will update the code during the day - switching off the debugging ... and the upload a new version.

hike6688 commented 3 years ago

@JomJol Congratulation!!!!

This may explain why my watermeter runs stable. I was not able to get the camera picture really sharp and reduce the reflections. raw_20201231-125344

2020-12-31_12-44-44: Server_tflite::task_autodoFlow Spi at end 3013792 at start round 4013708 diff 999916 diff to task start 999916 2020-12-31_12-44-44: Server_tflite::task_autodoFlow internal at end 46048 at start round 61612 diff 15564 diff to task start 15564 2020-12-31_12-44-44: Server_tflite::task_autodoFlow - round done 258

The internal heap value 61612 at start round is stable within all 258 rounds.

aendes commented 3 years ago

@jomjol , @hike6688 : My Image is the same quality like the image above. I had no issues in the last 24h with the current rolling release.

Create job;-) Happy new year

jomjol commented 3 years ago

@aendes: Thx for the feedback. Since the rolling from yesterday evening this problem is gone. Mine is now also running stable for more than 200 cycles. I will publish another rolling soon, dealing with internal improvements to avoid parallel access to images. But should not have any impact.

jomjol commented 3 years ago

Topic is done with the newest rolling!