Closed lterfloth closed 1 year ago
Sounds like a similar problem to the one I am having. https://github.com/jomjol/AI-on-the-edge-device/discussions/2184
There is somebody also who has performance issues on a rev3 hardware, see https://github.com/jomjol/AI-on-the-edge-device/discussions/2301
Maybe this is related...
I improved stability, at least temporarily. It took me over 4 hours to configure everything, as the interface was stupidly slow and crashed regularly. Found the option to push the clock frequency to 240 MHz, which helped tremendously with performance ... until I rebooted once. Had almost 50% packet loss before, and latency was always >250ms with spikes up to 10s. After switching clock frequency it ran at around 150ms on average with spikes going to 400ms. Now, after reboot, it got subjectively worse again. Sadly I do not have any more time to check today.
Unfortunately, don't have any further technical information - I bought 4 of them, and it seems like all 4 have the same issues. This is also not limited to HTTP interface, but also MQTT, JSON API, and just in general awful performance.
A few things I've noticed:
The chip is v3 revision, from logs:
[MAIN] PSRAM size: 8388608 byte (8MB / 64MBit)
[MAIN] Total heap: 4175502 byte
[MAIN] Camera info: PID: 0x26, VER: 0x42, MIDL: 0x7f, MIDH: 0xa2
[MAIN] Device info: CPU cores: 2, Chip revision: 3
[MAIN] SD card info: Name: , Capacity: 3840MB, Free: 3826MB
The ESP has been purchased from https://www.berrybase.de/esp32-cam-development-board-inkl.-ov2640-kameramodul
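Since the chip revision keeps coming up in this thread, here is a minimal sketch for pulling it (and the PSRAM size) out of a saved log. It assumes the `[MAIN]` line format shown in the excerpt above; the function names are made up for illustration and are not part of the firmware.

```python
import re

# Sample lines copied from the log excerpt above.
LOG = """\
[MAIN] PSRAM size: 8388608 byte (8MB / 64MBit)
[MAIN] Total heap: 4175502 byte
[MAIN] Device info: CPU cores: 2, Chip revision: 3
"""

def chip_revision(log_text):
    """Return the chip revision as an int, or None if the line is missing."""
    match = re.search(r"Chip revision:\s*(\d+)", log_text)
    return int(match.group(1)) if match else None

def psram_bytes(log_text):
    """Return the reported PSRAM size in bytes, or None if the line is missing."""
    match = re.search(r"PSRAM size:\s*(\d+)\s*byte", log_text)
    return int(match.group(1)) if match else None

print("chip revision:", chip_revision(LOG))
print("PSRAM (MB):", psram_bytes(LOG) // (1024 * 1024))
```

Newer builds log the revision in a different format (for example `Chip rev: v1.0`), so the pattern would need adjusting there.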
That covers basically everything I experienced yesterday afternoon/night. I tested two different manufacturers too, one had 8 MB of PSRAM and one “just” 4 MB. No difference whatsoever. Blowing (yeah, sounds weird) on the chip to cool it down did not improve packet loss or ping times (at least not within 30 seconds, so I guess no temperature problems). My guess would be that it has to do with chip rev 3, and that it seems unable to cope with the computational demands AI on the edge has. Which is odd, shouldn't a rev3 always perform at least as well as other versions?
As I was able to finish configuration (took hours, literally), my unit ran throughout the night, and I connected it to home assistant. Over the course of 8 hours and an update interval of 10 minutes (I doubled the default for less strain on the unit), only about 10 updates were received but also 20 warnings that the unit was not available. So, probably packet loss, again. Anyway, I hope that some dev can dig into this. I'd donate a couple of bucks if that could speed up finding the bug (hopefully it is a bug and not a rev 3 limitation :-( )
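To put the numbers above into perspective, a quick back-of-the-envelope sketch (the function names are only illustrative): 8 hours at a 10-minute interval should give 48 updates, so roughly 10 received means only about a fifth arrived.

```python
def expected_updates(hours, interval_minutes):
    """Number of updates a device should send in the given time window."""
    return int(hours * 60 // interval_minutes)

def delivery_ratio(received, hours, interval_minutes):
    """Fraction of the expected updates that actually arrived."""
    return received / expected_updates(hours, interval_minutes)

expected = expected_updates(8, 10)
ratio = delivery_ratio(10, 8, 10)
print(f"expected {expected} updates, received 10 ({ratio:.0%})")
```

The "about 10" is the rough count reported above, so the ratio is an estimate, not a measurement.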
Thank you both for the analysis you did.
One thing we could try is to update the Espressif platform (IDF). We are still on an older one. Maybe they had some silicon bugs in rev3 which require a newer IDF.
2 months ago I started to migrate to it, but it was not as easy as I thought; some functions are no longer available.
Nevertheless, could you give https://github.com/jomjol/AI-on-the-edge-device/actions/runs/4152463509 a try?
It will not run stably: the time in the log will always be 0, and external LEDs will not work. But it should be sufficient to see if the web UI performs better.
Make sure to do a backup beforehand!
Will try that out as soon as I'm back home. The second unit is not used for anything currently, therefore testing is nbd. Thanks for the hint!
Maybe this doc gives some insights, too? Especially "Impact on Customer Projects". Just a long shot, though: https://www.espressif.com/sites/default/files/documentation/ESP32_ECO_V3_User_Guide__EN.pdf
According to that document, not much has changed. Also, the first version of that document was in January 2020, so over 3 years ago. it is unlikely that you two are the first who get a rev3 chip using AIOTED!
I got to try it out. The web UI seems to be responding quite well for the OTA upload part (not sure what that is called). Uploading the remote.zip worked, which did not work before whatsoever! Ping times were below 10ms for some pings, never seen that before. The initial setup is way more responsive, too. NO packet loss whatsoever. Ping times go up a little, but stay around 50-250ms sitting 1m from the router. When interacting with the setup dialogue, it does take a moment to load every now and then. Never had a packet drop though!
216 packets transmitted, 216 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 5.029/155.057/884.124/138.759 ms
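For anyone who wants to compare runs before and after a firmware change, here is a small sketch that turns a ping summary like the one above into numbers. It assumes the macOS/BSD style summary shown here (Linux prints a slightly different format), and `parse_ping_summary` is just an illustrative helper name.

```python
import re

# Summary copied from the ping run above.
SUMMARY = (
    "216 packets transmitted, 216 packets received, 0.0% packet loss\n"
    "round-trip min/avg/max/stddev = 5.029/155.057/884.124/138.759 ms"
)

def parse_ping_summary(text):
    """Extract packet counts, loss and round-trip times from a ping summary."""
    counts = re.search(
        r"(\d+) packets transmitted, (\d+) packets received, ([\d.]+)% packet loss",
        text,
    )
    rtt = re.search(
        r"min/avg/max/stddev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms", text
    )
    if not counts or not rtt:
        raise ValueError("unrecognized ping summary")
    return {
        "sent": int(counts.group(1)),
        "received": int(counts.group(2)),
        "loss_percent": float(counts.group(3)),
        "min_ms": float(rtt.group(1)),
        "avg_ms": float(rtt.group(2)),
        "max_ms": float(rtt.group(3)),
        "stddev_ms": float(rtt.group(4)),
    }

stats = parse_ping_summary(SUMMARY)
print(stats["loss_percent"], stats["avg_ms"], stats["max_ms"])
```

Logging these values per firmware build would make it easier to tell a real improvement from a lucky run.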
My initial feeling: updating the IDF/migrating to PlatformIO 6.x.x fixes this issue!
I tried this on my rev3 and it first appeared to be performing better, but no. Just as slow as with the stable version.
@penapena did you do a fresh install and completely wipe the sd card as well?
Did erase_flash before install. Didn't format the sdcard, but deleted old files and pasted new ones.
The SD-Card content should not matter for the performance.
I have now built a version (based on rolling) which specifically only supports rev3. I can't test it myself, so make sure you have USB access to revert if it does not work: https://github.com/jomjol/AI-on-the-edge-device/actions?query=branch%3Aset-min-version-to-rev3
As for the framework update: since it might help at least one of you, I want to follow this path one step deeper. I am trying to take the latest rolling and update the framework to 6.1.0. (The above link still used 6.0.1 and an older rolling.) It builds now, but for some reason fails to init PSRAM. I will let you know once I have a better version there.
I would like to join this conversation as I face the same issue. The webUI is that slow that I cannot use it at all. Not sure what is causing the problem but I will follow up this thread and check if I can help you in any way.
Later today, I will try another device and see if the update performs better on that one, too. Just to make sure that I did not by chance pick, out of the four rev 3's I own, one that is somehow functional.
I did a completely fresh install of the watermeter with https://github.com/jomjol/AI-on-the-edge-device/actions?query=branch%3Aset-min-version-to-rev3 and I can tell that it doesn't make any difference for me. The initial setup page takes about 1.3 minutes to load.
I also installed the version (https://github.com/jomjol/AI-on-the-edge-device/actions/runs/4152463509) yesterday and the ping response times were dramatically reduced. My times correspond to those of @lterfloth.
The image build times of the GUI have not really changed but I could at least go through completely.
I (722) quad_psram: This chip is ESP32-D0WD
I (722) esp_psram: Found 8MB PSRAM device
I (722) esp_psram: Speed: 40MHz
I (722) esp_psram: PSRAM initialized, cache is in low/high (2-core) mode.
I (1638) cpu_start: cpu freq: 160000000 Hz
I (1639) cpu_start: Application information:
I (1639) cpu_start: Project name: AI-on-the-edge
I (1644) cpu_start: App version: 94371ba
I (1648) cpu_start: Compile time: Feb 11 2023 17:45:11
I (1653) cpu_start: ELF file SHA256: 0cabbb7bef32fe8c...
I (1658) cpu_start: ESP-IDF: 5.0.0
I (3778) camera: Detected OV2640 camera
I (3778) camera: Camera PID=0x26 VER=0x42 MIDL=0x7f MIDH=0xa2
I (3858) cam_hal: buffer_size: 32768, half_buffer_size: 4096, node_buffer_size: 2048, node_cnt: 16, total_cnt: 15
I (3858) cam_hal: Allocating 61440 Byte frame buffer in PSRAM
I (3868) cam_hal: cam config ok
I (3868) ov2640: Set PLL: clk_2x: 0, clk_div: 0, pclk_auto: 0, pclk_div: 8
I (5978) MAIN: Using SDMMC peripheral
Name: SMI
Type: SDHC/SDXC
Speed: 20 MHz
Size: luMB
CSD: ver=2, sector_size=512, capacity=7864320 read_bl_len=9
SSR: bus_width=1
I (6148) MAIN: Development-Branch: migrate-to-platformio-6.0.1 (Commit: 94371ba), Date/Time: 2023-02-11 17:44, Web UI: Development-Branch: migrate-to-platformio-6.0.1 (Commit: 94371ba)
Yes, image build times remained more or less the same. The biggest difference is the packet loss, which was just not happening anymore (i.e. the system was stable). The packet loss leads to many problems down the line.
@MonsterEnergy-wtf
which pcb revision do you have? check the log for it.
The SD-Card content should not matter for the performance.
I have now built a version (based on rolling) which specifically only supports rev3. I can't test it myself, so make sure you have USB access to revert if it does not work: https://github.com/jomjol/AI-on-the-edge-device/actions?query=branch%3Aset-min-version-to-rev3
As for the framework update: since it might help at least one of you, I want to follow this path one step deeper. I am trying to take the latest rolling and update the framework to 6.1.0. (The above link still used 6.0.1 and an older rolling.) It builds now, but for some reason fails to init PSRAM. I will let you know once I have a better version there.
With this the rev3 somehow works if you set the CPU to 240 MHz in the config file. With 160 MHz I couldn't access it at all. However, at RSSI -70 it is very slow, but I can still access it (rev1 works great even at RSSI -80). Close to the router, at around -50, it works well, but not as fast as rev1.
@penapena Do both have an external antenna?
@ all: I feel that I am unable to trace it further down without a rev3 hardware myself. If somebody is willing to send me a device (to Switzerland), i can have a look on it, but I can't promise to fix it or spend a lot of time into it.
which pcb revision do you have? check the log for it.
Rev3. Sorry, I should have added this before. Also I can confirm that the UI works much better with 240Mhz...
@ all: I feel that I am unable to trace it further down without a rev3 hardware myself. If somebody is willing to send me a device (to Switzerland), i can have a look on it, but I can't promise to fix it or spend a lot of time into it.
Please send me a PM (if possible here). I'm willing to do so.
@caco3 is branch platformio6 already suitable for testing here at home? I mean, the device I am using right now does not work reliably anyway, therefore I would not be mad if it has issue. Just wondering whether the code needed for migration is already done. I'd be happy to test it. I don't mind a slow web UI and other usability related issues. As long as the device is able to read the values and send it to home assistant via MQTT, I'm happy :-)
@lterfloth No, sorry, somehow there was a change between the rolling version that migrate-to-platformio-6.0.1 is based on and the latest rolling. Since I based platformio6 on the latest rolling, this is an issue. The change somehow leads to the PSRAM not getting initialized at all, thus making normal operation impossible.
I would need to rebase commit by commit to find where it broke, but I currently do not have time to look into it, sorry.
@penapena
GitHub does not provide such a feature, but you can find my contact information at https://www.ruinelli.ch/about
@penapena Do both have an external antenna?
Both are on internal antenna.
You could try it with an external antenna
Already ruined one board with my awful soldering, so not going down that road.
Same issue here with Rev.3
It only worked with https://github.com/jomjol/AI-on-the-edge-device/actions/runs/4152463509; anything else is incredibly slow or not usable.
I now got two more of the rev3 and they both seem to be working fine with the normal software v15.1.1. The CPU temp is also showing normal temps. The RSSI is a little worse with the internal antenna, but the web GUI works as it should even with low RSSI. I'm going to RMA the first rev3 I got, the one that was really slow.
That is really interesting. If you are right, it would mean that all those slow rev3 boards have a hardware bug. But then why do they work with older versions?
How much is the RSSI change?
Around 5-10 dB. And you don't need to set the CPU to 240 MHz. Didn't do any wider testing, but it seemed like I got worse RSSI with the CPU at 240 MHz.
hmm, I don't think CPU clock and RSSI have any relation. And 5..10 are not yet statistically significant, meaning it also could just be by luck.
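One way to check whether an RSSI difference like this is more than luck is to take several readings per board and compare them. Below is a minimal sketch using Welch's t statistic. The readings are HYPOTHETICAL values invented for illustration, not measurements from this thread, and `welch_t` is just an illustrative helper.

```python
from math import sqrt
from statistics import mean, stdev

def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    var_a, var_b = stdev(a) ** 2, stdev(b) ** 2
    return (mean(a) - mean(b)) / sqrt(var_a / len(a) + var_b / len(b))

# HYPOTHETICAL RSSI readings in dBm, invented for illustration only.
board_a = [-52, -58, -49, -61, -55, -50]
board_b = [-60, -66, -57, -71, -63, -59]

t = welch_t(board_a, board_b)
diff = mean(board_a) - mean(board_b)
print(f"mean difference: {diff:.1f} dB, t = {t:.2f}")
```

A large |t| with a reasonable number of readings suggests a real difference; a single reading per board, as in this thread, cannot tell you either way.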
I have a similar case. The web UI is so slow that I cannot use the project at all. I also tried this one, got a bit further, and after setting the CPU to 240 MHz I was able to finish the setup.
@jupe You need to tell us which hardware revision you use. Also, at least for @penapena it seems to have been a hardware issue!
I've got the same problem with 2 devices (both are rev 3).
Nothing mentioned above worked for me.
In the log file the only error I get is:
W (14249) wifi:
Just for information: on two of my three rev.1 devices, whenever I switch to 240 MHz, WiFi stability (random disconnects) and responsiveness are so bad that they are only usable when the distance to the AP is max. 5m (even though the RSSI value is quite acceptable and not really lower than with 160 MHz). Only with 160 MHz are the devices responsive. I assume this is hardware related (maybe a disturbance of the antenna design).
I finally succeeded (after working on it for several evenings) in migrating rolling to the latest PlatformIO, see https://github.com/jomjol/AI-on-the-edge-device/pull/2305
Please give the latest build in https://github.com/jomjol/AI-on-the-edge-device/actions/workflows/build.yaml?query=branch%3Aplatformio6 a try.
Note: Increasing the CPU clock to 240 MHz is not a valid solution! This only should be used if you want an extra fast device but are accepting a reduced stability (depends largely on your hardware quality which often is low if ordered directly in China -> hardware quality control not passed)!
Also please note that the functionality of the external LEDs is untested!
Feedback highly appreciated.
First and foremost, thank you for the effort you put into fixing this issue!
I wiped and tested two SD cards, tried two ESP32Cams, two cables and two power supplies (one was working with an ESP32 I use, too) to make sure that it is not a hardware related issue. Both units had the same issues as before... Signal strength is at -44dBm, so there should not be an issue. Pinging the devices resulted in packet loss similar to what I saw on the current stable release.
While fiddling around, I found out that pressing my finger on the parts below the camera (see image) instantly leads to better ping times and a responsive unit. I don't have a thermal camera, but I believe it is a thermal issue, either with the ESP32 chip itself or with one of the small fuses/transistors/whatever they may be under load. I do not know whether that information helps. I only tested this with one unit, though. But... maybe rev3's are just not powerful enough?
Hahahah, I can definitely confirm that. As soon as I put my finger on the marked area, the web interface works like a charm. I couldn't believe it when I read it, but it's true...
Just to go even further with the shenanigans... I used some removable glue (Tesa Powerstrips) and put it on top of the area. On top of the glue, I glued some copper wires to disperse the heat. Looks ... but works. No packet drops anymore. Testing now to see how reliably it works.
Wow, it sounds as if you were kidding...!
Usually, if touching it helps, it means that there are bad (solder) connections. But since the tape also helps, it is indeed possibly a heating problem. That could easily be tested, e.g. by putting it into the fridge and running it from there. Due to the closed door, the RSSI will be a bit lower, but I expect it to be ok. And the heating problems should then go away.
Nevertheless, this then clearly is a hardware issue and nothing we can fix with software! So best you can do is order a new device and hope it is more reliable.
You know I've been having that same slow webUI issue and, I'm not kidding, putting my finger over those parts helped immediately. They weren't particularly warm so I suspect it was just a bad solder job.
I am definitely making progress with version 6.1.0. I see an improvement in both the response times (ping) and the GUI page load.
I (1653) cpu_start: Application information:
I (1653) cpu_start: Project name: AI-on-the-edge
I (1658) cpu_start: App version: 2948d6c
I (1662) cpu_start: Compile time: Apr 17 2023 21:39:37
I (1667) cpu_start: ELF file SHA256: 469034712c0e01a3...
I (1672) cpu_start: ESP-IDF: 5.0.1
I (1676) cpu_start: Min chip rev: v0.0
I (1680) cpu_start: Max chip rev: v3.99
I (1684) cpu_start: Chip rev: v1.0
I (6828) MAIN: Development-Branch: HEAD (Commit: 2948d6c), Date/Time: 2023-04-17 21:39, Web UI: Development-Branch: HEAD (Commit: 2948d6c)
I (6858) MAIN: Reset reason: Power-on event (or reset button)
@HolBaum5 That's great! Please also test the latest build with 6.1.0 (see my last entry here). It is based on rolling, and I would like to merge it into rolling soon.
@caco3 Isn't that the latest state (AI-on-the-edge-devicemanual-setup2305merge(2948d6c))? I had made a typo; 6.0.1 should have been 6.1.0.
Yes, you are right, sorry.
Is this already in the testing phase? Should I load the software onto my test system?
I put the release on my botched-together (in other words, thermally optimized) system. So far it runs smoothly and flawlessly together with Home Assistant and data transfer via MQTT.
The Problem
I can answer all questions with "yes". The PSRAM is ESP PSRAM64H 462021 / 1B00286, it's an ESP32CAM Module("diymore ESP32 CAM Entwicklungsplatine, WLAN/Bluetooth, ESP32 DC 5V Dual-Core-Entwicklungsplatine mit 2640 Kamera-TF-Karten-Modul").
I use a 16GB SD Card. I flash using the webinstaller & Chrome. I tried holding IO0 and not holding it while flashing. I tried two units, both have the same issues (I bought a two pack).
What happens: Flashing works fine, but the web interface is stupidly slow. The ESP is beside my MacBook, so it can't be the AP signal. I tried uploading remote.zip, which most of the time fails after a while. I attached a log file. I also tried manually uploading the config to the SD card. That got me a step further, yet the web interface gets unresponsive when creating a reference image using the pre-configured SD card.
Version
15.1.1
Logfile