letscontrolit / ESPEasy

Easy MultiSensor device based on ESP8266/ESP32
http://www.espeasy.com

ESP32 i2c stops working #1282

Closed waspie closed 5 years ago

waspie commented 6 years ago

ESP32

I'm still working out under what conditions this happens. It definitely happens with a TSL2561 connected. I don't think I'm seeing the same behavior with just 2 MCP23017s connected, but I need to disconnect the TSL2561 when I get home to confirm; I'm pretty sure the TSL2561 is the trigger. This particular piece of hardware has been working reliably for months on end, so I don't think it's the TSL2561 itself but something software/ESP32 related.

I've used the precompiled binary from late March and my own compiled versions, and the behavior is the same.

With my 3 I2C devices connected they work for a couple of minutes (at best), but soon after they all disappear and an I2C scan shows no devices.

On my own compiled versions I'm running just 6 plugins.

Build: 20100 - Mega32
GIT version: (custom)
Plugins: 6 [Normal]
Build Md5: 4d44355f4d44355f4d44355f4d44355f (Md5 check fail!)
Build time: Apr 18 2018 23:03:07 <---- built from April 2 source
Binary filename: ThisIsTheDummyPlaceHolderForTheBinaryFilename...

ESP board
ESP Chip ID:

Storage
Flash Chip ID:
Flash Chip Real Size: 4096 kB
Flash IDE Size: 4096 kB
Flash Writes: 0 daily / 0 boot
Sketch Size:

I2C is on pins 4/5 (default). The I2C scan reports: No I2C devices found

(image: plugins)

TD-er commented 6 years ago

I have had the OLED display running for about 18h now on the ESP32. Have you tried adding pull-up resistors to both I2C lines? 10k each.

waspie commented 6 years ago

No, I have not. I've never seen this as being needed anywhere. Do you have more information on why you do this?

Budman1758 commented 6 years ago

I have the same problem on a test unit also. The I2C stops working after what seems to be a random amount of time. I have an OLED display and an MCP23017 connected. I have seen anywhere from 800 to about 1400 minutes uptime until it quits. When this happens I have the same results in the I2C scanner page. No I2C devices found. This is the same unit that I reported issue #1029 with. There is also a Dallas temp sensor connected but it is not affected as far as I can tell.

Budman1758 commented 6 years ago

I have resistors on my setup.

TD-er commented 6 years ago

I2C devices only pull the signal to ground to communicate; something else has to pull the line back up. So to get good signal quality, you can add pull-up resistors to the signal lines. Maybe the ESP32 does not do internal pull-up like the ESP8266 or other Arduino chips do.

It is advisable to add these when using more I2C devices on the same bus or longer signal lines, and some chips need it more than others.

waspie commented 6 years ago

OK, I will try that. It was never an issue with all the 8266s I've hooked up, so I had no idea!

TD-er commented 6 years ago

Pull up between I2C line and 3v3 line. Just to make sure :)

waspie commented 6 years ago

I unhooked the TSL and the MCPs stayed working for a few hours. I just now hooked up the 10ks and plugged the TSL back in and immediately both MCPs and TSL are visible in the scan which hasn't been the case previously. cautiously optimistic at the moment. I'll check back in on this topic tomorrow.

Guess I'll have to make a few more changes to my PCBs

waspie commented 6 years ago

spoke too soon. drat

I can't do the coding or anything, far above my ability, but is there something I can do to help resolve this? The logs don't give any hints as to why it stops, not on serial or anything

ferazambuja commented 6 years ago

I tested with two different kinds of boards and the I2C scanner reads the wrong addresses. I'm using a Sparkfun ESP32 Thing and a Wemos Lolin32. How do I build for ESP32 with more plugins (like dev or test)?

waspie commented 6 years ago

I may be way off base here, but while searching around for pullup resistor values on the ESP32 i2c lines I found this topic: https://esp32.com/viewtopic.php?f=13&t=1308&sid=8429201731480ecffca91d3ab827d59a&start=10

and it seemed like useful information.

TD-er commented 6 years ago

Doesn't have to be that far-fetched. The suggestion of the pull-ups was more to make sure it wasn't some signal quality issue. It may very well be related to timing, which may affect some sensors more than others.

waspie commented 6 years ago

I modified the FreeRTOS definition from 1000 to 100, which was a bad idea: everything ran much faster. So I instead modified I2CDEV_DEFAULT_READ_TIMEOUT from 1000 to 100 in i2cdev.h and uploaded it to an ESP32 on my desk at work. The TSL has been working continuously for about 20 minutes with no pull-up resistors (didn't think to bring any to work), which is far better luck than I've had thus far.

I wonder if the problem is that we're not using enough pull-up (I see a lot of references to using 4.7k). So at 10k the pull-up is still too lazy, but 4.7k is much faster (or modify the timeout). Modifying the timeout would affect other things you compile with I2C though, so the resistors would seem like the better option?

TD-er commented 6 years ago

10k pull up is indeed a bit conservative. But compared to having none, it is already an improvement. You also don't want to have too strong pull up, since the sensors and the ESP must still be able to pull it down to GND. That "10k" was just a "better than nothing, if it's not working at least it doesn't harm anything". If you need 4k7, please do and let's hope that makes it operate more stable.
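To put rough numbers on the 10k versus 4k7 question, here is a minimal sketch based on the usual RC estimate from the I2C-bus specification: the 30% to 70% rise time is about 0.8473 x Rp x Cb, and the smallest usable pull-up is (VDD - VOL) / IOL. The 100 pF bus capacitance is an assumed example value, not something measured in this thread, and the function names are made up for illustration.

```python
import math

# ln(0.7 / 0.3): an RC-charged line takes this many time constants
# to rise from 30% to 70% of VDD (the levels the I2C spec measures between).
RISE_TIME_FACTOR = math.log(0.7 / 0.3)  # about 0.8473


def rise_time_ns(pullup_ohms, bus_capacitance_pf):
    """Estimated 30%-70% rise time of an I2C line, in nanoseconds."""
    # ohm * pF = 1e-12 s = 1e-3 ns
    return RISE_TIME_FACTOR * pullup_ohms * bus_capacitance_pf * 1e-3


def min_pullup_ohms(vdd=3.3, vol_max=0.4, sink_current_a=0.003):
    """Smallest pull-up that a device sinking sink_current_a can still pull low."""
    return (vdd - vol_max) / sink_current_a


if __name__ == "__main__":
    ASSUMED_BUS_CAPACITANCE_PF = 100  # assumption: depends on wiring and devices
    for ohms in (10_000, 4_700):
        t = rise_time_ns(ohms, ASSUMED_BUS_CAPACITANCE_PF)
        print(f"{ohms} ohm: rise time about {t:.0f} ns")
    print(f"minimum pull-up at 3.3 V: about {min_pullup_ohms():.0f} ohm")
```

Under that assumed 100 pF, 10k comes out around 847 ns and 4k7 around 398 ns. The specification allows up to 1000 ns in standard mode (100 kHz) and 300 ns in fast mode (400 kHz), so whether 10k is "too lazy" depends on the bus speed and the real capacitance of the wiring.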

waspie commented 6 years ago

Yeah, I'll swap for 4.7k tonight and see what happens.

35 minutes on my desk, no PUR, still working.

TD-er commented 6 years ago

If you have multiple 10k's you could place 2 in parallel to get 5k ;) I guess the data line is the most sensitive, so you could put both 10k's on the data line.
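The parallel-resistor arithmetic is easy to check. A small sketch, with a made-up helper name; the three-resistor case is only an illustration of what happens if several breakout boards each bring their own pull-up, which is an assumption and not something reported in this thread:

```python
def parallel(*resistors_ohms):
    """Equivalent resistance of resistors connected in parallel, in ohms."""
    return 1.0 / sum(1.0 / r for r in resistors_ohms)


if __name__ == "__main__":
    # Two 10k resistors on the same line
    print(round(parallel(10_000, 10_000)))
    # Hypothetical: three boards that each carry their own 10k pull-up
    print(round(parallel(10_000, 10_000, 10_000)))
```

Every pull-up on the same line counts, so the effective value on a bus with several modules can already be lower than the resistor added by hand.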

waspie commented 6 years ago

can't do anything about it at the moment. the only esp32 im running at home is at home :P

the one on my desk is still going, 101 minutes using the timeout change. i managed to VPN and OTA the same firmware to the one at home. I'll see how long it keeps its devices.

waspie commented 6 years ago

Also found this. I'm sure there's a lot more, and your Google works as well as mine, so I'll try not to keep bugging you too much. I don't understand a lot of what they're saying, but maybe you (or whoever) can make sense of what they did to fix it? https://github.com/espressif/esp-idf/issues/680

The last post of this related issue, https://github.com/espressif/esp-idf/issues/922, makes it sound like there's a function i2c_master_clear_bus() that resets the bus when it dies.

waspie commented 6 years ago

Further down the rabbit hole... I found this: https://github.com/stickbreaker/arduino-esp32/commit/798e73b9b3665a93157f0abd35763a3c85ac7a88

It's referenced in a very long thread about the I2C bus hanging, but this time there is a changed file that I actually see on my system! I'm going to replace the current file with this one and retest.

waspie commented 6 years ago

tried with 4.7k PUR. tried with modified i2c hal c file. doesn't matter, they all stop working. kind of at the end of my ability for the moment.

TD-er commented 6 years ago

Can you monitor the memory usage? My ESP32 board had crashed (completely frozen), and in the Domoticz log I could see the free memory was quite a bit lower than before. The same setup on an ESP8266 gave a nice flat line (no trend) with the occasional dip or peak, which is perfectly fine. That indicates there is no leak in our part of the software, so it must be in the core library.

waspie commented 6 years ago

I'm once again sticking my foot in my mouth. The changes are necessary in several more files. You're supposed to replace the esp32 core with this guy's entire rewrite. I tried it earlier but my esp became rather slow, albeit I kept seeing measurements come across mqtt. I rebuilt it all again, fresh, tonight and it's been responsive and working for 3 hours already which is far better than before. I was usually lucky to make it 30 minutes.

I'll check in later

waspie commented 6 years ago

12 hours, still working. Previously 20 minutes was a milestone.

TD-er commented 6 years ago

Can you explain what you did, or post a link to instructions on what to do?

waspie commented 6 years ago

I fully intend to when I'm satisfied that it has resolved the issue, although i shouldn't keep it to myself right now b/c it's still working. so, the "50,000 foot view" is you need to replace the entire esp32 build environment with this one: https://github.com/stickbreaker/arduino-esp32

I don't use platformIO (i know you do so hopefully you know what to do in place of what i'm saying).

if you're using arduino ide and you've already installed the esp32 environment then you know your build environment is something like c:\users\YOURUSERNAME\documents\arduino\hardware\espressif\esp32

Clone the repo listed above and OVERWRITE the contents of your esp32 folder with this repo. The writer has made other changes that I didn't look at, so I don't know what they are, but they don't affect ESPEasy.

If you're using a "portable" install of arduino then your folder is just c:\whatever???\arduino\hardware\espressif.

Again, I'll provide more detailed instructions with pictures etc. later. Right now I'm house cleaning. I know this is not an ESPEasy issue, but surely the users here will benefit from knowing how to resolve I2C issues if they're going to use an ESP32.

btw, 14 hours and still working

waspie commented 6 years ago

Do you still want me to start logging memory usage?

TD-er commented 6 years ago

Well, it may help in determining the cause of the next crash :) Some report crashes after N minutes, and my ESP32 does seem to crash after N hours, probably due to memory issues. I am not entirely sure what to monitor though, since the reported free memory may be quite high while allocations on the heap still fail due to heap fragmentation.
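One low-effort way to log memory usage is to pull the FreeMem value out of the periodic watchdog lines in a saved serial log. A minimal sketch, assuming the log contains entries of the form `390355 : WD : Uptime 7 ConnectFailures 0 FreeMem 107648`; the function names are made up for illustration, and the second sample value below is invented test data. As noted above, a flat FreeMem does not rule out heap fragmentation, so this only catches a plain downward trend.

```python
import re

# Matches ESPEasy-style watchdog entries, even if the log was pasted as one long line.
WD_LINE = re.compile(
    r"(?P<millis>\d+) : WD\s*: Uptime (?P<uptime>\d+) "
    r"ConnectFailures (?P<failures>\d+) FreeMem (?P<free>\d+)"
)


def free_mem_samples(log_text):
    """Return (uptime_minutes, free_bytes) for every watchdog entry found."""
    return [(int(m["uptime"]), int(m["free"])) for m in WD_LINE.finditer(log_text)]


def summarize(samples):
    """Return first, last and lowest free-memory reading, or None if there are none."""
    if not samples:
        return None
    free = [value for _, value in samples]
    return {
        "first": free[0],
        "last": free[-1],
        "lowest": min(free),
        "change": free[-1] - free[0],  # negative means memory went down
    }


if __name__ == "__main__":
    # Second FreeMem value is invented to show a downward change.
    sample = (
        "390355 : WD : Uptime 7 ConnectFailures 0 FreeMem 107648 "
        "415682 : EVENT: Clock#Time=Tue,13:39 "
        "420355 : WD : Uptime 7 ConnectFailures 0 FreeMem 107000"
    )
    print(summarize(free_mem_samples(sample)))
```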

waspie commented 6 years ago

I don't know where you want to post this or what editing you want to do...

These instructions are for a Windows machine. OS X and Linux will be different (naturally). If you're using linux you ought to be clever enough to adapt these instructions to your needs.

What is this fix and why do I need it?
If you use the ESP32 and are using i2c devices, you need this fix. Why? Because espressif needs to fix their code regarding i2c devices releasing the bus after information is transmitted. At some point espressif will fix it or it may get merged from this guy's fixes.

What do you need to do to "fix" it?
Download and overwrite your existing arduino ide build environment for the esp32 with this code. Go to: https://github.com/stickbreaker/arduino-esp32 and clone the archive (download as zip, whatever).

If you're using the arduino ide as a "portable" install your esp32 build environment will be c:\whateverfolderarduinoisin\hardware\espressif\esp32

The archive that you cloned above will have this same folder structure. Extract those files overwriting the files in the above directory in this example.

If you installed arduino your folder should be c:\users\YOURUSERNAME\documents\arduino\hardware\espressif\esp32 (i think, i don't have it "installed" so I don't know off the top of my head for sure).

That's it. Assuming you previously followed the instructions to install the ESP32 environment, you're all set. Rebuild your firmware and you'll have no more I2C breakages.

waspie commented 6 years ago

This is not an ESPEasy problem. Solved with different esp32 core

Budman1758 commented 6 years ago

@waspie Any chance you could make a bin file I could download? I do a lot of testing\jacking around with this stuff but don't have a clue how to compile ESP32.

waspie commented 6 years ago

Compiled from source downloaded as of 4 24 2018 7:30A EST

ESPEasy.ino.esp32.zip

Minus these plugins due to compilation errors (missing libraries on my end) (image: plugs)

Budman1758 commented 6 years ago

Thanks a bunch!!

Budman1758 commented 6 years ago

@waspie Are you having any problems with the version in your bin file? I keep getting a crash and reboot when I try to access device 1. Pretty consistent. I also had a bit of difficulty getting it flashed at first. What do you use to flash your 32's?

waspie commented 6 years ago

Anything past April 3rd-ish seems to have this instability. I'm personally running April 2nd source. If you'd rather have a bin from that source, let me know. I use the flash serial script or esptool.

Budman1758 commented 6 years ago

@waspie A bin that you are using would be great. An example of your command line for esptool would be pretty awesome too. :>)

waspie commented 6 years ago

Yeah hang tight until about 8est. Home w kids now

waspie commented 6 years ago

I use the flashserial.cmd that came with one of the releases(?). Plop it in the same folder as esptool.exe, put the bin file (named as it is) in that same folder, then run flashserial.cmd. It asks which COM port (use just the number), then which release type ("20100"). That should do it. This is April 2nd source with most plugins; missing are 10, 16, 35, 49, 52 and 65, mostly the same as before. All others are included, all controllers, all notifications.

It still crashes occasionally when adding devices, but probably 95% less than newer builds. Allow it to reboot and reload the page, or go to the next device (if configuring task 5, try configuring 6 instead... it's goofy but whatever; once it's configured and working, who cares?). Mine has about 48 hours uptime; it would have been longer but I pulled the plug accidentally. All I2C devices still work. I included flashserial.cmd in case you don't have it. Good luck.

ESP32flash.zip

Budman1758 commented 6 years ago

Thanks for the bin file. Don't know why but it crashes about as much as the other. I think I'm gonna set my 32's aside for a bit until the code is a bit more mature. Need to concentrate on getting a bunch of other stuff already built deployed.

waspie commented 6 years ago

@Budman1758 what are you doing to cause the crashes? what controller do you use and what plugins do you need? I compile mine with only OH controller and about 6 plugins.

Also, to be safe you may want to run erase_flash on the ESP32 before uploading. You don't need to do it every time, but if you're having a lot of problems, now might be the time to do it.

esptool --chip esp32 --port COM?? erase_flash

Then flash the firmware.
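For anyone scripting this, a minimal sketch that only builds the esptool command lines and prints them. The port name, the firmware file name and the 0x10000 offset in the usage part are assumptions for illustration; the thread does not state which offsets the bin files need, and a board that has been fully erased may also need its bootloader and partition table written again.

```python
def erase_flash_cmd(port, chip="esp32"):
    """Argument list for erasing the whole flash chip."""
    return ["esptool", "--chip", chip, "--port", port, "erase_flash"]


def write_flash_cmd(port, offset, firmware_path, chip="esp32"):
    """Argument list for writing one binary at the given flash offset."""
    return ["esptool", "--chip", chip, "--port", port,
            "write_flash", offset, firmware_path]


if __name__ == "__main__":
    port = "COM3"               # hypothetical port, use your own
    firmware = "ESPEasy32.bin"  # hypothetical file name
    offset = "0x10000"          # assumption: check what your build expects
    for cmd in (erase_flash_cmd(port), write_flash_cmd(port, offset, firmware)):
        # Printed only; pass the list to subprocess.run(cmd, check=True) to execute it.
        print(" ".join(cmd))
```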

Budman1758 commented 6 years ago

@waspie The crashes are happening right away after flashing. I'm just navigating around the web pages. It crashes 99% of the time just clicking on the edit button for task 1 from the devices main page. Also crashes on other task edit buttons but not as much. I don't have any devices set up yet. No controllers. Example of the log.......

`60355 : WD : Uptime 6 ConnectFailures 0 FreeMem 107648
390355 : WD : Uptime 7 ConnectFailures 0 FreeMem 107648
415682 : EVENT: Clock#Time=Tue,13:39
420355 : WD : Uptime 7 ConnectFailures 0 FreeMem 107648
450355 : WD : Uptime 8 ConnectFailures 0 FreeMem 107648
475682 : EVENT: Clock#Time=Tue,13:40
480355 : WD : Uptime 8 ConnectFailures 0 FreeMem 107648
510355 : WD : Uptime 9 ConnectFailures 0 FreeMem 107648
535682 : EVENT: Clock#Time=Tue,13:41
540355 : WD : Uptime 9 ConnectFailures 0 FreeMem 107648
570355 : WD : Uptime 10 ConnectFailures 0 FreeMem 107648
ERROR A stack overflow in task loopTask has been detected. <<<<<< Every time! Click task 1 edit.
Backtrace: 0x4008929c:0x3ffd48a0 0x4008939b:0x3ffd48c0 0x400893b4:0x3ffd48e0 0x40086b2f:0x3ffd4900 0x400882a8:0x3ffd4920 0x4008825e:0x00000000
Rebooting...
ets Jun 8 2016 00:22:57
rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0010,len:4
load:0x3fff0014,len:812
load:0x40078000,len:0
load:0x40078000,len:10164
entry 0x400789f8
▒U31 : INIT : Booting version: (custom) (ESP32 SDK v3.1-dev-239-g1c3dd23f-dirty)
31 : INIT : Cold Boot
32 : FS : Mounting...
59 : CRC : No program memory checksum found. Check output of crc2.py
85 : CRC : SecuritySettings CRC ...OK
214 : INIT : Free RAM:126472
214 : INIT : I2C
215 : INIT : SPI not enabled
221 : INFO : Plugins: 42 [Normal] (ESP32 SDK v3.1-dev-239-g1c3dd23f-dirty)
221 : EVENT: System#Wake
223 : WIFI : Switch on WiFi
224 : WIFI : Set WiFi to STA
254 : OTA : Arduino OTA enabled on port 8266
364 : EVENT: System#Boot
369 : WD : Uptime 0 ConnectFailures 0 FreeMem 114332`

I have erased the flash b4 firmware flash too. I just don't want to spend too much time hooking a bunch of stuff up until its a bit more stable. Don't get me wrong..... not giving up..... I have at least 12 ESP32 chips..... just taking a pause for a bit. :>)
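The Backtrace line in a crash log like this is a list of PC:SP pairs, and the PC values can be mapped back to source lines if the .elf file of exactly that build is available. A minimal sketch that extracts the addresses and builds an addr2line command; the ELF path is hypothetical, the helper names are made up, and the tool name assumes the Xtensa ESP32 toolchain is on the PATH.

```python
import re

# One backtrace frame: program counter, colon, stack pointer
FRAME = re.compile(r"(0x[0-9a-fA-F]{8}):(0x[0-9a-fA-F]{8})")


def backtrace_pcs(log_text):
    """Return the program-counter address of every PC:SP pair in the text."""
    return [pc for pc, _sp in FRAME.findall(log_text)]


def addr2line_cmd(elf_path, pcs, tool="xtensa-esp32-elf-addr2line"):
    """Argument list that asks addr2line to resolve the given addresses."""
    return [tool, "-pfiaC", "-e", elf_path, *pcs]


if __name__ == "__main__":
    line = ("Backtrace: 0x4008929c:0x3ffd48a0 0x4008939b:0x3ffd48c0 "
            "0x400893b4:0x3ffd48e0 0x40086b2f:0x3ffd4900 "
            "0x400882a8:0x3ffd4920 0x4008825e:0x00000000")
    pcs = backtrace_pcs(line)
    # "firmware.elf" is a placeholder for the ELF produced by the same build
    print(" ".join(addr2line_cmd("firmware.elf", pcs)))
```

The result is only meaningful with the ELF from the same compile as the flashed bin, so for a binary built by someone else the builder would have to run this step.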

Budman1758 commented 6 years ago

@waspie Do you happen to use the framed OLED? Was wondering if the I2C fix fixes this issue. #1029 ?

ferazambuja commented 6 years ago

Mine also crashes after a few clicks around the web interface. I also don't have anything hooked up. I'm using the compiling formula available on PlatformIO. I will try to get a log later.

waspie commented 6 years ago

@Budman1758 , i would imagine.

Like I said, if you tell me what plugins (numbers) and controller I'll whip up another bin for you to try.

I was using a different laptop last night, it's possible the build environment wasn't right.

You may still get an occasional crash on a working version of my bin but it shouldn't be the norm.

Budman1758 commented 6 years ago

@waspie The plugins in the second bin you made are fine. The seven segment display (P073) would be nice if possible. I don't use a controller right now, but if I did it would probably be Domoticz.

waspie commented 6 years ago

@Budman1758 @ferazambuja

No notifications, just the Domoticz and OH controllers and the plugins in the attached picture.

ESPEasy32_R20100.zip

TD-er commented 6 years ago

I have the oled framed plugin running on my ESP32 and it works just fine. I can upload a binary version later this evening.

Budman1758 commented 6 years ago

I'll give it a spin later. Thanks.

Budman1758 commented 6 years ago

@TD-er That would be awesome!

waspie commented 6 years ago

@Budman1758

I missed that you added p073. added and attached. hopefully one of these bins will work for you.

ESPEasy32_R20100.zip

Budman1758 commented 6 years ago

@waspie Thanks again!! Gonna give it a spin here in a couple...