universam1 / iSpindel

electronic Hydrometer
http://www.ispindel.de

iSpindel hanging #399

Closed. RonnyRusten closed this issue 3 years ago

RonnyRusten commented 4 years ago

Can anyone tell me what is going on? After several days working just fine, my iSpindel suddenly hangs with the LED on the D1 constantly on. I have used this iSpindel to log one brew before. 14 days in the fermenter, reporting every 15 minutes to Brewfather with no errors. I recharged and threw it in the next brew 11 days ago, and it was working fine for a day or so, when I noticed it didn't report to Brewfather. I opened the fermenter bucket and picked it out, and noticed the LED was constantly on, so I shut it down and left it on my bench for a day or so before I tried to turn it back on. It started working again, but today it failed again. I hooked it up to VS Code and started a monitor, and this is what the output is (I have hidden some details...):

FW 6.2.0
2.2.1(cfd48f3)
Worker run!
mounting FS... mounted!
reading config file
parsed config:
{"Name":"iSpindel001","Token":"****","Sleep":900,"Server":"log.brewfather.net","API":3,"Port":80,"Channel":0,"URI":"/ispindel?id=****","DB":"ispindel","Username":"","Password":"","Job":"ispindel","Instance":"000","Vfact":191.8,"TS":0,"OWpin":-1,"POLY":"-0.006112676*tilt^2 + 1.011909099*tilt - 18.04054636","SSID":"*****","PSK":"*******","Offset":[-1706,-248,1444,398,-10,-64]}
scanning for OW device on pin: 5
No devices found!
scanning for OW device on pin: 12
Found device with ROM = 28 51 AF 79 A2 0 3 3B
  Chip = DS18B20
  Data = 1 A9 1 55 5 7F A5 A5 66 21  CRC=21
  Temperature = 26.56 Celsius,
Acc Test Connection ERROR!
applying offsets: -1706:-248:1444
confirming offsets: 0:0:0
Boot-Mode: Deep-Sleep Wake

woken from deepsleep, normal mode

What is supposed to be on pin 5? The gyro? I have had some issues with the gyros, and I have some more available, but I wanted to ask here before I start soldering...

Ronny

neilbaldwin commented 4 years ago

Hmmm. Odd messages. From the schematic (on the OpenSourceBrewing Jeffrey board at least), Pin 5 (of the WEMOS) is connected to the temperature sensor and Pin 12 is connected to the gyro. You seem to be getting a connection failure on both, yet it then seems to connect, as you get a temperature and gyro reading!

I've just built a batch of 5 and out of the first lot of (cheap Ebay clone) WEMOS boards, I had two that were faulty or failed. I'd suspect the WEMOS first but it's just a guess.

This is why I ended up making a test rig by breadboarding the circuit (it's pretty easy) so that I could just solder pins onto all the PCB and connect them to the breadboard circuit to test before soldering to the iSpindel PCB.

RonnyRusten commented 4 years ago

Thank you for answering! Smart thing to breadboard it before soldering, I might do that too. In this case I wouldn't have found the error, though. As I wrote, this iSpindel has been working for months (I know I wrote days, but it's actually months, I had it on my bench to test the batteries) before it suddenly started to misbehave. I will try to change the wemos and/or the gyro. I have had several gyros fail (or is it actually the wemos that is failing?), so breadboarding is definitely a smart thing to do. I don't use the Jeffrey, but I guess the schematics are the same? I have a board called "something" 4.0 (can't remember who made it), I believe OpenSourceBrewing has used that as a template?

neilbaldwin commented 4 years ago

No worries. Honestly, breadboarding it saves so many headaches, especially if you're building a few. I appreciate though that in this instance it might not have shown your issue. I would definitely suspect the WEMOS over the other boards based on my own experience. I've built 6 of the Jeffrey ones now and the only boards I've ever had fail are the WEMOS. So many poor-quality clones out there.

Other than that, often an issue that shows up late like that could be down to a dry or poor solder joint. Maybe reflow all of your joints and retest first as I know desoldering stuff is a pain in the backside (hence the breadboard! :)

RonnyRusten commented 4 years ago

Yes, I will try reflowing first; often reflowing is the best way to desolder anyway. If I find that the Wemos boards I have are failing, I will consider buying "proper" ones from a local store. I got mine from AliExpress, so they are almost certainly clones. Probably OK for prototyping, but not good enough for "production".

eamonnjh commented 4 years ago

Looking at the uart messages, I think this is your problem: fake DS18B20 sensors. If you bought them from ebay or AliX, there is a good chance you got a fake. I got 3 very nicely made sensors from Ali recently that all failed a few hours after power-up. These days, I only buy them from trustworthy suppliers: Digikey, Mouser, Farnell etc. Here's a great write-up on the topic:

https://github.com/cpetrich/counterfeit_DS18B20
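
As a quick sanity check on the UART output, the scratchpad bytes on the "Data =" line can be decoded by hand. This is a minimal sketch, assuming the first value printed on that line is the bus presence flag (as in the OneWire library's example sketch) and the nine bytes after it are the scratchpad. The names `ds18b20Celsius` and `kLoggedScratchpad` are made up for illustration and are not part of the iSpindel firmware.

```cpp
#include <cstdint>

// Decode the temperature from a DS18B20 scratchpad. Bytes 0 and 1 hold the
// reading as a little-endian signed value; at 12-bit resolution one count
// is 1/16 degree Celsius.
double ds18b20Celsius(const uint8_t scratchpad[9]) {
  const int16_t raw =
      static_cast<int16_t>((scratchpad[1] << 8) | scratchpad[0]);
  return raw / 16.0;
}

// Bytes copied from the log line "Data = 1 A9 1 55 5 7F A5 A5 66 21",
// ASSUMING the leading "1" is the presence flag and not scratchpad data.
// Byte 4 (0x7F) is the configuration register value for 12-bit resolution.
const uint8_t kLoggedScratchpad[9] = {0xA9, 0x01, 0x55, 0x05, 0x7F,
                                      0xA5, 0xA5, 0x66, 0x21};
```

Under that assumption the raw value is 0x01A9 = 425, and 425 / 16 = 26.5625, which agrees with the 26.56 Celsius printed in the log. So the sensor was answering with a plausible reading at that moment; this says nothing either way about whether it is genuine.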

Severin9er commented 4 years ago

I've had the same issue for quite some time, but I might have found the problem finally. So the last line printed in the console is

woken from deepsleep, normal mode

That gets printed at the end of the shouldStartConfig() function. This function is called only once, in setup(), and evidently returns false, so we can skip the whole if-clause and jump directly to Line 1177. The next line where a console print happens is Line 1244, and when the #ifndef branch is active, as in my case, we can skip the whole #else part, so in the end there are not many lines left where the problem can occur.

I took a look into each function in between and finally found one place where the controller is able to get stuck: In function getTilt() in Line 878 there is a blocking while-loop:

while (!accelgyro.getIntDataReadyStatus())
   delay(2);

If the accelgyro.getIntDataReadyStatus() never returns true, this while-loop will never end. I don't know the reason, why the accelgyro sometimes doesn't respond, but it seems like this happens to me sometimes.

In my case it helped to simply add a timeout and run the rest of the for() body only if the wait did not time out.

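int timeout = 500; // maximum number of polls, 2 ms apart (about 1 s)
// Remember the result of each poll, so the check below does not depend
// on the value the counter happens to have when the loop exits.
bool ready = accelgyro.getIntDataReadyStatus();
while (!ready && --timeout > 0)
{
  delay(2);
  ready = accelgyro.getIntDataReadyStatus();
}
if (ready)
{
  getAccSample();
  float _tilt = calculateTilt();
  samples.add(_tilt);

  if (i >= MEDIANROUNDSMIN && isDS18B20ready())
    break;
}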
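
The bounded wait can also be pulled out into a small helper and exercised off-device. The following is only a sketch of the pattern: `waitForDataReady` and `FakeSensor` are hypothetical names, not part of the iSpindel firmware or the MPU6050 library, and the `wait` callback stands in for `delay(2)`. The helper returns the poll result itself. A counter decremented with `timeout--` inside the loop condition ends at -1 on expiry, which is still truthy, so testing the counter afterwards is easy to get wrong.

```cpp
#include <functional>

// Hypothetical helper: polls isReady() at most maxPolls times, calling
// wait() after each unsuccessful poll. Returns true as soon as a poll
// succeeds, false if none of them does.
bool waitForDataReady(const std::function<bool()>& isReady,
                      const std::function<void()>& wait,
                      int maxPolls) {
  for (int polls = 0; polls < maxPolls; ++polls) {
    if (isReady()) {
      return true;
    }
    wait();
  }
  return false;
}

// Stand-in for the accelerometer, for testing only: reports "ready" once it
// has already been polled readyAfter times, or never if readyAfter < 0.
struct FakeSensor {
  int readyAfter;
  int polls;
  explicit FakeSensor(int after) : readyAfter(after), polls(0) {}
  bool dataReady() { return readyAfter >= 0 && polls++ >= readyAfter; }
};
```

On the device, the call would look roughly like `waitForDataReady([] { return accelgyro.getIntDataReadyStatus(); }, [] { delay(2); }, 500)`, with the sample taken only when it returns true.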

I've also implemented some code to not send the values when all readouts within the for() loop fail, that is, when samples contains no data.

If you want, I can create a Pull-Request with my solution. But probably it should be discussed how long the timeout should take and if it really makes sense to not send any values in that case or send all values except the Tilt-value...
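
The "no samples, no tilt" decision described above could look something like the sketch below. It assumes the collected tilt samples are reduced to a median (the MEDIANROUNDSMIN name suggests this, but the thread does not show it), and `medianTilt`, `Report` and `decideReport` are hypothetical names, not taken from the firmware. What the device should do in the `Skip` case (send nothing, send an error, or send everything except the tilt) is exactly the open question.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: writes the median of samples to out and returns true,
// or returns false and leaves out untouched when there are no samples.
bool medianTilt(std::vector<float> samples, float& out) {
  if (samples.empty()) {
    return false;
  }
  std::sort(samples.begin(), samples.end());
  const std::size_t mid = samples.size() / 2;
  out = (samples.size() % 2 == 1)
            ? samples[mid]
            : (samples[mid - 1] + samples[mid]) / 2.0f;
  return true;
}

// Outcome of one measurement cycle.
enum class Report { SendTilt, Skip };

// Decide whether a tilt value is available to report at all.
Report decideReport(const std::vector<float>& samples, float& tilt) {
  return medianTilt(samples, tilt) ? Report::SendTilt : Report::Skip;
}
```

Keeping the decision in one place means the policy for the failure case can be changed later without touching the sampling loop.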

RonnyRusten commented 4 years ago

Nice work. For me, the best thing is to report an error, instead of just "hanging". To send values other than the tilt is, again, for me, uninteresting since it's the tilt that is interesting. Temperature is probably also interesting for some, but I use BrewPiLess for temperature readings, so for me temperature from the iSpindel is not important. I will try to change the gyro and see if that fixes the error.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.