rbaron / b-parasite

🌱💧 An open source DIY soil moisture sensor
1.93k stars 149 forks source link

Consumption strage behavior #209

Open azgael opened 4 months ago

azgael commented 4 months ago

Dear rbaron Thank you again for this excellent project. Still I have been having serious trouble to have a stable long term b-parasite. I have ordered from jbl pcb 2 set of boards, the first one (10units) I fully populated it using the radio Nrf52833 module, I got only 2 working at the end, I have reviewed all soldering and process and have no idea why this was happening (i have experience soldering delicate stuff, the humidity and temperature sensors is hard but I can manage it with good equipment). my first idea was that i was having problems with the pcb vias (never use pcb vias on pads!!!), so I have redesigned the boards only to move out the vias from the pads, i managed well, so I order the second lot but to be on the safe side i order also the population of the boards, so I only need to solder the rf modules and the battery holder. with the second lot i built 10units using the nrf52840 since I had so much trouble with the first ones i just wanted to make sure using the best radio also. the problem now is that ou of this last 10 units built i got only 7 working out (others the programed the led does not blink on one, and the other 2 blinks constantly, I guess high consumption resets the board constantly) but still the power consumption seams extreme on some of the boards, further more i got very strange behavior regarding the voltage reading, bparasitevoltage

seams that only 3 out of 7 are minimally usable (still high voltage drop for the time frame)

to be frank I am a bit lost here, so I just wonder is others are having similar trouble?

would you be so kind to let me know a feasible debug strategy that i could make and share here?

with my bast regards, azgael

rbaron commented 4 months ago

Hey @azgael, that battery consumption is indeed very strange. My first though is that either there is a short, or the boards are stuck in a rebooting loop, draining the battery.

It's difficult to debug hardware issues from far, specially since you modified the pcb yourself from what I understood. First step would be to get some logs from the working boards and see if they tell you something. If that doesn't work, please send a picture of the the boards you soldered the components on. It's a long shot, but maybe we can spot something.

azgael commented 3 months ago

Hi @rbaron from the logs i don't find anything relevant, also when i plug the debugger i end up always resting the board so any previous log is gone. As am example the board i am currently debugging shows the following behavior: 2024-06-05 14_30_15-Window

the log is the following:

SEGGER J-Link V7.96h - Real time terminal output SEGGER J-Link (unknown) V1.0, SN=801047073 Process: JLink.exe Booting nRF Connect SDK v2.5.2 [00:00:00.000,518] bt_sdc_hci_driver: SoftDevice Controller build revision: 8d dc 02 cb 49 26 c6 47 1a 6a 3d 4e 2f d1 a4 f2 |....I&.G .j=N/... 2e 38 1b 75 |.8.u
[00:00:00.003,479] bt_hci_core: HW Platform: Nordic Semiconductor (0x0002) [00:00:00.003,509] bt_hci_core: HW Variant: nRF52x (0x0002) [00:00:00.003,540] bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 141.732 Build 3324398027 [00:00:00.004,669] bt_hci_core: Identity: C4:C0:5E:39:91:FC (random) [00:00:00.004,699] bt_hci_core: HCI: version 5.4 (0x0d) revision 0x1168, manufacturer 0x0059 [00:00:00.004,730] bt_hci_core: LMP: version 5.4 (0x0d) subver 0x1168 [00:00:00.004,760] ble: MAC Address: c4:c0:5e:39:91:fc

seams normal

So I have compiled the firmware once again with the debug log flag and reprogram it:

''' '# Application config - see all options in Kconfig. '# CONFIG_PRST_BLE_ENCODING_BTHOME_V2=y '# CONFIG_PRST_SLEEP_DURATION_SEC=1

'# prstlib config - ser all options in prstlib/Kconfig. 'CONFIG_PRSTLIB_LOG_LEVEL_DBG=y '''

the log now is the following:

''' Booting nRF Connect SDK v2.5.2 [00:00:00.000,457] adc: prst_adc_init: Dry coeff 0: 334000

[00:00:00.000,488] adc: prst_adc_init: Dry coeff 1: 110000

[00:00:00.000,518] adc: prst_adc_init: Dry coeff 2: -15300

[00:00:00.000,518] adc: prst_adc_init: Wet coeff 0: 299000

[00:00:00.000,549] adc: prst_adc_init: Wet coeff 1: -83100

[00:00:00.000,579] adc: prst_adc_init: Wet coeff 2: 11200

[00:00:00.000,823] bt_sdc_hci_driver: SoftDevice Controller build revision: 8d dc 02 cb 49 26 c6 47 1a 6a 3d 4e 2f d1 a4 f2 |....I&.G .j=N/... 2e 38 1b 75 |.8.u
[00:00:00.003,967] bt_hci_core: HW Platform: Nordic Semiconductor (0x0002) [00:00:00.003,997] bt_hci_core: HW Variant: nRF52x (0x0002) [00:00:00.004,028] bt_hci_core: Firmwa ''' Stops here really! nothing else out yet still on reset it updates the HA (voltage drop but i have the debugger connected) funny is that the HA continues to update.

Regarding the soldering on the boards, please find the picture attached, looks quite clean for me.

IMG_7939 IMG_7942 IMG_7943 IMG_7941 IMG_7940

About the redesign of the board, I have just change a bit track wide and via and component position very minor changes, i believe it does not come from there since some boards seam to work fine as a couple of the previous lot made by your design.

Going back to the Logs: is there a way of enable more information besides that flag.

Thanks again azgael

azgael commented 3 months ago

Now the logs with the refresh at 30 sec instead of 10 min :|

Booting nRF Connect SDK v2.5.2 [00:00:00.000,427] adc: prst_adc_init: Dry coeff 0: 334000

[00:00:00.000,457] adc: prst_adc_init: Dry coeff 1: 110000

[00:00:00.000,488] adc: prst_adc_init: Dry coeff 2: -15300

[00:00:00.000,488] adc: prst_adc_init: Wet coeff 0: 299000

[00:00:00.000,518] adc: prst_adc_init: Wet coeff 1: -83100

[00:00:00.000,549] adc: prst_adc_init: Wet coeff 2: 11200

[00:00:00.000,762] bt_sdc_hci_driver: SoftDevice Controller build revision: 8d dc 02 cb 49 26 c6 47 1a 6a 3d 4e 2f d1 a4 f2 |....I&.G .j=N/... 2e 38 1b 75 |.8.u
[00:00:00.003,875] bt_hci_core: HW Platform: Nordic Semiconductor (0x0002) [00:00:00.003,906] bt_hci_core: HW Variant: nRF52x (0x0002) [00:00:00.003,936] bt_hci_core: Firmwa

[00:00:31.901,275] adc: get_soil_moisture_percent: Read soil moisture 2: -2.27 | Raw 535 | Batt: 3.01 | Dry: 526.46 | Wet: 150.36 [00:00:31.901,367] adc: get_soil_moisture_percent: Read soil moisture 2: -2.27 | Raw 535 | Batt: 3.01 | Dry: 526.46 | Wet: 150.36 [00:00:31.911,529] adc: prst_adc_photo_read: Read phototransistor: 100 lx | 0.02 V [00:00:31.934,600] shtc3: prst_shtc3_read: Read temp: 25.348740 oC (25) [00:00:31.934,631] shtc3: prst_shtc3_read: Read humi: 45 % [00:00:31.934,631] sensors: prst_sensors_read_all: Batt: 3009 mV (100.00%) [00:00:31.934,661] sensors: prst_sensors_read_all: Soil: 0 % (0 mV) [00:00:31.934,692] sensors: prst_sensors_read_all: Photo: 100 lx (0 mV) [00:00:31.934,692] sensors: prst_sensors_read_all: Temp: 25.348740 oC [00:00:31.934,722] sensors: prst_sensors_read_all: Humi: 45 % [00:00:31.934,753] sensors: prst_sensors_read_all: -------------------------------------------------- [00:01:02.966,339] adc: get_soil_moisture_percent: Read soil moisture 2: -2.59 | Raw 535 | Batt: 2.95 | Dry: 525.33 | Wet: 151.34 [00:01:02.966,400] adc: get_soil_moisture_percent: Read soil moisture 2: -2.59 | Raw 535 | Batt: 2.95 | Dry: 525.33 | Wet: 151.34 [00:01:02.976,623] adc: prst_adc_photo_read: Read phototransistor: 82 lx | 0.01 V [00:01:02.999,694] shtc3: prst_shtc3_read: Read temp: 25.367432 oC (25) [00:01:02.999,725] shtc3: prst_shtc3_read: Read humi: 44 % [00:01:02.999,725] sensors: prst_sensors_read_all: Batt: 2949 mV (70.42%) [00:01:02.999,755] sensors: prst_sensors_read_all: Soil: 0 % (0 mV) [00:01:02.999,786] sensors: prst_sensors_read_all: Photo: 82 lx (0 mV) [00:01:02.999,786] sensors: prst_sensors_read_all: Temp: 25.367432 oC [00:01:02.999,816] sensors: prst_sensors_read_all: Humi: 44 % [00:01:02.999,847] sensors: prst_sensors_read_all: -------------------------------------------------- [00:01:34.031,402] adc: get_soil_moisture_percent: Read soil moisture 2: -2.26 | Raw 535 | Batt: 3.01 | Dry: 526.52 | Wet: 150.31 [00:01:34.031,463] adc: get_soil_moisture_percent: Read soil moisture 2: -2.26 | Raw 535 | Batt: 3.01 | Dry: 526.52 | Wet: 150.31 [00:01:34.041,687] adc: prst_adc_photo_read: Read phototransistor: 142 lx | 0.02 V [00:01:34.064,758] shtc3: prst_shtc3_read: Read temp: 25.364761 oC (25) [00:01:34.064,788] shtc3: prst_shtc3_read: Read humi: 43 % [00:01:34.064,788] sensors: prst_sensors_read_all: Batt: 3012 mV (100.00%) [00:01:34.064,819] sensors: prst_sensors_read_all: Soil: 0 % (0 mV) [00:01:34.064,849] sensors: prst_sensors_read_all: Photo: 142 lx (0 mV) [00:01:34.064,849] sensors: prst_sensors_read_all: Temp: 25.364761 oC [00:01:34.064,880] sensors: prst_sensors_read_all: Humi: 43 % [00:01:34.064,910] sensors: prst_sensors_read_all: -------------------------------------------------- [00:02:05.096,466] adc: get_soil_moisture_percent: Read soil moisture 2: -2.30 | Raw 535 | Batt: 3.00 | Dry: 526.34 | Wet: 150.47 [00:02:05.096,527] adc: get_soil_moisture_percent: Read soil moisture 2: -2.30 | Raw 535 | Batt: 3.00 | Dry: 526.34 | Wet: 150.47 [00:02:05.106,750] adc: prst_adc_photo_read: Read phototransistor: 100 lx | 0.02 V [00:02:05.129,821] shtc3: prst_shtc3_read: Read temp: 25.407486 oC (25) [00:02:05.129,852] shtc3: prst_shtc3_read: Read humi: 43 % [00:02:05.129,852] sensors: prst_sensors_read_all: Batt: 3002 mV (100.00%) [00:02:05.129,882] sensors: prst_sensors_read_all: Soil: 0 % (0 mV) [00:02:05.129,913] sensors: prst_sensors_read_all: Photo: 100 lx (0 mV) [00:02:05.129,913] sensors: prst_sensors_read_all: Temp: 25.407486 oC [00:02:05.129,943] sensors: prst_sensors_read_all: Humi: 43 % [00:02:05.129,974] sensors: prst_sensors_read_all: --------------------------------------------------

looks quite clean!

rbaron commented 3 months ago

I can't see anything immediately wrong from the logs or the pics. I compared the components orientations with a board I have here and they look correct. I assume you already did some of these steps, but I would:

  1. Make sure no loggers/debuggers are attached during the tests. Turning RTT logging on for examples shoots up current consumption to 10 mA
  2. Remove battery
  3. Check there are no shorts between VCC and GND. I see you added a bunch of stitching vias, maybe one shorted the batt pad in the back (there would be under the module)?
  4. Check there are no shorts between every pair of pins in the nRF module

From your first post, I understood you ordered two batches, and you got 2 working from the first one, correct? Does it mean these 2 display correct low power consumption?

Thomas-514 commented 1 month ago

Hi, I recently ordered 10 boards, mostly assembled, from JLC PCB. All of them are working, one of them shows the same problem you are having though.

I have a source measurement unit, hopefully I'l find the time to do some current consumption measurements and give you an update soon.

As it is my first post a big thank you to rbaron for the awesome project!

Bildschirmfoto 2024-08-07 um 18 41 13
Thomas-514 commented 1 month ago

Here are a few measurements for the sensor which gave me the weird battery voltage.

Current consumption over 10 minutes, including two send periods of 1 s each. Sampled at 1000 Hz.

Bildschirmfoto 2024-08-07 um 19 34 55

Current consumption during startup. Including 1 send period. Sampled at 100kHz.

Bildschirmfoto 2024-08-07 um 19 37 17

And lastly the sleep current consumption. Sampled at 100kHz.

Bildschirmfoto 2024-08-07 um 19 39 13

In my opinion they all seem pretty good?

rbaron commented 1 month ago

@Thomas-514 I'm glad to hear you managed to build some boards. Yes, I'd say those values look okay. Pretty close to nominal.

If you punch those numbers in a battery life calculator, you'll get well over a year. Assuming 150 mAh batt, a conservative 10uA average current, I get 20 months.

Years ago I had something similar happen, where a single board discharged super fast. It turned out to be water shorting the battery clip.

Thomas-514 commented 1 month ago

@rbaron unfortunately by now a lot of the boards I built show the same behavior.

They seem normal under operation with the source measurement unit so my culprit is the battery and it's internal resistance.

I assume the battery voltage is read roughly at the same time when data is transmitted. If we assume a high internal resistance of 30 Ohms (I read it typically is in the order of 15-20 Ohm for CR2032) and a current of 16 mA (during transmission) this would result in a voltage drop of 480 mV, maybe enough to shut down the MCU if the battery voltage is already a bit low? I don't see any larger decoupling caps, could those help?

A clue might be the weird recovery of battery voltage once the MCU shuts down. It somehow recovers again and manages to send data again (for a few hours to days) before it dies completely.

The batteries of the devices which are no longer working still show an OCV of 2.82 V.

I'll experiment some more.

Thomas-514 commented 1 month ago

If I try to use one of the "faulty" boards with the "faulty" batteries the LED blinks from time to time, I think the board is in a reboot cycle. The voltage over the battery during this looks like this:

battVoltage

If we match that up with the current consumption during reboot (earlier post, here again for reference):

355924889-85f999fc-21b7-478a-9649-4513ae75e322

we can see that the battery voltage drops below 2 V when it tries to send data and the reboot cycle starts again.

The voltage drop of more than 1 V for the 16 mA seems very large, no idea why that is. I don't have a load (or various resistors) to test these small currents to get an understanding of the iR of the batteries. (But this will be my personal next project).

Edit: Soldering a 100uF electrolytic cap (what I had on hand) between Vcc and GND fixed the problem. You could probably get by with a lot smaller capacitance. The same reboot graph with the cap is shown below.

Unbenannt

rbaron commented 1 month ago

Indeed the voltage drop seems rather high during the BLE TX. I'd expect this sort of voltage drop after a year of use -- not with a new battery. I just checked the usual current consumption for the zigbee sample, and it goes even higher up to 20mA at peak. It seems to still be okay for me and for other users. Out of curiosity, have you tried different batteries batches/manufactures? Have you observed a situation in which it does not occur with some of your boards? What if you swap the batteries between working/faulty boards? Otherwise I don't have a lot of ideas :(

Larger caps across the power rails can definitely help, but I'd be curious to measure the leakage current it introduces, if appreciable. For the current design, the module we use just uses the caps as suggested by Nordic (datasheet).

There's one extra thing I'd be curious about: we currently set the TX power to the maximum +8dB here. These are the sharp peaks we see in the current chart. You could you try lowering it to CONFIG_BT_CTLR_TX_PWR_0=y and/or to CONFIG_BT_CTLR_TX_PWR_PLUS_4=y and checking the peak current. Theoretically, this could work for squeezing the last few Coulombs of charge out of a discharged batt, at the expense of range.

Thomas-514 commented 1 month ago

I have 2 boards that work fine and if I remember correctly I had two batteries to start with and bought some more. It could well be that all the non-working boards are powered by the newer batteries (Energizer brand), which are probably from the same batch.

I can swap the two "good batteries" into other boards and they will work fine.

I can try to measure the leakage current tomorrow. Electrolytic capacitors are generally known to be worse than MLCC. It would be interesting to test different types if I can get around to it. I can also try lowering the TX power while I'm at it.

In conclusion, I strongly suspect that I was unlucky with the batteries and got a bad batch. I'll buy some other ones and see what happens.