nielsonm236 / NetMod-ServerApp

Reprogramming the Web_Relay_Con V2.0 HW-584 Network Module

Request on reopening Intermittend reboot Netmodule #169

Closed sjorsBe closed 1 month ago

sjorsBe commented 1 year ago

Hi All,

As an addition to the now-closed issue "Intermittend reboot Netmodule", the following:

The module kept rebooting at intervals of between 8 and 16 hours. There was no link with the activity of the module itself. Tests were done with

So I changed the entire module but it had the same result with the other ones! So I changed the power supply:

So I hooked up a temp sensor to the small heatsink and there seems to be a relation whenever the heatsink temp reaches 30 degrees C. This seems to cause the modules to reboot.

So I have now ordered a pure copper heatsink (10 x 16 x 11 mm) and will glue it to the chip with proper heat-conducting paste, to see if that makes a difference.

I wonder what your experience is with a properly built-in module in an environment that is not cold?

Please let me know.

I'll keep you posted on the experiments.

kr

Sjors

nielsonm236 commented 1 year ago

I have 9 modules running constantly just for test. Room temperature runs from about 21C (typically at night) to 28C (daytime). The room has other equipment running in it so it stays warmer than the house at night (our night thermostat is set to 17C).

Which chip is getting hot? The STM8S processors on mine are just a little warm. The ENC28J60 chips get "hot", but even without a heatsink I can keep my finger on them for a minute or more. Covering a bare ENC28J60 chip with my finger causes it to warm fairly quickly due to no air exposure. While the manufacturer didn't provide instructions, I'm fairly sure the heatsink they provide with the boards is supposed to be on the ENC28J60.

I will see if I can warm up the test matrix somehow. Maybe just hang a cover over the boards for self heating and monitor the temperature sensors on them. Here's a photo of how I have the test boards set up:

IMG_4354

As an FYI, there was one other person emailing me about resets a couple of months ago. I will have to go back and find those emails. He said his device ran fine for over a year, then started having problems. At the time I questioned temperature, but he said the location was room temperature. No actual temperature was measured, so "room temperature" is subjective.

Mike

sjorsBe commented 1 year ago

Hi Mike,

Thanks for your answer. But it really is a bit strange. The module is still working, but I am keeping the temperature low! The module temperature is now 19.9 degrees C (the room temperature is 9.6 degrees C; it isn't spring yet here ☺). I'll keep on testing while I switch on the heating. Let's see what happens…

Kind regards

Sjors

sjorsBe commented 1 year ago

Hi Mike,

I have also connected 4 DS18B20s to the module. Perhaps the temperature sensors trigger another part of the software, causing it to reboot or block.

Kr

Sjors

nielsonm236 commented 1 year ago

I have one DS18B20 on each of 5 modules. It is easy for me to add up to 5 on one of the modules. I doubt this is the problem, but it is good to try any possibility we think of. What pullup resistance do you have on the DS18B20 data line?

Another thing to consider: I know your relay specification is 2mA input. But do you have a pull up on those pins too? What value? When the pin is high, what voltage is on the pin? What I'm thinking about is if there is too much current being drawn when pulling the pin low. OR if there is not enough pull-up on the pin then the relay might pull the pin to some voltage that causes the pin to draw current. This might heat up the STM8S.

Mike
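As a rough aid to the pull-up questions above: the current a device has to sink when it pulls a line low is just Ohm's law across the pull-up resistor. The sketch below is illustrative only. The 5 V figure and the resistor values are assumptions (the actual pull-ups in Sjors's setup are not stated in this thread), and `sink_current_ma` is a hypothetical helper name.

```python
def sink_current_ma(supply_volts: float, pullup_ohms: float) -> float:
    """Current in mA through a pull-up resistor while the pin is held at 0 V."""
    if pullup_ohms <= 0:
        raise ValueError("pull-up resistance must be positive")
    return supply_volts / pullup_ohms * 1000.0


if __name__ == "__main__":
    # Assumed example values; substitute the resistors actually fitted.
    for ohms in (1_000, 4_700, 10_000):
        print(f"{ohms:>6} ohm pulled up to 5 V: {sink_current_ma(5.0, ohms):.2f} mA")
```

A smaller resistor gives a stronger pull-up but more current whenever the line is low, which is the trade-off behind the questions above.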

nielsonm236 commented 1 year ago

I was just re-reading the info provided in issue 162. I don't think the pins attached to the solid state relays are the problem. At least by the info you provided it should not be a problem. Are you able to measure the total current drawn through the 5V to the HW-584? That might tell us if there is excess current draw somewhere on the board ... and if there is too much current we know to look for the source of that problem. I can measure a couple of my test boards to see what I have here.

jmcvieira1 commented 1 year ago

Three hours ago, when I received the notification of this problem, I looked at my test bench and wrapped a rag around the network module. After about three hours it still works without a reboot. The STM8S temperature is 38.7ºC and the LAN controller is 51.9ºC. I have one DS18B20 and one PCF running. SPA50252 SPA50253 SPA50254 SPA50255

jmcvieira1 commented 1 year ago

Sjors, to be really sure that temperature is the origin of the problem, you can use a hair dryer to heat the board and see if it reboots.

nielsonm236 commented 1 year ago

I only got my boards up to about 27C last night, I need to figure out another way to warm them up. Will work on that later today.

My mind is still stuck on power as the source of the problem since I've seen that so often. I know you are sourcing 5V from a lab supply, which is a good choice for this testing.

How is the lab power supply connected to the HW-584? Directly via the 5V screw terminals? Or through some other board? If the lab supply is connected to the 5V terminals how long is the wire and the approximate gauge of the wire? If the power connects to a different board first (perhaps the relays) then goes to the HW-584 what is the gauge of the wire in that connection?

I made some mistakes with wire gauge and wire length in previous test configurations, which is why I'm asking.

Mike
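To put rough numbers on why wire gauge and length matter here, the sketch below estimates the voltage lost in the supply leads. It uses the standard AWG diameter formula and an approximate resistivity for copper at about 20 C. The load current, gauge, and length in the example are assumptions for illustration, not measurements from this thread, and the function names are hypothetical.

```python
import math

COPPER_RESISTIVITY_OHM_M = 1.68e-8  # approximate value for copper near 20 C


def awg_diameter_mm(awg: int) -> float:
    """Diameter of a solid conductor from the standard AWG formula."""
    return 0.127 * 92 ** ((36 - awg) / 39)


def round_trip_drop_volts(awg: int, one_way_length_m: float, current_a: float) -> float:
    """Voltage dropped across both supply leads (5V out and GND return)."""
    radius_m = awg_diameter_mm(awg) / 1000.0 / 2.0
    area_m2 = math.pi * radius_m ** 2
    resistance_ohm = COPPER_RESISTIVITY_OHM_M * (2.0 * one_way_length_m) / area_m2
    return current_a * resistance_ohm


if __name__ == "__main__":
    # Assumed example: 1 m leads carrying 0.2 A, compared across gauges.
    for awg in (18, 24, 28):
        drop = round_trip_drop_volts(awg, one_way_length_m=1.0, current_a=0.2)
        print(f"AWG {awg}: {drop * 1000:.1f} mV drop")
```

The steady-state drop is only part of the picture: thin or long leads also make the supply sag more during short current peaks, which this simple resistance estimate does not model.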

sjorsBe commented 1 year ago

Hi Mike

I see a lot of questions, thanks for that. I'm in the middle of a hectic period of work right now. I'll try to answer you on Sunday, is that OK?

Sjors

nielsonm236 commented 1 year ago

Yes that is fine. In the meantime I will continue to pursue the heat testing.

nielsonm236 commented 1 year ago

I started a test on 9 modules where the temperature is being maintained between 35.5 and 36.5 C. The test started about 3 hours ago.

nielsonm236 commented 1 year ago

The test has run about 16 hours now. No resets. This is the recorded temperature from the DS18B20 recording the lowest temperature. The others look about the same but up to 2F higher.

Capture

No resets and no significant complaints recorded in the Link Error Statistics (a couple of MQTT traffic retries, but that is normal). As I mentioned before I have no heatsinks or air circulation, so the device temperatures are far above what is being measured by the DS18B20s. I'm taking a risk of damaging the devices but will end this test in about 6 hours.

Mike

sjorsBe commented 1 year ago

Hi Mike, stop damaging the devices. It must be a coincidence here. I will keep searching.

Kr

Sjors

nielsonm236 commented 1 year ago

I'm not sure they are being damaged, but it is a risk. :-) This test was worth it to see if there was a temperature related problem. I'm not home right now but I logged in to see if they are still running. All still OK. Mike

nielsonm236 commented 1 year ago

Ended the test at 21hr 30min. Average temperature across all DS18B20's is 99.8F / 37.6C. I will leave them running as they cool to see if that causes a problem. And correcting a mis-statement: 2 out of 9 of the ENC28J60 devices did have heatsinks. The rest did not. Mike
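The readings in this thread mix Fahrenheit and Celsius, so a small conversion helper makes them easier to compare. This is a minimal sketch with illustrative function names. Note that a difference such as "up to 2F higher" converts without the 32-degree offset that applies to an absolute reading.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert an absolute temperature reading from Fahrenheit to Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def fahrenheit_delta_to_celsius(delta_f: float) -> float:
    """Convert a temperature difference, which has no 32-degree offset."""
    return delta_f * 5.0 / 9.0


if __name__ == "__main__":
    print(f"99.8 F reading     = {fahrenheit_to_celsius(99.8):.2f} C")
    print(f"2 F sensor spread  = {fahrenheit_delta_to_celsius(2.0):.2f} C")
```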

nielsonm236 commented 1 year ago

No problems during the cool down. Capture

nielsonm236 commented 1 year ago

@sjorsBe Any progress? Have you tried the firmware that I emailed that gives more detail about the boot source in the Link Error Statistics? Thanks, Mike

jmcvieira1 commented 1 year ago

@nielsonm236
Hi Mike, I've been busy and it's only now, at the end of the week, that I put the test firmware to the test. After 48h these are the results:

Code Revision 20230405 2055 Browser UPG
Link Error Statistics
31 0000135998
32 0001247103
33 0000060000
34 0000000000
35 0000000000

Code Revision 20230405 2055 MQTT UPG ...
Link Error Statistics
31 0000135710
32 0000547269
33 0000060000
34 0000000000
35 0000000000

nielsonm236 commented 1 year ago

@jmcvieira1 Thanks. Line 34 indicates no reboots (expected). Will be interesting to see what Sjors sees if he can reproduce the reboot problem. The biggest problem will be that a reboot still occurs but without a count in Line 34. That would strongly indicate a power loss. It is possible for a counter to increment with a power loss if the loss is very slow (like a very slow voltage drop off with noise on the power which might cause restarts without completing boot ... and thus improper processor execution). But that is unlikely. Usually a power loss is quick and no illegal operation count occurs, mostly because the power loss detection in hardware has fairly good hysteresis.
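For anyone collecting these reports over several days, the check described above (is the Line 34 counter still zero?) can be scripted. The sketch below assumes the statistics are copied from the page as plain text in the "two-digit line number, ten-digit value" form shown in this thread. The function names are hypothetical, and only Line 34 is interpreted, since the meaning of the other lines is not described here.

```python
import re
from typing import Dict


def parse_link_error_stats(text: str) -> Dict[int, int]:
    """Map each two-digit line number to its ten-digit counter value."""
    pairs = re.findall(r"\b(\d{2})\s+(\d{10})\b", text)
    return {int(line): int(value) for line, value in pairs}


def line_34_is_zero(stats: Dict[int, int]) -> bool:
    """True when Line 34 is present and shows no counted events."""
    return stats.get(34) == 0


if __name__ == "__main__":
    report = (
        "Code Revision 20230405 2055 Browser UPG Link Error Statistics "
        "31 0000135998 32 0001247103 33 0000060000 34 0000000000 35 0000000000"
    )
    stats = parse_link_error_stats(report)
    print(stats)
    print("Line 34 is zero:", line_34_is_zero(stats))
```

Comparing two snapshots taken before and after an observed reboot would show whether the counter moved, which is the distinction between a counted restart and a suspected power loss drawn above.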

nielsonm236 commented 1 month ago

Closed due to lack of activity.