jgyates / genmon

Generac (and other models) Generator Monitoring using a Raspberry Pi and WiFi
GNU General Public License v2.0

High Python CPU usage after initial install #371

Closed lakee911 closed 4 years ago

lakee911 commented 4 years ago

Hello,

Brand new install here. Running this on a Pi Zero 1.3 w/ Raspbian Buster, booting into the CLI (not the GUI). Everything was fine until I ran genmonmaint.sh to get this installed. After rebooting, I'm seeing a lot of CPU usage: Python is using 25% to 30% CPU. Even SSH into the CLI is sluggish, and the genmon webpage takes 5min or so to load if it doesn't time out. I've tried the "low" option too.

I'm wondering if my choice of Raspbian Buster was a poor one. Perhaps I should have gone with something a bit slimmer? If so, is there any way to strip it down or am I better off starting anew? I'm new to the Pi and my Linux skills are quite rusty these days.

Thanks.

liltux commented 4 years ago

@lakee911 I have genmon running on a Pi Zero W with Buster Lite. Python is using 30-32% CPU according to the top command. My webpage is loading normally. I would suggest using the Lite image. But you can also try raspi-config: under Boot Options, choose boot to command line.
That way, hopefully, the graphical desktop is not loaded at boot.

jgyates commented 4 years ago

I use the lite version; however, with the Pi Zero you will see higher CPU utilization. I typically use the raspi-config program to allocate the minimum memory to the graphics processing when using the lite version. Also, on the settings page there is an option called "Optimize for slower CPUs". This will slow the polling thread of the serial data returning from the controller, which will reduce the CPU utilization for Pi Zero devices.

@liltux's suggestion about the boot option is right on the mark if you are not using the lite version. This way you will not have to re-install.

lakee911 commented 4 years ago

Hey, thanks for the quick responses!

Good idea on the boot option, but I had already set it to boot to command line interface (CLI). Surprisingly, that didn't seem to make much of a difference. From a quick Google search, that should have pretty much made it "lite," though it would need a little more tweaking. From what I understood, there isn't much more to be gained beyond that.

I did go in and reallocate the GPU memory as you suggested. I wasn't sure if 16 was the minimum, but that's what I went with. Rebooting made the CLI more snappy (access over SSH), but the web page is still slow. I might zero it out or go with like 2...

The webpage is loading, but it's remaining with a white background and is just spinning. I can view the source just fine and the title shows up. I'm not quite sure how I can tell what it's actually doing.

I enabled the optimization for slow in genmon.conf and rebooted. Again, a little better. And there is no improvement to the webpage. In fact, I tried serving up a page in Chrome and it just spun. I tried in IE and very quickly got a "Can’t reach this page" error (INET_E_RESOURCE_NOT_FOUND). I switched back to my console (assumed) session in PuTTY and it's slower than a dog.

I think something else must be up ... can't just be the OS.

Thoughts?

jgyates commented 4 years ago

You may have a serial communication problem. The web page expects some basic info about the underlying controller to determine what controls to populate. If serial comms are not functioning then this can take a bit to revert to the default values and allow the web page to populate. You can log in via ssh and use the ClientInterface.py program to get some base status:

 cd genmon/
 python ClientInterface.py 

Once this is loaded you can use this command to display comm status:

 generator: monitor 

This should show something like this:

Monitor :

Generator Monitor Stats : 
    Monitor Health : OK
    Controller : Evolution, Liquid Cooled
    Run time : Generator Monitor running for 2 days, 9:04:29.
    Power log file size : 0.04 MB of 15.00 MB
    Generator Monitor Version : V1.14.05

Communication Stats : 
    Packet Count : M: 6299191, S: 6299190
    CRC Errors : 0 
    CRC Percent Errors : 0.00%
    Timeout Errors : 0
    Timeout Percent Errors : 0.00%
    Modbus Exceptions : 0
    Validation Errors : 0
    Invalid Data : 0
    Discarded Bytes : 0
    Comm Restarts : 0
    Packets Per Second : 61.32
    Average Transaction Time : 0.0319 sec
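
If you would rather check this output from a script than by eye, here is a minimal sketch. It assumes only the "Key : Value" layout shown above; the function names `parse_monitor_stats` and `comm_problems` are made up for illustration and are not part of genmon.

```python
def parse_monitor_stats(text):
    """Parse 'Key : Value' lines from the monitor output into nested dicts.

    Assumes section headers end with ':' (e.g. 'Communication Stats :')
    and entries look like 'CRC Errors : 0', as in the sample output.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            # Section header: start a new group of entries.
            current = sections.setdefault(line[:-1].strip(), {})
            continue
        # Split on the first ' : ' only, since values may contain colons.
        key, sep, value = line.partition(" : ")
        if sep and current is not None:
            current[key.strip()] = value.strip()
    return sections


def comm_problems(sections):
    """Return the names of communication error counters that are non-zero."""
    comm = sections.get("Communication Stats", {})
    counters = [
        "CRC Errors", "Timeout Errors", "Modbus Exceptions",
        "Validation Errors", "Invalid Data", "Discarded Bytes",
        "Comm Restarts",
    ]
    return [name for name in counters if int(comm.get(name, "0")) != 0]
```

An empty list from `comm_problems` would suggest the serial link is healthy, which is what all-zero error counters in the sample indicate.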

You can use this command to send me a copy of your log files (assuming you have outbound email set up):

 generator: sendlogfiles

Once I have your log files I can look and see if something is wrong.

If it is a serial comm issue you should be able to start genmon, wait for a few minutes, then start the web interface with no delay. If you have a serial issue the delay would only apply for the first few minutes after genmon is started. Basically genmon has to attempt to read each controller register once before it is fully initialized. If your serial comms are failing then each attempt to read a register would need to time out (about 3 seconds for each register).
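
To get a feel for how long that startup delay could be, here is a rough back-of-the-envelope sketch. The 3 second timeout comes from the description above; the register count of 100 is purely a made-up number for illustration, since the real count depends on the controller.

```python
def worst_case_init_seconds(register_count, timeout_seconds=3.0):
    """Upper bound on startup delay if every register read times out once."""
    return register_count * timeout_seconds


def as_minutes(seconds):
    """Format a duration in seconds as 'M min S s'."""
    minutes, remainder = divmod(int(seconds), 60)
    return f"{minutes} min {remainder} s"


# Hypothetical example: 100 registers, each timing out after about 3 seconds.
print(as_minutes(worst_case_init_seconds(100)))
```

With those assumed numbers the worst case works out to a few minutes, after which the web interface should populate with default values.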

lakee911 commented 4 years ago

I don't think it's a serial issue. Here's what I get from ClientInterface.py

pi@genmon:~/genmon $ python ClientInterface.py
OK : Auto, Off - Ready

generator: monitor

Monitor :

Generator Monitor Stats :
    Monitor Health : OK
    Controller : Evolution, Air Cooled
    Run time : Generator Monitor running for 1:36:22.
    Power log file size : 0.00 MB of 15.00 MB
    Generator Monitor Version : V1.14.05

Communication Stats :
    Packet Count : M: 144612, S: 144611
    CRC Errors : 0
    CRC Percent Errors : 0.00%
    Timeout Errors : 0
    Timeout Percent Errors : 0.00%
    Modbus Exceptions : 0
    Validation Errors : 0
    Invalid Data : 0
    Discarded Bytes : 0
    Comm Restarts : 0
    Packets Per Second : 50.02
    Average Transaction Time : 0.0377 sec

Platform Stats :
    CPU Temperature : 102.20 F
    Pi Model : Raspberry Pi Zero Rev 1.3
    Pi CPU Frequency Throttling : OK
    Pi ARM Frequency Cap : OK
    Pi Undervoltage : OK
    CPU Utilization : 33.73%
    OS Name : Raspbian GNU/Linux
    OS Version : 10 (buster)
    System Uptime : 1:37:27
    Network Interface Used : wlxe84e06456526
    WLAN Signal Level : -98 dBm
    WLAN Signal Quality : 12/70
    WLAN Signal Noise : -256 dBm
    System Time : Friday April 24, 2020 18:48:45

I do NOT have outbound email setup. How else can I get them to you?

Thanks.

jgyates commented 4 years ago

The files of interest are:

/var/log/genmon.log
/var/log/genserv.log
/var/log/genloader.log
/var/log/myserial.log
/var/log/mymodbus.log

You can attach these files to this thread. I am guessing you don't need the last two files since your serial comms look ok from your last post.

genloader.log should let us know if any libraries are missing. genmon.log is for the main genmon.py program. genserv.log is for the web server flask application.
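
If outbound email is not set up, one way to collect these for attaching is to zip them on the Pi. This is only a sketch using the Python standard library; `bundle_logs` is a hypothetical helper name, not something genmon provides, and reading files under /var/log may require running with sudo.

```python
import zipfile
from pathlib import Path

# Log paths as listed above.
LOG_FILES = [
    "/var/log/genmon.log",
    "/var/log/genserv.log",
    "/var/log/genloader.log",
    "/var/log/myserial.log",
    "/var/log/mymodbus.log",
]


def bundle_logs(paths, archive_path):
    """Zip whichever of the given files exist; return the names that were added."""
    added = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in map(Path, paths):
            if path.is_file():
                # Store under the bare file name so the archive has no /var/log prefix.
                archive.write(path, arcname=path.name)
                added.append(path.name)
    return added


# Example usage (the archive name is arbitrary):
# bundle_logs(LOG_FILES, "genmon-logs.zip")
```

Missing files are skipped rather than treated as errors, so the same call works whether or not the serial logs exist.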

jgyates commented 4 years ago

Also, what OS and browser are you using? Have you tried any other browsers?

lakee911 commented 4 years ago

Nothing jumped out at me in these, but you obviously know them better than I do. Please see attached. genmon logs 20200424.zip

I've tried Chrome, IE, and Chrome on my Android phone.

jgyates commented 4 years ago

Another thing you could try is to use the Chrome developer tools (Settings -> More Tools -> Developer Tools), then select the Network view in developer tools. This will show what is not responding on web interface. It should look something like this:

[Screenshot: Screen Shot 2020-04-24 at 9 21 24 PM]

jgyates commented 4 years ago

In the Name column I am looking for "start_info_json". This should be one of the first few items in the Name column (you can scroll to the top). If that call never happens or an error occurs for some reason, that would cause the hang-up, but I am just not sure what would be causing that to occur.

lakee911 commented 4 years ago

That's exactly what I needed. We're never getting past libraries.min.js or libraries.min.css.

[image]

jgyates commented 4 years ago

The first ajax command sent after libraries.min.* are loaded is start_info_json.

You can manually enter this with this url:

http://192.168.21.124:8000/cmd/start_info_json

This should return a bunch of JSON data, which should look something like this:

{"nominalKW": "48", "RemoteTransfer": true, "sitename": "SiteName", "RemoteCommands": true, "Controller": "Evolution, Liquid Cooled", "version": "V1.14.05", "UtilityVoltage": true, "AckAlarms": false, "write_access": true, "ResetAlarms": true, "tiles": [{"divisions": 6, "title": "Battery Voltage", "colorzones": [{"strokeStyle": "#F03E3E", "max": 11.5, "min": 0}, {"strokeStyle": "#FFDD00", "max": 12.5, "min": 11.5}, {"strokeStyle": "#30B32D", "max": 15, "min": 12.5}, {"strokeStyle": "#FFDD00", "max": 15.5, "min": 15}, {"strokeStyle": "#F03E3E", "max": 16, "min": 15.5}], "labels": [0, 4, 8, 12, 16], "maximum": 16, "subtype": "batteryvolts", "subdivisions": 10, "minimum": 0, "default-size": 2, "units": "V", "type": "gauge"}, {"divisions": 10, "title": "Utility Voltage", "colorzones": [{"strokeStyle": "#F03E3E", "max": 216.0, "min": 0}, {"strokeStyle": "#FFDD00", "max": 228.0, "min": 216.0}, {"strokeStyle": "#30B32D", "max": 252.0, "min": 228.0}, {"strokeStyle": "#FFDD00", "max": 264.0, "min": 252.0}, {"strokeStyle": "#F03E3E", "max": 280, "min": 264.0}], "labels": [0, 50, 95, 145, 190, 240, 280], "maximum": 280, "subtype": "linevolts", "subdivisions": 0, "minimum": 0, "default-size": 2, "units": "V", "type": "gauge"}, {"divisions": 10, "title": "Output Voltage", "colorzones": [{"strokeStyle": "#F03E3E", "max": 216.0, "min": 0}, {"strokeStyle": "#FFDD00", "max": 228.0, "min": 216.0}, {"strokeStyle": "#30B32D", "max": 252.0, "min": 228.0}, {"strokeStyle": "#FFDD00", "max": 264.0, "min": 252.0}, {"strokeStyle": "#F03E3E", "max": 280, "min": 264.0}], "labels": [0, 50, 95, 145, 190, 240, 280], "maximum": 280, "subtype": "linevolts", "subdivisions": 0, "minimum": 0, "default-size": 2, "units": "V", "type": "gauge"}, {"divisions": 7, "title": "Frequency", "colorzones": [{"strokeStyle": "#F03E3E", "max": 54, "min": 0}, {"strokeStyle": "#FFDD00", "max": 57, "min": 54}, {"strokeStyle": "#30B32D", "max": 63, "min": 57}, {"strokeStyle": "#FFDD00", "max": 66, "min": 63}, 
{"strokeStyle": "#F03E3E", "max": 70, "min": 66}], "labels": [10, 20, 30, 40, 50, 60, 70], "maximum": 70, "subtype": "frequency", "subdivisions": 10, "minimum": 0, "default-size": 2, "units": "Hz", "type": "gauge"}, {"divisions": 4, "title": "RPM", "colorzones": [{"strokeStyle": "#F03E3E", "max": 1725, "min": 0}, {"strokeStyle": "#FFDD00", "max": 1750, "min": 1725}, {"strokeStyle": "#30B32D", "max": 1850, "min": 1750}, {"strokeStyle": "#FFDD00", "max": 1875, "min": 1850}, {"strokeStyle": "#F03E3E", "max": 1890, "min": 1875}], "labels": [0, 450, 900, 1350, 1800], "maximum": 1890, "subtype": "rpm", "subdivisions": 10, "minimum": 0, "default-size": 2, "units": "", "type": "gauge"}, {"divisions": 10, "title": "Fuel", "colorzones": [{"strokeStyle": "#F03E3E", "max": 10, "min": 0}, {"strokeStyle": "#FFDD00", "max": 25, "min": 10}, {"strokeStyle": "#30B32D", "max": 100, "min": 25}], "labels": [0, 10, 20, 30, 40, 50, 60, 70, 80, 90], "maximum": 100, "subtype": "fuel", "subdivisions": 10, "minimum": 0, "default-size": 2, "units": "%", "type": "gauge"}, {"divisions": 12, "title": "Power Output", "colorzones": [{"strokeStyle": "#30B32D", "max": 38, "min": 0}, {"strokeStyle": "#FFDD00", "max": 45, "min": 38}, {"strokeStyle": "#F03E3E", "max": 57, "min": 45}], "labels": [0, 10, 20, 30, 40, 50, 57], "maximum": 57, "subtype": "power", "subdivisions": 5, "minimum": 0, "default-size": 2, "units": "kW", "type": "gauge"}, {"divisions": null, "title": "kW Output", "colorzones": null, "labels": null, "maximum": 57.0, "subtype": "powergraph", "subdivisions": null, "minimum": 0, "default-size": 2, "units": "", "type": "graph"}], "RemoteButtons": true, "nominalfrequency": "60", "ExerciseControls": true, "pages": {"status": true, "about": true, "logs": true, "notifications": true, "outage": true, "maint": true, "monitor": true, "maintlog": true, "addons": true, "settings": true}, "WriteQuietMode": true, "PowerGraph": true, "FuelSensor": true, "nominalRPM": "1800", "FuelCalculation": true, 
"FuelConsumption": true, "model": "RD04834ADAE", "fueltype": "Diesel", "NominalBatteryVolts": "12"}

If you can issue this command then the question becomes, why is the javascript not executing once it has been transferred to your browser. Did you enable any settings manually in the /genmon.conf file that control the web interface? If you did then it is possible that you have a setting that is preventing the loading. If you did not then we need to see if you have any security software (router, or personal security product software like antivirus) blocking things.
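
If you want to test that endpoint without a browser in the way, something like the following could work. It is a sketch, not part of genmon: the helper names are made up, it uses only the Python standard library, and it assumes the base URL from the example above (substitute your own Pi's address).

```python
import json
import time
import urllib.request


def fetch_start_info(base_url, timeout=10.0):
    """Request /cmd/start_info_json and return (elapsed_seconds, raw_text)."""
    url = base_url.rstrip("/") + "/cmd/start_info_json"
    started = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read().decode("utf-8")
    return time.monotonic() - started, raw


def summarize_start_info(raw):
    """Pull a few fields out of the JSON so a good response is easy to recognize."""
    info = json.loads(raw)
    return {
        "controller": info.get("Controller"),
        "version": info.get("version"),
        "tiles": [tile.get("title") for tile in info.get("tiles", [])],
    }


# Example usage (address taken from the URL above):
# elapsed, raw = fetch_start_info("http://192.168.21.124:8000")
# print(f"{elapsed:.2f} s", summarize_start_info(raw))
```

A quick, well-formed response here would point away from the web server itself and toward whatever sits between it and the browser.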

lakee911 commented 4 years ago

That executes just fine and it dumps out proper and quick data.

I'm on my work machine which could have some security software, but it also doesn't work on my personal computer or phone. The only common element would be my network. And, the only common network element is my Netgear Nighthawk X4S R7800 router.

I checked the logs and nothing in there out of the ordinary. The logs aren't great, though. In fact, they suck. Ha. Bunch of DHCP requests from all the up and down w/ troubleshooting. No configured parental controls, blocked sites, blocked services, any of that.

What next?

jgyates commented 4 years ago

Do you get the same results when you use this URL?

 http://192.168.21.124:8000/index_verbose.html

The difference is that this URL will not use compressed javascript and CSS files.

You can also try this URL for a diagnostic:

   http://192.168.21.124:8000/internal.html

This will show the registers of the controller as they change (assuming this page pulls up correctly; it uses javascript also).

Also, are you on the same subnet (i.e. are you going thru a VPN accessing this remotely in any way)?

The Netgear equipment is not known for having a lot of intrusive security settings (like ASUS).

lakee911 commented 4 years ago

None of those links pull up anything. Just sits like the main interface.

Recently had an occurrence where it didn't time out but took at least 30min and wasn't fully loaded/functional. So, it is getting there s l o w l y. Still not sure if that's better than blocked entirely...

No VPN. Same subnet.

Oh, and the only change I made to the conf file was for that slow setting.

jgyates commented 4 years ago

If the internal.html page does not pull anything up then you do have an issue with javascript not working. I do not know if the issue is the pi or your network. What speed of SD card are you using and is the SD card old? If you are using a relatively new card and it is of decent speed then about the only thing I can suggest is to try reinstalling. It appears that everything is working except the serving of web pages.

One other question. If you ssh into the system and type this:

 python -V

what version number is returned? I would expect it to be Python 2.7.13

jgyates commented 4 years ago

Just to recap:

- Looks like your serial comms are working.
- Looks like the web server is serving pages and accepting ajax commands.
- The javascript served by the web server is not executing for some reason. This could be because it is received corrupted or something on your network is blocking it. The Chrome developer tools do appear to show the javascript was received by your browser.

One other question: what do you see when you are waiting on the web page to load when trying to load the main page? Do you just see a blank screen or does it say "Loading Generator Monitor"?

lakee911 commented 4 years ago

Thank you for your dedication and attention to helping me solve this issue!

When I check the Python version, it is returning 2.7.16. Little more recent than your expectation.

For some reason I'm getting fewer timeouts on the web page, and that's probably from trying to open a single one at a time. A second page often times out quickly. With only one try, it's now taking about 30min (?) to get past the point where I said it was only loading libraries.min.js or libraries.min.css.

Here is a screen cap of what's shown on the same webpage that I was trying to load last night. I just left it up from then.

[image]

See all that red (canceled)? Is that an issue?

Upon loading, I get a white screen w/ title in the titlebar and then it does say "Loading Generator Monitor" after quite some time. Eventually it loads. Like I said earlier, it appears to be just REALLY SLOW.

The SD card is new and this is what I purchased: https://www.amazon.com/gp/product/B0834NN9D8/ref=ppx_yo_dt_b_asin_title_o01_s00?ie=UTF8&psc=1 Ratings are good and it was cheap. I may have an old one around here somewhere, but it might be (too) small.

Think I should turn the GUI back on and use VNC to load up Chromium or something right on the pi? That would entirely avoid my network being an issue.

If I have to, I think the easiest way to reimage/reinstall would likely be to pull out my SD card, pop it into my laptop, reimage and set up for headless install, put it back in the pi in the generator and set it up remotely. I'm not entirely convinced that anything different will come about, but I can try it. Maybe I won't use full-blown buster too...

jgyates commented 4 years ago

After taking a second look at this I believe your problem is your wifi signal strength. In your earlier post your signal strength was:

   WLAN Signal Level : -98 dBm
   WLAN Signal Quality : 12/70
   WLAN Signal Noise : -256 dBm

This page has more detail:

https://support.randomsolutions.nl/827069-Best-dBm-Values-for-Wifi

But if you have -98 dBm regularly then this is likely your issue. I would look at running either Ethernet directly to the pi or a serial cable to the pi to allow the pi to be closer. You could also look at where you mounted the pi. If it is inside your generator enclosure then you may want to look at mounting it outside of the generator enclosure. Some folks have used external antennas with pi zeros.

https://www.briandorey.com/post/raspberry-pi-zero-w-external-antenna-mod

Others have used a USB based Wifi with an antenna.

Your signal is quite low so the sluggishness you are seeing is likely not caused by the 30% CPU utilization but bad signal strength.
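
As a rough illustration of how those numbers read, here is a small sketch. The thresholds are common rules of thumb that I am assuming for the example, not values defined by genmon or by any standard, and the function names are made up.

```python
def classify_signal(dbm):
    """Bucket a WiFi signal level (dBm) using assumed rule-of-thumb thresholds."""
    if dbm >= -50:
        return "excellent"
    if dbm >= -67:
        return "good"
    if dbm >= -80:
        return "weak"
    return "unusable"


def quality_percent(quality):
    """Convert a link quality string such as '12/70' into a percentage."""
    numerator, denominator = quality.split("/")
    return 100.0 * int(numerator) / int(denominator)


# Values from the Platform Stats posted earlier in this thread.
print(classify_signal(-98), round(quality_percent("12/70"), 1))
```

By these assumed buckets, -98 dBm with a link quality of 12/70 lands well below what a usable connection needs.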

liltux commented 4 years ago

I would have to agree with @jgyates, one thing you could try if you haven’t, plug the pi into a wall adapter on your desk/or table inside and connect to the device. At that point, inside and close to the WiFi router, does it improve? That would agree with the signal issue if it did improve. Even disconnected from the generator, genmon still runs just no data is displayed.

lakee911 commented 4 years ago

I was in disbelief, but I think that is an issue. I did a speed test and it measured 0.45MBps and 0.18Mbps. I'm not sure what's normal for a Pi Zero, but that seems low to me.

My unit is located inside the generator; however, I've already installed an external whip antenna. Funny thing is, when I remove it the signal level doesn't change. I watched the signal strength from my phone connected to the pi so I could move it around and see what improvements I could get. Quality went up a max of 2 points. Signal strength went one dB lower and up to 26dB higher.

Here's what I've got:

W9015M 1837-1032-ND CBL ASSY SMA-UMCC JACK-I-PEX 15" https://www.digikey.com/product-detail/en/W9015M/1837-1032-ND/2267912

ANT-315-CW-HW-SMA ANT-315-CW-HW-SMA-ND ANTENNA 315MHZ 1/4WAVE WHIP SMA https://www.digikey.com/product-detail/en/ANT-315-CW-HW-SMA/ANT-315-CW-HW-SMA-ND/5592330

EDUP USB WiFi Adapter for PC 150Mbps Wireless Network Adapter for Desktop Nano Size WiFi Dongle Compatible with Windows 10/7/8.1/XP/Vista/Mac OS https://www.amazon.com/gp/product/B00KV9TQXM/ref=ppx_yo_dt_b_search_asin_image?ie=UTF8&th=1

Access Point and Pi are only 50 ft apart. Shouldn't be this bad... Any thoughts?

lakee911 commented 4 years ago

Good idea. I'll bring it inside and see what happens! Thanks.

lakee911 commented 4 years ago

Holy cow. Bringing inside, and actually probably not that much further than it was when it was outside, made the difference. Web page loads right up. Link quality is 62/70 and Signal at -48dBm. Now I just need to figure out why my signal quality is in the toilet.

Thank you so much. Sorry to burden you with an issue that was entirely mine all along.

jgyates commented 4 years ago

No problem. I am glad you have it narrowed down.

Wifi signal quality varies quite a bit depending on the type of material it has to go through or around; for example, brick degrades the signal more than wood.

jgyates commented 4 years ago

I am going to close this issue. Feel free to post to this thread if you have other questions on this topic.

lakee911 commented 4 years ago

Hey, thanks so much for the help. I now have the Pi hardwired and it works like a charm.

Somehow in the process, the 5V regulator on my controller burned out. There is nothing that I can think of that leads me to believe that this was my fault. I was (and still am) powering the Pi from the controller. I've checked and rechecked everything, pulled and tugged on things, wiggled things. I have no idea how it happened. I went out one day after it failed its weekly test to find the controller dead and the fuses and battery good. It always worked before and after I was tinkering w/ the Pi so maybe it failed on its own...

I opened up the controller and found evidence of ants, but not enough of them to have caused trouble. The 5V regulator was toasted ... deformed chip and small black spot on inside of housing. There was a slight burnt smell if I put my nose to it. Originally, my plan was to source a new part, but I elected to simply power it externally w/ a 12V to 5V regulator and a couple of extra fuses (one on the 12V and one on the 5V). It's working now and I don't want to risk further damage by desoldering the chip and resoldering a new one.

Interestingly enough, if I pull the 12V 7.5 fuse, the controller's processor stays up to tell me that it's a fuse blown/out/problem. Normally that fuse takes the controller down and you lose the display so it cannot report itself as being out. Bit weird design feature/bug of the original controller.

gzebrick commented 4 years ago

I understand. At one point while I was wiring I accidentally touched a pin to GND off the board and the Generac LCD display on my controller went out. The generator was still talking Modbus (once I plugged it all in) and there was still a green light, but no LCD. Pulling the fuse didn't help. Unplugging everything I could also didn't help. Eventually, with everything unplugged and the fuse pulled, I disconnected the battery, waited a minute, and put it all back together. The generator Gods were smiling on me and the LCD has been back. It was just a momentary touch on a wire that I should have been more careful dealing with, but I think I lucked out. I took your advice and am using that 12v-5v power cable to power my Pi, and soldered up a proper molex connector to plug into the Generac. It's been working great. Thanks.

lakee911 commented 4 years ago

@gzebrick: Be careful if you have soldered connections. These are often crimped/plugged due to vibration. Terminals on the plugged connections are often crimped too. Vibration can cause the soldered joints to crack. I really struggle with this myself and I prefer solder and heatshrink, but I've learned the hard way over the years.

Sounds like you did indeed get lucky with your controller. That's interesting that your display didn't immediately come back. Maybe it has a tiny self-resetting circuit breaker onboard or something. There are reports on some (older?) controllers about displays failing and there are reports of folks replacing only the display. It's powered from the same 5V supply (regulator) as the controller's microprocessor and LEDs.