jhuckaby / performa-satellite

Remote data collector for Performa.

Can't take snapshots from heavily loaded servers #2

Open asyslinux opened 5 years ago

asyslinux commented 5 years ago

Hello.

I have a problem: Performa cannot take snapshots (Take Snapshot, Watch Mode, Alert Based) from several heavily loaded servers, whichever method I use. These servers have over 8-10k established connections.

I looked at WebServer.log on the Performa server and found the following (1.2.3.4 is the heavily loaded server):

[1562052197.143][2019-07-02 03:23:17][monitoring][44874][WebServer][error][HPE_INVALID_EOF_STATE][Socket error: cs28581: Parse Error][{"ip":"1.2.3.4"}]
[1562052200.147][2019-07-02 03:23:20][monitoringw][44874][WebServer][error][HPE_INVALID_EOF_STATE][Socket error: cs28582: Parse Error][{"ip":"1.2.3.4"}]
[1562052202.467][2019-07-02 03:23:22][monitoring][44874][WebServer][error][HPE_INVALID_EOF_STATE][Socket error: cs28561: Parse Error][{"ip":"1.2.3.4"}]
[1562052203.149][2019-07-02 03:23:23][monitoring][44874][WebServer][error][HPE_INVALID_EOF_STATE][Socket error: cs28583: Parse Error][{"ip":"1.2.3.4"}]
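To see how often each client triggers this error, the log entries above can be tallied with a short script. This is only a sketch: it assumes every entry is a run of `[...]` fields with a JSON object last, and the field names (`epoch`, `hostname`, `code`, and so on) are my own labels for the columns, not names taken from Performa.

```python
import json
from collections import Counter

# Assumed column labels for the bracket-delimited log layout (hypothetical names).
FIELD_NAMES = ["epoch", "date", "hostname", "pid", "component",
               "category", "code", "message", "data"]


def parse_line(line):
    """Split one '[a][b]...[json]' log entry into a dict, or return None."""
    line = line.strip()
    if not (line.startswith("[") and line.endswith("]")):
        return None
    parts = line[1:-1].split("][")
    if len(parts) != len(FIELD_NAMES):
        return None
    entry = dict(zip(FIELD_NAMES, parts))
    try:
        data = json.loads(entry["data"])
    except json.JSONDecodeError:
        data = {}
    # Keep only JSON objects so later lookups are safe.
    entry["data"] = data if isinstance(data, dict) else {}
    return entry


def count_errors_by_ip(lines):
    """Count error entries per (client IP, error code) pair."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and entry["category"] == "error":
            counts[(entry["data"].get("ip", "unknown"), entry["code"])] += 1
    return counts


# Two entries copied from the log excerpt in this issue.
SAMPLE = [
    '[1562052197.143][2019-07-02 03:23:17][monitoring][44874][WebServer][error]'
    '[HPE_INVALID_EOF_STATE][Socket error: cs28581: Parse Error][{"ip":"1.2.3.4"}]',
    '[1562052202.467][2019-07-02 03:23:22][monitoring][44874][WebServer][error]'
    '[HPE_INVALID_EOF_STATE][Socket error: cs28561: Parse Error][{"ip":"1.2.3.4"}]',
]

if __name__ == "__main__":
    for (ip, code), n in count_errors_by_ip(SAMPLE).items():
        print(ip, code, n)
```

To run it against a real log, pass the lines of the file to `count_errors_by_ip` in place of `SAMPLE`.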

I would like to provide a dump of the raw data, but I do not know how to get the raw snapshot JSON from Performa Satellite on a heavily loaded server in order to analyze what is wrong.

Please advise. Thank you.

jhuckaby commented 5 years ago

Performa Satellite requires an available TCP socket on each server (which is a kernel filehandle in Linux), and an available local port, because it sends metrics every minute using an HTTP request (TCP connection). So if your server is completely overloaded, your kernel has either no available socket connections to spare (i.e. maximum open filehandles), and/or it has no available local ports to assign. Either of these would cause a problem like this.

Unfortunately there is nothing I can really do or fix here. You'll probably have to make some changes on your servers, specifically in the kernel, if you have run them totally maxed out like this. Performa Satellite will need an available filehandle and local port.
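One way to check whether a Linux server is near either limit is to read the kernel's own counters. The sketch below is not part of Performa Satellite; it assumes the standard `/proc/sys/fs/file-nr` and `/proc/sys/net/ipv4/ip_local_port_range` files are present, and it only reports system-wide values (per-process limits such as `ulimit -n` are separate).

```python
from pathlib import Path


def parse_file_nr(text):
    """Parse /proc/sys/fs/file-nr: allocated, unused, and maximum file handles."""
    allocated, unused, maximum = (int(x) for x in text.split())
    return {"allocated": allocated, "unused": unused, "max": maximum}


def parse_port_range(text):
    """Parse /proc/sys/net/ipv4/ip_local_port_range: lowest and highest local port."""
    low, high = (int(x) for x in text.split())
    return {"low": low, "high": high, "count": high - low + 1}


def report():
    """Print system-wide file handle usage and the local port range (Linux only)."""
    fh = parse_file_nr(Path("/proc/sys/fs/file-nr").read_text())
    ports = parse_port_range(
        Path("/proc/sys/net/ipv4/ip_local_port_range").read_text()
    )
    in_use = fh["allocated"] - fh["unused"]
    print(f"file handles in use: {in_use} of {fh['max']}")
    print(f"local port range: {ports['low']}-{ports['high']} "
          f"({ports['count']} ports)")


if __name__ == "__main__":
    report()
```

If the number of handles in use is close to the maximum, or the port range is small relative to the number of outgoing connections, the server is in the situation described above.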

I recommend checking out these two articles:

Good luck!

asyslinux commented 5 years ago

File descriptors and sockets are available. Satellite does send information to the Performa server, and the monitored servers are not overloaded. The parse error appears in the Performa server log.

For example, the heavily loaded server has several IP addresses. Performa Satellite sends metrics from a different IP, one that does not have many incoming connections. The load average on the server is 10-20.

asyslinux commented 5 years ago

Could there be a socket timeout on the Performa server when it receives large snapshot data from Satellite? Only rarely am I able to view snapshots from these servers.
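As far as I know, HPE_INVALID_EOF_STATE is the Node.js HTTP parser's code for a stream that ended at an unexpected point, which would fit a request being cut off part-way through. To get a feel for how large a snapshot might become, here is a rough estimate, assuming the snapshot carries one JSON record per connection. The record fields below are hypothetical, not Satellite's real format.

```python
import json


def fake_connection(i):
    """Build one made-up connection record (hypothetical field names)."""
    return {
        "type": "tcp",
        "state": "established",
        "local_addr": "10.0.0.1:443",
        "remote_addr": f"192.0.2.{i % 254 + 1}:{1024 + i % 60000}",
        "pid": 1000 + i % 50,
    }


def snapshot_size_bytes(num_connections):
    """Return the UTF-8 size of a JSON payload with that many connection records."""
    payload = {
        "connections": [fake_connection(i) for i in range(num_connections)]
    }
    return len(json.dumps(payload).encode("utf-8"))


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, "connections:", snapshot_size_bytes(n), "bytes")
```

With records of this shape the payload grows roughly linearly with the connection count, so a server with around 10k connections would send a body many times larger than a quiet one.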

I also see that Performa does not correctly detect the number of physical cores on these servers, but that is a separate issue.