jhuckaby / performa-satellite

Remote data collector for Performa.

How to debug connection issue to performa server? #8

Open KlugFR opened 2 months ago

KlugFR commented 2 months ago

Hello.

I have a performa server with a dozen performa-satellite instances sending data to it. That works smoothly.

But it doesn't work for one installed performa-satellite.

The cron job is correctly executed each minute. Access to :5511 on the performa server is ok from the VM running performa-satellite (tested with telnet). /opt/performa/satellite.bin --debug shows relevant data.

performa-server doesn't seem to receive any data from this performa-satellite (nothing in the server logs with the hostname).

What could I try/test? Something like a "verbose" option on performa-satellite to know what is happening?

KlugFR commented 2 months ago

I see the incoming connections on the performa server.

WebServer.log:[1721316059.21][2024-07-18 17:20:59][performa][1900][WebServer][transaction][HTTP 200 OK][/api/app/submit][{"id":"r569188","method":"POST","proto":"http","ip":"xxx.xxx.xxx.xxx","ips":["xxx.xxx.xxx.xxx","127.0.0.1"],"port":5511,"socket":"c569186","perf":{"scale":1000,"perf":{"total":1.811,"queue":0.048,"read":0.265,"process":0.337,"encode":0.584,"write":0.38},"counters":{"bytes_in":2845,"bytes_out":228,"num_requests":1}},"host":"performa.domain.tld","ua":"Performa-Satellite/1.1.4"}]

But this performa-satellite doesn't appear in the server web interface in the group it is supposed to be (nor anywhere else).

jhuckaby commented 2 months ago

Check the Performa.log for errors:

grep error /opt/performa/logs/Performa.log

Also check the "Admin" tab in the UI. See anything there?

KlugFR commented 2 months ago

Nothing in the error log (some errors about SMTP not being set up, but nothing about this).

Same in the admin tab: some CPU alerts triggered on other servers (red), error sending email (yellow) and alerts cleared (white).

jhuckaby commented 2 months ago

Huh, that's really weird. I am not sure what else to try. You may have two servers with the same exact hostname? Try grepping the logs for:

grep "Double submission" /opt/performa/logs/Performa.log

KlugFR commented 2 months ago

Nope, no "Double submission" either. No error on the satellite side (if the server doesn't answer or rejects the request, satellite.bin shows it when launched from the CLI; I saw such errors when using a bad port or protocol). No error on the server side (the server receives the data from the satellite but does nothing with it):

[1721406659.144][2024-07-19 18:30:59][performa][1900][WebServer][transaction][HTTP 200 OK][/api/app/hello][{"id":"r693406","method":"POST","proto":"http","ip":"xxx.xxx.xxx.xxx","ips":["xxx.xxx.xxx.xxx","127.0.0.1"],"port":5511,"socket":"c693405","perf":{"scale":1000,"perf":{"total":1.934,"queue":0.252,"read":0.247,"process":0.131,"encode":0.647,"write":0.33},"counters":{"bytes_in":273,"bytes_out":381,"num_requests":1}},"host":"performa.domain.tld","ua":"Performa-Satellite/1.1.4"}]
[1721406659.387][2024-07-19 18:30:59][performa][1900][WebServer][transaction][HTTP 200 OK][/api/app/submit][{"id":"r693407","method":"POST","proto":"http","ip":"xxx.xxx.xxx.xxx","ips":["xxx.xxx.xxx.xxx","127.0.0.1"],"port":5511,"socket":"c693405","perf":{"scale":1000,"perf":{"total":2.421,"queue":0.104,"read":0.413,"process":0.528,"encode":0.632,"write":0.322},"counters":{"bytes_in":2824,"bytes_out":228,"num_requests":1}},"host":"performa.domain.tld","ua":"Performa-Satellite/1.1.4"}]

I added new satellites (new hosts) to the same server today with no issues; they all show up.
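One way to confirm which clients are actually reaching the server is to count `/api/app/submit` transactions per source IP in the web server log. The sketch below assumes the bracketed log format shown above and a log path of /opt/performa/logs/WebServer.log (inferred from the Performa.log path mentioned in this thread; adjust it if yours differs). The function name `submits_by_ip` is made up for illustration.

```shell
# Count /api/app/submit transactions per client IP.
# Reads WebServer.log lines on stdin, prints "COUNT IP" per line.
submits_by_ip() {
  grep -F '[/api/app/submit]' \
    | sed -n 's/.*"ip":"\([^"]*\)".*/\1/p' \
    | sort \
    | uniq -c \
    | awk '{ print $1, $2 }'
}

# Usage (log path is an assumption):
#   submits_by_ip < /opt/performa/logs/WebServer.log
```

A satellite whose IP is listed here but which is missing from the UI is reaching the web server, so the problem lies after the HTTP layer.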

jhuckaby commented 2 months ago

I cannot explain it. That's quite a mystery. I'll leave this issue open, in case anyone else has any ideas.

KlugFR commented 2 months ago

I'm going to disable the cron, wait for 24 hours then enable it.

KlugFR commented 2 months ago

It seems to have worked: I waited (until today), re-enabled the cron job, and the satellite is now in the web interface of the server.

sseodate commented 2 months ago

Same issue here. I have 6 devices running Ubuntu, attached to a server and working smoothly. Lately I added some new devices with the same steps and the same Ubuntu version; they return no errors but do not show up on the server. After reading this thread I waited 24 hours, and they still do not connect to the server (the others still show up fine).

jhuckaby commented 2 months ago

Make sure all your Performa Satellites and Performa Servers are the latest version. Recently there was a breaking change to the API due to a security issue.

KlugFR commented 2 months ago

About this, there's a problem with the readme.md on the GitHub project (or an error with the link in it).

In the readme.md, the link is https://github.com/jhuckaby/performa-satellite/releases/latest/download/performa-satellite-linux-x64. This currently links to version 1.1.4.

The real latest version is: https://github.com/jhuckaby/performa-satellite/releases/download/v1.1.5/performa-satellite-linux-x64

jhuckaby commented 2 months ago

@KlugFR I think you are mistaken:

jhuckaby@joemax ~ $ curl -v "https://github.com/jhuckaby/performa-satellite/releases/latest/download/performa-satellite-linux-x64"

* Host github.com:443 was resolved.
* IPv6: (none)
* IPv4: 140.82.116.3
*   Trying 140.82.116.3:443...
* Connected to github.com (140.82.116.3) port 443

... snip ...

< HTTP/2 302 
< location: https://github.com/jhuckaby/performa-satellite/releases/download/v1.1.5/performa-satellite-linux-x64

It redirects to v1.1.5 as it should.

KlugFR commented 2 months ago

Looks like you're right, I made some kind of mistake somewhere (maybe I confused the server version with the satellite version). Sorry.

jhuckaby commented 2 months ago

No worries, all good 😊

sseodate commented 2 months ago

How do I get the versions of the server and the client? Isn't that in the debug output?

jhuckaby commented 2 months ago

For Performa Server you can do:

grep version /opt/performa/package.json

There is no easy way to get the exact version for Performa Satellite (sorry -- I'll look into adding this in a future release). However, if you run it with the --debug flag and you see an auth property in the JSON, then it is Satellite v1.1.4 or higher (which is what you want).
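The auth-property check described above can be scripted roughly as follows. This is a minimal sketch: the helper name `has_auth_property` is hypothetical, and it assumes the --debug JSON is written to standard output.

```shell
# Succeeds (exit status 0) if the JSON on stdin contains an "auth" property.
has_auth_property() {
  grep -Eq '"auth"[[:space:]]*:'
}

# Usage (binary path as given earlier in this thread):
#   if /opt/performa/satellite.bin --debug | has_auth_property; then
#     echo "auth property present: Satellite v1.1.4 or higher"
#   else
#     echo "no auth property: Satellite is older than v1.1.4"
#   fi
```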

sseodate commented 2 months ago

Server: "version": "1.1.4"

Client: "version": "1.0", "date": 1722272325.871, "hostname": "xxxxxxxxxxx", "auth": "xxxxxxxxxxxxx",

So this is not a version problem?

jhuckaby commented 2 months ago

Yeah, looks like you have some other issue. Are you sure there are no errors in your Performa Server log?

grep -E '\[error\]' /opt/performa/logs/Performa.log

Also check the Performa Satellite error log on the newly added server:

cat /tmp/performa-satellite-error.txt

Finally, can your new server actually reach the Performa Server, and complete a HTTP request? Try this on your newly added server:

curl -v http://PERFORMA_SERVER_HOSTNAME:5511/app/api/echo

Good luck!

sseodate commented 2 months ago

This is strange. Errors on the server: double submissions from 2 other clients, but those still work. No errors from this new client, and the new client has no error file. Echo test: connected to the server, status 200.

> User-Agent: curl/7.81.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Type: application/json
< Access-Control-Allow-Origin: *
< Server: Performa 1.0
< Content-Length: 49
< Date: Mon, 29 Jul 2024 17:06:44 GMT
< Connection: keep-alive
< Keep-Alive: timeout=5
<
{"code":1,"description":"Unsupported API: echo"}

jhuckaby commented 2 months ago

@sseodate Double submission error means that you have multiple servers with identical hostnames. Make sure every server has a unique hostname -- then they will all show up properly.
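To look for clashing hostnames, one option is to gather the output of `hostname` from every machine into a file, one name per line, and list the names that occur more than once. This is only a sketch: the file name hosts.txt and the function name `duplicate_hostnames` are hypothetical, and the comparison is an exact, case-sensitive match.

```shell
# Print every hostname that appears more than once on stdin.
duplicate_hostnames() {
  sed 's/[[:space:]]*$//' | sort | uniq -d
}

# Usage (hosts.txt is a hypothetical file with one hostname per line):
#   duplicate_hostnames < hosts.txt
```

No output means every name in the list is unique.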

sseodate commented 2 months ago

Maybe I added 2 crontab entries, so it runs twice. The hostnames can't be duplicated. I'll try to get rid of this error tomorrow morning and come back. Thank you!

sseodate commented 2 months ago

I cleaned up the crontab and there are no more double submissions. But the new client still does not show up on the server. The job takes a little longer to complete, but nothing appears on the server screen.

sseodate commented 2 months ago

I checked the User log, see below:

[1722310057.633][2024-07-30 03:27:37][][2223][User][transaction][user_login][admin][{"ip":"CLIENTIP","headers":{"host":"SERVERIP","connection":"keep-alive","content-length":"81","accept":"text/plain, */*; q=0.01","x-requested-with":"XMLHttpRequest","x-session-id":"2f2c64007885acedf2dd17736227eb560a1bd3417e205c2c4634f96d452b2018","user-agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36","content-type":"application/json","origin":"http://SERVERIP:5511","referer":"http://SERVERIP:5511/","accept-encoding":"gzip, deflate","accept-language":"en-US,en;q=0.9,vi;q=0.8"}}]

So I think the client sends data successfully to the server, but the server shows no data from the client? Also, there are a lot of these:

[1722349108.903][2024-07-30 14:18:28][ss-build][2223][User][error][no_data][No data found][]
[1722349169.05][2024-07-30 14:19:29][ss-build][2223][User][error][no_data][No data found][]
[1722355632.14][2024-07-30 16:07:12][ss-build][2223][User][error][submit][Invalid authentication token][]
[1722355751.995][2024-07-30 16:09:11][ss-build][2223][User][error][submit][Invalid authentication token][]
[1722355871.582][2024-07-30 16:11:11][ss-build][2223][User][error][submit][Invalid authentication token][]

What is happening here?

jhuckaby commented 2 months ago

The "No data found" errors can be ignored. They just mean that a user tried to view historical graphs in the UI and no data was found for the date/time range they selected.

The "Invalid authentication token" errors mean that your servers have clocks that are way out of sync (like, over a minute off). Please install and enable NTP on all your servers so they keep correct time.

Debian / Ubuntu

echo "Updating package index on Ubuntu..."
sudo apt-get update

echo "Installing NTP on Ubuntu..."
sudo apt-get install -y ntp

echo "Starting and enabling NTP service on Ubuntu..."
sudo systemctl start ntp
sudo systemctl enable ntp

echo "Verifying NTP status on Ubuntu..."
sudo systemctl status ntp

echo "Checking synchronization status on Ubuntu..."
ntpq -p

RedHat / Fedora / CentOS

echo "Updating package index on RedHat..."
sudo yum update -y

echo "Installing NTP on RedHat..."
sudo yum install -y ntp

echo "Starting and enabling NTP service on RedHat..."
sudo systemctl start ntpd
sudo systemctl enable ntpd

echo "Verifying NTP status on RedHat..."
sudo systemctl status ntpd

echo "Checking synchronization status on RedHat..."
ntpq -p

Also, if your servers are virtual machines or docker containers, you may also need to install and/or configure NTP on the host machine.
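Before and after setting up NTP, the clock difference between a satellite host and the Performa server can be estimated by comparing the local time with the Date header the server sends in its HTTP responses. The following is a rough sketch, not part of Performa: the function names are made up, `date -d` assumes GNU date, and the result is only accurate to a second or two because of header resolution and network latency.

```shell
# Extract the value of the Date: header from HTTP response headers on stdin.
http_date_header() {
  tr -d '\r' | awk 'tolower($0) ~ /^date:/ { sub(/^[^:]*:[ \t]*/, ""); print; exit }'
}

# Absolute difference in seconds between two epoch timestamps.
abs_diff() {
  d=$(( $1 - $2 ))
  echo "${d#-}"
}

# Print the approximate clock skew (in seconds) between this machine and
# the server at the given URL. Requires curl and GNU date.
clock_skew() {
  remote_date=$(curl -s -o /dev/null -D - "$1" | http_date_header)
  [ -n "$remote_date" ] || return 1
  remote_epoch=$(date -u -d "$remote_date" +%s) || return 1
  abs_diff "$(date -u +%s)" "$remote_epoch"
}

# Usage (replace the placeholder hostname):
#   clock_skew http://PERFORMA_SERVER_HOSTNAME:5511/
```

A result well above 60 would be consistent with the "over a minute off" condition described above.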

Good luck.

sseodate commented 2 months ago

I will check NTP and monitor this issue. I'll be back if I have more info. Thanks a lot!

sseodate commented 1 month ago

My client is running perfectly now. I had to move the client to another router; then it connected and showed up as online. So the problem is my router at home. By the way, I still don't know how to debug what in my network prevented the connection to the server, even though the echo test worked.