kranskydog opened this issue 1 year ago
Okay, so, here is the thing: the get_job_status API is actually working as designed. This API only works on the master node. If you hit a backup node, it returns an HTTP 302 redirect over to the master. This is explained in the docs here:
https://github.com/jhuckaby/Cronicle/blob/master/docs/APIReference.md#redirects
I cannot explain why you are seeing that weird "protocol violation" error, or where it is even coming from. Some kind of proxy server in the middle that isn't expecting an HTTP 302? Dunno.
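For what it's worth, an HTTP client that follows redirects should handle this transparently. A minimal sketch (API key and job ID are placeholders):

```sh
# curl's -L flag follows the 302 from a backup node over to the master:
curl -L "http://orcl02.example.com:3012/api/app/get_job_status/v1/?api_key=XXXX&id=XXXX"
```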
Anyway, here is the thing: the get_history API, which you cite as an example of something working correctly, is actually not 😝. That API fails to check whether the current server is master before running, which is a bug. I will fix that.
Hmmmm

```
[apache@apchop01 ~]$ wget "http://orcl02.example.com:3012/api/app/get_event/v1/?api_key=a44e89551e0232b8e7aab002147c357e&id=elo3mvp8h02"
--2023-11-02 10:49:07--  http://orcl02.example.com:3012/api/app/get_event/v1/?api_key=a44e89551e0232b8e7aab002147c357e&id=elo3mvp8h02
Resolving orcl02.example.com (orcl02.example.com)... 192.168.56.55
Connecting to orcl02.example.com (orcl02.example.com)|192.168.56.55|:3012... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://::ffff:192.168.56.50:3012/api/app/get_event/v1/?api_key=a44e89551e0232b8e7aab002147c357e&id=elo3mvp8h02 [following]
http://::ffff:192.168.56.50:3012/api/app/get_event/v1/?api_key=a44e89551e0232b8e7aab002147c357e&id=elo3mvp8h02: Invalid host name.
```
IPv6?
```
[apache@apchop01 ~]$ curl -v -L "http://orcl02.example.com:3012/api/app/get_event/v1/?api_key=a44e89551e0232b8e7aab002147c357e&id=elo3mvp8h02"
> GET /api/app/get_event/v1/?api_key=a44e89551e0232b8e7aab002147c357e&id=elo3mvp8h02 HTTP/1.1
> Host: orcl02.example.com:3012
> User-Agent: curl/7.76.1
> Accept: */*
```
Okay, that is really bizarre. Your backup server thinks that the master server's IP address is ::ffff:192.168.56.50. I've never seen that before.
What does your server data look like? Try:
```
/opt/cronicle/bin/storage-cli.js list_get global/servers
```
Are the IPs munged in there as well? I'm still trying to fathom how this could possibly have happened.
```
[root@orcl02 cronicle]# /opt/cronicle/bin/storage-cli.js list_get global/servers
Got 4 items.
Items from list: global/servers:
[
  { "hostname": "orcl02.example.com", "ip": "192.168.56.55" },
  { "hostname": "orcl01.example.com", "ip": "192.168.56.50" },
  { "hostname": "orclxe.example.com", "ip": "192.168.56.25" },
  { "hostname": "apchop01.example.com", "ip": "192.168.56.30" }
]
```
Okay thanks, all normal there. I'll have to dig into this when I have some time. That is really a weird bug.
OTOH
```
[root@orcl02 cronicle]# netstat -anp | grep Cronicle
tcp6    0    0 :::3012                :::*                   LISTEN       772/Cronicle Server
tcp6    0    0 192.168.56.55:3012     192.168.56.50:27976    ESTABLISHED  772/Cronicle Server
udp     0    0 0.0.0.0:3014           0.0.0.0:*                           772/Cronicle Server
```
So, it seems that because Cronicle is bound to an IPv6 address, every connection it receives appears to come from an IPv6 address, so it thinks everything needs to be an IPv6 address: https://nodejs.org/dist/latest-v4.x/docs/api/http.html#http_server_listen_port_hostname_backlog_callback
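That matches Node's documented behavior: when listen() is called without a hostname, the server binds the unspecified IPv6 address (::) if IPv6 is available, and IPv4 peers then show up as IPv4-mapped addresses like ::ffff:192.168.56.50. A standalone sketch (plain Node, not Cronicle code; the port is arbitrary):

```sh
# Demo: a Node server bound to the IPv6 wildcard sees IPv4 clients as
# IPv4-mapped IPv6 addresses. Run this, then `curl http://127.0.0.1:3999/`
# from another shell; it logs something like "remoteAddress: ::ffff:127.0.0.1".
node -e '
const http = require("http");
const server = http.createServer((req, res) => {
  console.log("remoteAddress:", req.socket.remoteAddress);
  res.end("ok");
});
// No hostname given, so Node binds :: (IPv6 any) when available; passing
// "0.0.0.0" as a second argument would force plain IPv4 addresses instead.
server.listen(3999, () => console.log("listening on port 3999"));
'
```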
setting "server_comm_use_hostnames": true, "web_socket_use_hostnames": true, helps
Okay, thank you for all this info. I'll dig in as soon as I have time.
Summary
The get_job_status API fails if the request happens to hit the backup master in a multi-server config. This does not happen with other API calls.
Steps to reproduce the problem
1) Create a multi-server setup with primary and backup masters.
2) Call the get_job_status API on the primary master: works.
3) Call the get_job_status API on the backup master: fails with a "protocol violation" error (see the example below).
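For example (API key and job ID are placeholders):

```sh
# Against the primary master: returns the job status JSON as expected.
curl "http://orcl01.example.com:3012/api/app/get_job_status/v1/?api_key=XXXX&id=XXXX"
# Against the backup master: fails instead of redirecting cleanly.
curl -v "http://orcl02.example.com:3012/api/app/get_job_status/v1/?api_key=XXXX&id=XXXX"
```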
Your Setup
VirtualBox
Cronicle 0.9.38
2 master servers (primary, backup), accessed via round-robin DNS (Oracle SCAN IPs) to a virtual hostname
conf/config.json has "web_direct_connect": true
2 other worker servers
Can connect to the web console via the virtual hostname and everything works as expected. Can use other APIs against both master nodes and they work correctly, e.g. get_history (see the example below).
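For example, get_history works against either master (API key is a placeholder):

```sh
# Both of these return history data correctly:
curl "http://orcl01.example.com:3012/api/app/get_history/v1/?api_key=XXXX"
curl "http://orcl02.example.com:3012/api/app/get_history/v1/?api_key=XXXX"
```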
Operating system and version?
```
[root@orcl01 ~]# cat /etc/oracle-release
Oracle Linux Server release 7.9
[root@orcl01 ~]# uname -a
Linux orcl01.example.com 5.4.17-2136.324.5.3.el7uek.x86_64 #2 SMP Tue Oct 10 12:44:19 PDT 2023 x86_64 x86_64 x86_64 GNU/Linux
```
Node.js version?
v16.20.2
Cronicle software version?
0.9.38
Are you using a multi-server setup, or just a single server?
Multi
Are you using the filesystem as back-end storage, or S3/Couchbase?
filesystem (cluster)
Can you reproduce the crash consistently?
yes
Log Excerpts
Can't see anything specific