metrico / qryn

⭐️ All-in-One Polyglot Observability with OLAP Storage for Logs, Metrics, Traces & Profiles. Drop-in Grafana Cloud replacement compatible with Loki, Prometheus, Tempo, Pyroscope, Opentelemetry, Datadog and beyond :rocket:
https://qryn.dev
GNU Affero General Public License v3.0

/influx/api/v2/write endpoint crashes in random scenarios #433

Closed voicenter closed 8 months ago

voicenter commented 8 months ago

When implementing the Proxmox metric server, the container crashes with the following log:

```
qryn@3.0.31 start
node qryn.mjs
2024/01/19 02:03PM 30 pid=23 hostname=1df6b0972fa7 name=qryn msg=Initializing DB... cloki
2024/01/19 02:03PM 30 pid=23 hostname=1df6b0972fa7 name=qryn msg=xxh ready
2024/01/19 02:03PM 30 pid=23 hostname=1df6b0972fa7 name=qryn msg=xxh ready
2024/01/19 02:03PM 30 pid=23 hostname=1df6b0972fa7 name=qryn msg=Checking clickhouse capabilities
2024/01/19 02:03PM 30 pid=23 hostname=1df6b0972fa7 name=qryn msg=checking old samples support: samples_v2
2024/01/19 02:03PM 30 pid=23 hostname=1df6b0972fa7 name=qryn msg=checking old samples support: samples
2024/01/19 02:03PM 30 pid=23 hostname=1df6b0972fa7 name=qryn msg=Server listening at http://0.0.0.0:3100/
2024/01/19 02:03PM 30 pid=23 hostname=1df6b0972fa7 name=qryn msg=Qryn API up
2024/01/19 02:03PM 30 pid=23 hostname=1df6b0972fa7 name=qryn msg=Qryn API listening on http://0.0.0.0:3100/
2024/01/19 02:03PM 30 pid=23 hostname=1df6b0972fa7 name=qryn reqId=req-1 req={"method":"POST","url":"/influx/api/v2/write?org=proxmox&bucket=proxmox","hostname":"192.168.182.176:3100","remoteAddress":"185.138.169.134","remotePort":35134} msg=incoming request
2024/01/19 02:03PM 30 pid=23 hostname=1df6b0972fa7 name=qryn reqId=req-1 res={"statusCode":204} responseTime=28.397363662719727 msg=request completed
:30,"time":1705665809080,"pid":23,"hostname":"1df6b0972fa7","name":"qryn","reqId":"req-2","req":{"method":"POST","url":"/api/prom/remote/write","hostname":"192.168.182.176:3100","remoteAddress":"185.138.169.161","remotePort":46104},"msg":"incoming request"}
2024/01/19 02:03PM 30 pid=23 hostname=1df6b0972fa7 name=qryn reqId=req-3 req={"method":"POST","url":"/influx/api/v2/write?org=proxmox&bucket=proxmox","hostname":"192.168.182.176:3100","remoteAddress":"185.138.169.134","remotePort":39690} msg=incoming request
2024/01/19 02:03PM 30 pid=23 hostname=1df6b0972fa7 name=qryn reqId=req-3 res={"statusCode":204} responseTime=4.993587493896484 msg=request completed
2024/01/19 02:03PM 30 pid=23 hostname=1df6b0972fa7 name=qryn reqId=req-2 res={"statusCode":204} responseTime=122.09898567199707 msg=request completed
2024/01/19 02:03PM 50 pid=23 hostname=1df6b0972fa7 name=qryn err=Error: Request failed with status code 400
Response: [400] undefined
Error: Request failed with status code 400
    at createError (/app/node_modules/axios/lib/core/createError.js:16:15)
    at settle (/app/node_modules/axios/lib/core/settle.js:17:12)
    at IncomingMessage.handleStreamEnd (/app/node_modules/axios/lib/adapters/http.js:269:11)
    at IncomingMessage.emit (node:events:530:35)
    at endReadableNT (node:internal/streams/readable:1696:12)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21) msg=AXIOS ERROR
:30,"time":1705665815031,"pid":23,"hostname":"1df6b0972fa7","name":"qryn","reqId":"req-4","req":{"method":"POST","url":"/api/prom/remote/write","hostname":"192.168.182.176:3100","remoteAddress":"185.138.169.161","remotePort":46104},"msg":"incoming request"}
2024/01/19 02:03PM 30 pid=23 hostname=1df6b0972fa7 name=qryn reqId=req-4 res={"statusCode":204} responseTime=171.20169258117676 msg=request completed
```

akvlad commented 8 months ago

This looks like a ClickHouse connection outage. Are you sure the ClickHouse connection is always stable? Do you use /influx/api/v2/write? Could you please send a couple of examples of the /influx/api/v2/write requests you send to qryn?

voicenter commented 8 months ago

T 192.168.23.208:58944 -> 192.168.182.176:3100 [AP] #41
POST /influx/api/v2/write?org=proxmox&bucket=proxmox HTTP/1.1.
TE: deflate,gzip;q=0.3.
Connection: TE, close.
Authorization: Token proxmox.
Host: 192.168.182.176:3100.
User-Agent: libwww-perl/6.68.
Content-Length: 1767.
.
blockstat,object=nodes,host=med01dell105node01 bavail=91697192960,bfree=96867930112,blocks=100861726720,favail=6234369,ffree=6234369,files=6291456,fper=1,fused=57087,per=4,su_bavail=96867930112,su_blocks=100861726720,su_favail=6234369,su_files=6291456,used=3993796608,user_bavail=91697192960,user_blocks=95690989568,user_favail=6234369,user_files=6291456,user_fused=57087,user_used=3993796608 1705662939000000000
cpustat,object=nodes,host=med01dell105node01 avg1=0.30,avg15=0.19,avg5=0.33,cpu=0.000883566973057801,cpus=80,ctime=1705662939.36927,guest=0,guest_nice=0,idle=14505992595,iowait=8937,irq=0,nice=1704,softirq=11259,steal=0,sum=0,system=5143434,total=14513805846,used=7804314,user=2647917,wait=0 1705662939000000000
memory,object=nodes,host=med01dell105node01 arcsize=19200,memfree=267376545792,memshared=0,memtotal=270037594112,memused=2661048320,swapfree=8589930496,swaptotal=8589930496,swapused=0 1705662939000000000
nics,object=nodes,host=med01dell105node01,instance=bond0 receive=8387032517,transmit=825430526 1705662939000000000
nics,object=nodes,host=med01dell105node01,instance=enp24s0f0 receive=0,transmit=0 1705662939000000000
nics,object=nodes,host=med01dell105node01,instance=enp24s0f1 receive=0,transmit=0 1705662939000000000
nics,object=nodes,host=med01dell105node01,instance=ens4f0 receive=13729186774,transmit=844445380 1705662939000000000
nics,object=nodes,host=med01dell105node01,instance=ens4f1 receive=0,transmit=0 1705662939000000000
nics,object=nodes,host=med01dell105node01,instance=lo receive=51267141,transmit=51267141 1705662939000000000
nics,object=nodes,host=med01dell105node01,instance=vmbr0 receive=8663393371,transmit=839055810 1705662939000000000
system,object=nodes,host=med01dell105node01 uptime=1814571 1705662939000000000
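
For anyone who wants to replay a request like the captured one without a Proxmox host, here is a minimal Python sketch that builds the same kind of POST. The helper name `build_influx_write` is made up for illustration, the `Content-Type` header is an assumption (the capture above does not include one), and the URL, org, bucket and token are simply copied from the capture.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_influx_write(base_url, org, bucket, token, lines):
    """Build (but do not send) a POST to /influx/api/v2/write.

    Hypothetical helper: `lines` is a list of InfluxDB line-protocol records.
    """
    query = urlencode({"org": org, "bucket": bucket})
    body = "\n".join(lines).encode("utf-8")
    req = Request(f"{base_url}/influx/api/v2/write?{query}", data=body, method="POST")
    req.add_header("Authorization", f"Token {token}")
    # Assumption: the captured request carried no Content-Type header.
    req.add_header("Content-Type", "text/plain; charset=utf-8")
    return req


req = build_influx_write(
    "http://192.168.182.176:3100",
    org="proxmox",
    bucket="proxmox",
    token="proxmox",
    lines=[
        "system,object=nodes,host=med01dell105node01 uptime=1814571 1705662939000000000",
    ],
)
# To actually send it: urllib.request.urlopen(req, timeout=5)
```

Building the request separately from sending it makes it easy to inspect the exact URL and body before pointing it at a live qryn instance.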

lmangani commented 8 months ago

We suspect the issue originates from ClickHouse insert failures. Please try again using 3.1.2, which provides extended logging for received errors and should help us understand the failure. Thanks in advance @voicenter ✌️

maxim-voicenter commented 8 months ago

Hi guys, we pulled both latest and 3.1.2. Both gave us the same image, which seems to point to 3.1.1, but more info does appear:

```
Node.js v20.11.0

qryn@3.1.1 start
node qryn.mjs
2024/01/21 05:05PM 30 pid=23 hostname=1df6b0972fa7 name=qryn msg=Initializing DB... cloki
2024/01/21 05:05PM 30 pid=23 hostname=1df6b0972fa7 name=qryn msg=xxh ready
2024/01/21 05:05PM 30 pid=23 hostname=1df6b0972fa7 name=qryn msg=xxh ready
2024/01/21 05:05PM 30 pid=23 hostname=1df6b0972fa7 name=qryn msg=Checking clickhouse capabilities
2024/01/21 05:05PM 30 pid=23 hostname=1df6b0972fa7 name=qryn msg=checking old samples support: samples_v2
2024/01/21 05:05PM 30 pid=23 hostname=1df6b0972fa7 name=qryn msg=checking old samples support: samples
2024/01/21 05:05PM 30 pid=23 hostname=1df6b0972fa7 name=qryn msg=Server listening at http://0.0.0.0:3100
2024/01/21 05:05PM 30 pid=23 hostname=1df6b0972fa7 name=qryn msg=Qryn API up
2024/01/21 05:05PM 30 pid=23 hostname=1df6b0972fa7 name=qryn msg=Qryn API listening on http://0.0.0.0:3100
2024/01/21 05:05PM 30 pid=23 hostname=1df6b0972fa7 name=qryn reqId=req-1 req={"method":"POST","url":"/influx/api/v2/write?org=proxmox&bucket=proxmox","hostname":"192.168.182.176:3100","remoteAddress":"185.138.169.134","remotePort":56716} msg=incoming request
2024/01/21 05:05PM 30 pid=23 hostname=1df6b0972fa7 name=qryn reqId=req-1 res={"statusCode":204} responseTime=30.69964599609375 msg=request completed
2024/01/21 05:05PM 30 pid=23 hostname=1df6b0972fa7 name=qryn reqId=req-2 req={"method":"POST","url":"/influx/api/v2/write?org=proxmox&bucket=proxmox","hostname":"192.168.182.176:3100","remoteAddress":"185.138.169.134","remotePort":56732} msg=incoming request
2024/01/21 05:05PM 30 pid=23 hostname=1df6b0972fa7 name=qryn reqId=req-2 res={"statusCode":204} responseTime=10.011974334716797 msg=request completed
2024/01/21 05:05PM 50 pid=23 hostname=1df6b0972fa7 name=qryn err=AXIOS ERROR: Error: Request failed with status code 400
Response data: Code: 27. DB::ParsingException: Cannot parse input: expected '"' before: '.0 (UNKNOWN)","string":"pbs-library-version"}\n{"fingerprint":"8126415627791067979","timestamp_ns":"1705849515000000000","value":4,"string":"cpus"}\n{"fingerprint': (while reading the value of key value): While executing ParallelParsingBlockInputFormat: (at row 143) . (CANNOT_PARSE_INPUT_ASSERTION_FAILED) (version 22.8.21.38 (official build))

Error: AXIOS ERROR: Error: Request failed with status code 400
Response data: Code: 27. DB::ParsingException: Cannot parse input: expected '"' before: '.0 (UNKNOWN)","string":"pbs-library-version"}\n{"fingerprint":"8126415627791067979","timestamp_ns":"1705849515000000000","value":4,"string":"cpus"}\n{"fingerprint': (while reading the value of key value): While executing ParallelParsingBlockInputFormat: (at row 143) . (CANNOT_PARSE_INPUT_ASSERTION_FAILED) (version 22.8.21.38 (official build))

    at axiosError (/app/lib/db/throttler.js:22:14)
    at TimeoutThrottler.flush (/app/lib/db/throttler.js:43:26)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async Timeout._onTimeout (/app/lib/db/throttler.js:124:9) msg=AXIOS ERROR

/app/lib/db/clickhouse.js:84
rejectors[msg.id](new Error('Database push error'))
^
Error: Database push error
    at Worker.<anonymous> (/app/lib/db/clickhouse.js:84:27)
    at Worker.emit (node:events:518:28)
    at MessagePort.<anonymous> (node:internal/worker:263:53)
    at [nodejs.internal.kHybridDispatch] (node:internal/event_target:826:20)
    at exports.emitMessage (node:internal/per_context/messageport:23:28)
```

Let us know if that gives you a better direction. Many thanks in advance.
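
The ParsingException in the log above mentions a `pbs-library-version` entry whose value ends in `.0 (UNKNOWN)`, which suggests (this is an inference from the error text, not something confirmed in this thread) that a string field ended up where ClickHouse expected a number. Below is a rough Python sketch for checking which fields in a line-protocol record are non-numeric before they reach qryn. `classify_fields` and both regexes are hypothetical helpers, and the sample line is invented, since the payload that actually triggered the error was not captured.

```python
import re

# One key=value pair of the field set; the value is either a double-quoted
# string (backslash escapes allowed) or a bare token.
_FIELD = re.compile(r'([^=,\s]+)=("(?:[^"\\]|\\.)*"|[^,\s]+)')
# Integers/floats, optionally with the line-protocol "i"/"u" suffix.
_NUMERIC = re.compile(r'-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?[iu]?')


def classify_fields(line):
    """Split the fields of one line-protocol record into numeric and other."""
    # The field set starts after the first unescaped space.
    _, rest = re.split(r'(?<!\\) ', line.strip(), maxsplit=1)
    numeric, other = {}, {}
    for key, raw in _FIELD.findall(rest):
        if _NUMERIC.fullmatch(raw):
            numeric[key] = float(raw.rstrip("iu"))
        else:
            other[key] = raw
    return numeric, other


# Hypothetical record: field names taken from the error text, values invented.
sample = 'system,object=nodes,host=node01 cpus=4,pbs-library-version="0.0 (UNKNOWN)" 1705849515000000000'
numeric, other = classify_fields(sample)
print(numeric)  # {'cpus': 4.0}
print(other)    # {'pbs-library-version': '"0.0 (UNKNOWN)"'}
```

Running this over a captured request body would show at a glance whether any string-valued fields are being sent alongside the numeric metrics.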

akvlad commented 8 months ago

Hello @voicenter @maxim-voicenter
I have an upgrade and a solution suggestion for you.

The new schema support should eliminate all the crashes for you.

maxim-voicenter commented 8 months ago

Hello @akvlad, it's working with no crashes. I really appreciate your help. Many thanks.

lmangani commented 8 months ago

Thanks @maxim-voicenter @voicenter! Closing as resolved; feel free to reopen if needed. See you soon!