noobaa / noobaa-core

High-performance S3 application gateway to any backend - file / s3-compatible / multi-clouds / caching / replication ...
https://www.noobaa.io
Apache License 2.0
270 stars 78 forks source link

Noobaa_nsfs service restarts due to panic on 2 of the protocol nodes #7735

Closed rkomandu closed 8 months ago

rkomandu commented 9 months ago

Environment info

Actual behavior

1.

you can observe that the nodes has the noobaa_nsfs restarts

[root@scale-13 log]# systemctl status noobaa_nsfs ● noobaa_nsfs.service - The NooBaa nsfs service. Loaded: loaded (/etc/systemd/system/noobaa_nsfs.service; enabled; preset: disabled) Active: active (running) since Thu 2024-01-18 11:11:30 CET; 56min ago Main PID: 3379876 (node) Tasks: 11 (limit: 48877) Memory: 134.2M CPU: 12.131s CGroup: /system.slice/noobaa_nsfs.service └─3379876 /usr/local/noobaa-core/bin/node /usr/local/noobaa-core/src/cmd/nsfs.js

/var/log/noobaa.log

2024-01-18T11:11:15.587670+01:00 scale-13 node[3379340]: [/3379340] ESC[36m  [LOG]ESC[39m CONSOLE:: read_rand_seed: closing fd ...
2024-01-18T11:11:15.588164+01:00 scale-13 node[3379340]: [/3379340] ESC[36m  [LOG]ESC[39m CONSOLE:: init_rand_seed: seeding with 32 bytes
2024-01-18T11:11:16.416792+01:00 scale-13 node[3379393]: [/3379393] ESC[36m  [LOG]ESC[39m CONSOLE:: read_rand_seed: reading 32 bytes from /dev/ran
dom ...
2024-01-18T11:11:16.417865+01:00 scale-13 node[3379393]: [/3379393] ESC[36m  [LOG]ESC[39m CONSOLE:: read_rand_seed: got 32 bytes from /dev/random,
 total 32 ...
2024-01-18T11:11:16.418135+01:00 scale-13 node[3379393]: [/3379393] ESC[36m  [LOG]ESC[39m CONSOLE:: read_rand_seed: closing fd ...
2024-01-18T11:11:16.418616+01:00 scale-13 node[3379393]: [/3379393] ESC[36m  [LOG]ESC[39m CONSOLE:: init_rand_seed: seeding with 32 bytes
2024-01-18T11:11:28.320052+01:00 scale-13 node[649898]: [nsfs/649898] ESC[31m[ERROR]ESC[39m core.util.postgres_client:: postgres_client: T00000000
|Q00000001: failed with error: TypeError: Cannot read properties of undefined (reading 'query')    at _do_query ESC[90m(/usr/local/noobaa-core/ESC
[39msrc/util/postgres_client.js:250:37ESC[90m)ESC[39m    at PostgresTable.single_query ESC[90m(/usr/local/noobaa-core/ESC[39msrc/util/postgres_cli
ent.js:679:16ESC[90m)ESC[39m    at async Timeout.vacuumAndAnalyze [as _onTimeout] ESC[90m(/usr/local/noobaa-core/ESC[39msrc/util/postgres_client.j
s:700:13ESC[90m)ESC[39m
2024-01-18T11:11:28.321065+01:00 scale-13 node[649898]: [nsfs/649898] ESC[31m[ERROR]ESC[39m core.util.postgres_client:: vacuumAndAnalyze failed Ty
peError: Cannot read properties of undefined (reading 'query')    at _do_query ESC[90m(/usr/local/noobaa-core/ESC[39msrc/util/postgres_client.js:2
50:37ESC[90m)ESC[39m    at PostgresTable.single_query ESC[90m(/usr/local/noobaa-core/ESC[39msrc/util/postgres_client.js:679:16ESC[90m)ESC[39m    a
t async Timeout.vacuumAndAnalyze [as _onTimeout] ESC[90m(/usr/local/noobaa-core/ESC[39msrc/util/postgres_client.js:700:13ESC[90m)ESC[39m
2024-01-18T11:11:28.321881+01:00 scale-13 node[649898]: [nsfs/649898] ESC[31m[ERROR]ESC[39m CONSOLE:: PANIC: process uncaughtException TypeError:
Cannot read properties of undefined (reading 'query')    at _do_query (/usr/local/noobaa-core/src/util/postgres_client.js:250:37)    at PostgresTa
ble.single_query (/usr/local/noobaa-core/src/util/postgres_client.js:679:16)    at async Timeout.vacuumAndAnalyze [as _onTimeout] (/usr/local/noob
aa-core/src/util/postgres_client.js:700:13)
2024-01-18T11:11:30.718889+01:00 scale-13 node[3379838]: [/3379838] ESC[36m  [LOG]ESC[39m CONSOLE:: read_rand_seed: reading 32 bytes from /dev/random ...
2024-01-18T11:11:30.719580+01:00 scale-13 node[3379838]: [/3379838] ESC[36m  [LOG]ESC[39m CONSOLE:: read_rand_seed: got 32 bytes from /dev/random, total 32 ...

other protocol node [root@scale-11 log]# systemctl status noobaa_nsfs ● noobaa_nsfs.service - The NooBaa nsfs service. Loaded: loaded (/etc/systemd/system/noobaa_nsfs.service; enabled; preset: disabled) Active: active (running) since Thu 2024-01-18 11:11:30 CET; 1h 11min ago Main PID: 494409 (node) Tasks: 11 (limit: 48877) Memory: 196.5M CPU: 24.462s CGroup: /system.slice/noobaa_nsfs.service └─494409 /usr/local/noobaa-core/bin/node /usr/local/noobaa-core/src/cmd/nsfs.js

2024-01-18T11:11:20.808074+01:00 scale-11 node[494208]: [/494208] ESC[36m  [LOG]ESC[39m CONSOLE:: read_rand_seed: closing fd ...
2024-01-18T11:11:20.808478+01:00 scale-11 node[494208]: [/494208] ESC[36m  [LOG]ESC[39m CONSOLE:: init_rand_seed: seeding with 32 bytes
2024-01-18T11:11:21.622053+01:00 scale-11 node[494251]: [/494251] ESC[36m  [LOG]ESC[39m CONSOLE:: read_rand_seed: reading 32 bytes from /dev/random ...
2024-01-18T11:11:21.627069+01:00 scale-11 node[494251]: [/494251] ESC[36m  [LOG]ESC[39m CONSOLE:: read_rand_seed: got 32 bytes from /dev/random, total 32 ...
2024-01-18T11:11:21.627234+01:00 scale-11 node[494251]: [/494251] ESC[36m  [LOG]ESC[39m CONSOLE:: read_rand_seed: closing fd ...
2024-01-18T11:11:21.627612+01:00 scale-11 node[494251]: [/494251] ESC[36m  [LOG]ESC[39m CONSOLE:: init_rand_seed: seeding with 32 bytes
2024-01-18T11:11:28.083436+01:00 scale-11 node[892579]: [nsfs/892579] ESC[31m[ERROR]ESC[39m core.util.postgres_client:: postgres_client: T00000000|Q00000001: failed with error: TypeError: Cannot read properties of undefined (reading 'query')    at _do_query ESC[90m(/usr/local/noobaa-core/ESC[39msrc/util/postgres_client.js:250:37ESC[90m)ESC[39m    at PostgresTable.single_query ESC[90m(/usr/local/noobaa-core/ESC[39msrc/util/postgres_client.js:679:16ESC[90m)ESC[39m    at async Timeout.vacuumAndAnalyze [as _onTimeout] ESC[90m(/usr/local/noobaa-core/ESC[39msrc/util/postgres_client.js:700:13ESC[90m)ESC[39m
2024-01-18T11:11:28.084854+01:00 scale-11 node[892579]: [nsfs/892579] ESC[31m[ERROR]ESC[39m core.util.postgres_client:: vacuumAndAnalyze failed TypeError: Cannot read properties of undefined (reading 'query')    at _do_query ESC[90m(/usr/local/noobaa-core/ESC[39msrc/util/postgres_client.js:250:37ESC[90m)ESC[39m    at PostgresTable.single_query ESC[90m(/usr/local/noobaa-core/ESC[39msrc/util/postgres_client.js:679:16ESC[90m)ESC[39m    at async Timeout.vacuumAndAnalyze [as _onTimeout] ESC[90m(/usr/local/noobaa-core/ESC[39msrc/util/postgres_client.js:700:13ESC[90m)ESC[39m
2024-01-18T11:11:28.085748+01:00 scale-11 node[892579]: [nsfs/892579] ESC[31m[ERROR]ESC[39m CONSOLE:: PANIC: process uncaughtException TypeError: Cannot read properties of undefined (reading 'query')    at _do_query (/usr/local/noobaa-core/src/util/postgres_client.js:250:37)    at PostgresTable.single_query (/usr/local/noobaa-core/src/util/postgres_client.js:679:16)    at async Timeout.vacuumAndAnalyze [as _onTimeout] (/usr/local/noobaa-core/src/util/postgres_client.js:700:13)
2024-01-18T11:11:36.668085+01:00 scale-11 node[494555]: [/494555] ESC[36m  [LOG]ESC[39m CONSOLE:: read_rand_seed: reading 32 bytes from /dev/random ...
2024-01-18T11:11:36.677653+01:00 scale-11 node[494555]: [/494555] ESC[36m  [LOG]ESC[39m CONSOLE:: read_rand_seed: got 32 bytes from /dev/random, total 32 ...
2024-01-18T11:11:36.677982+01:00 scale-11 node[494555]: [/494555] ESC[36m  [LOG]ESC[39m CONSOLE:: read_rand_seed: closing fd ...
2024-01-18T11:11:36.678439+01:00 scale-11 node[494555]: [/494555] ESC[36m  [LOG]ESC[39m CONSOLE:: init_rand_seed: seeding with 32 bytes
2024-01-18T11:11:38.083833+01:00 scale-11 node[494932]: [/494932] ESC[36m  [LOG]ESC[39m CONSOLE:: read_rand_seed: reading 32 bytes from /dev/random ...
2024-01-18T11:11:38.086971+01:00 scale-11 node[494932]: [/494932] ESC[36m  [LOG]ESC[39m CONSOLE:: read_rand_seed: got 32 bytes from /dev/random, total 32 ...
2024-01-18T11:11:38.093470+01:00 scale-11 node[494932]: [/494932] ESC[36m  [LOG]ESC[39m CONSOLE:: read_rand_seed: closing fd ...

where as below one didn't run into this postgres (panic message)

[root@scale-12 ~]# systemctl status noobaa_nsfs ● noobaa_nsfs.service - The NooBaa nsfs service. Loaded: loaded (/etc/systemd/system/noobaa_nsfs.service; enabled; preset: disabled) Active: active (running) since Thu 2024-01-18 08:48:16 CET; 3h 34min ago Main PID: 3996087 (node) Tasks: 11 (limit: 48877) Memory: 98.0M CPU: 53.449s CGroup: /system.slice/noobaa_nsfs.service └─3996087 /usr/local/noobaa-core/bin/node /usr/local/noobaa-core/src/cmd/nsfs.js

I didn't do any operations or max it could be due to not setting the region it failed ?

Expected behavior

  1. noobaa_nsfs panic is not a good sign.

Steps to reproduce

  1. Didn't do any specific command on the system , except try to create a bucket with s3 user account

More information - Screenshots / Logs / Other output

rkomandu commented 9 months ago

noobaa event log on the scale-11 is as follows " Active: active (running) since Thu 2024-01-18 11:11:30 CET; 1h 38min ago"

refer the 11:11:30, nothing special in the below noobaa_events.log

[root@scale-11 log]# less noobaa_events.log
{"timestamp":"2024-01-17T11:11:29.006666+01:00","message":"{\"code\":\"noobaa_started\",\"entity_type\":\"NODE\",\"event_type\":\"STATE_CHANGE\",\"message\":\"Noobaa started\",\"scope\":\"NODE\",\"severity\":\"INFO\",\"state\":\"HEALTHY\",\"pid\":892579}","host":"scale-11","severity":"notice","facility":"local2","syslog-tag":"node[892579]:","source":"node"}
{"timestamp":"2024-01-18T11:12:00.118421+01:00","message":"{\"code\":\"noobaa_started\",\"entity_type\":\"NODE\",\"event_type\":\"STATE_CHANGE\",\"message\":\"Noobaa started\",\"scope\":\"NODE\",\"severity\":\"INFO\",\"state\":\"HEALTHY\",\"pid\":494409}","host":"scale-11","severity":"notice","facility":"local2","syslog-tag":"node[494409]:","source":"node"}
{"timestamp":"2024-01-18T11:27:53.765680+01:00","message":"{\"code\":\"noobaa_account_created\",\"entity_type\":\"NODE\",\"event_type\":\"INFO\",\"message\":\"New noobaa account created.\",\"scope\":\"NODE\",\"severity\":\"INFO\",\"state\":\"HEALTHY\",\"arguments\":{\"account_name\":\"s3user5003@fvt.com\"},\"pid\":494409}","host":"scale-11","severity":"notice","facility":"local2","syslog-tag":"node[494409]:","source":"node"}
{"timestamp":"2024-01-18T11:40:54.840016+01:00","message":"{\"code\":\"noobaa_account_created\",\"entity_type\":\"NODE\",\"event_type\":\"INFO\",\"message\":\"New noobaa account created.\",\"scope\":\"NODE\",\"severity\":\"INFO\",\"state\":\"HEALTHY\",\"arguments\":{\"account_name\":\"s3user5003@fvt.com\"},\"pid\":494409}","host":"scale-11","severity":"notice","facility":"local2","syslog-tag":"node[494409]:","source":"node"}

[root@scale-13 log]# systemctl status noobaa_nsfs
● noobaa_nsfs.service - The NooBaa nsfs service.
     Loaded: loaded (/etc/systemd/system/noobaa_nsfs.service; enabled; preset: disabled)
     Active: active (running) since Thu 2024-01-18 11:11:30 CET; 1h 39min ago

noobaa_events.log
[root@scale-13 log]# less noobaa_events.log
{"timestamp":"2024-01-17T11:11:29.411330+01:00","message":"{\"code\":\"noobaa_started\",\"entity_type\":\"NODE\",\"event_type\":\"STATE_CHANGE\",\"message\":\"Noobaa started\",\"scope\":\"NODE\",\"severity\":\"INFO\",\"state\":\"HEALTHY\",\"pid\":649898}","host":"scale-13","severity":"notice","facility":"local2","syslog-tag":"node[649898]:","source":"node"}
{"timestamp":"2024-01-18T11:11:56.975086+01:00","message":"{\"code\":\"noobaa_started\",\"entity_type\":\"NODE\",\"event_type\":\"STATE_CHANGE\",\"message\":\"Noobaa started\",\"scope\":\"NODE\",\"severity\":\"INFO\",\"state\":\"HEALTHY\",\"pid\":3379876}","host":"scale-13","severity":"notice","facility":"local2","syslog-tag":"node[3379876]:","source":"node"}
romayalon commented 9 months ago

reopening and adding request validation @rkomandu please closed when verified

nimrod-becker commented 8 months ago

Do we have any news on this?

rkomandu commented 8 months ago

@nimrod-becker , this has to be performed on a different env, not on the regular one. I will need to check on that and come back

rkomandu commented 8 months ago

Got the upstream noobaa 0129 build and deployed it on the Scale cluster. Will watch it for 3 to 4 hours and then close it, if no postgres client issues are noticed

romayalon commented 8 months ago

@rkomandu you need to keep the machine for at least 1 day, this vaccumAnalyzer runs once a day

rkomandu commented 8 months ago

I and @romayalon discussed over a call and showed her my other env, where had noobaa- 0129 d/s build. The noobaa_nsfs was up for more than 2 days (in Fyre cluster) and hence decided that we can close this out

[root@rkomandu-ip-cls-x-worker1 log]# systemctl status noobaa_nsfs
● noobaa_nsfs.service - The NooBaa nsfs service.
     Loaded: loaded (/etc/systemd/system/noobaa_nsfs.service; enabled; preset: disabled)
     Active: active (running) since Fri 2024-02-09 01:16:38 PST; 3 days ago
   Main PID: 2930287 (node)
      Tasks: 33 (limit: 48870)
     Memory: 342.1M
        CPU: 1h 5min 16.838s
     CGroup: /system.slice/noobaa_nsfs.service
             ├─2930287 /usr/local/noobaa-core/bin/node /usr/local/noobaa-core/src/cmd/nsfs.js
             ├─2930523 /usr/local/noobaa-core/node/bin/node /usr/local/noobaa-core/src/cmd/nsfs.js
             └─2930524 /usr/local/noobaa-core/node/bin/node /usr/local/noobaa-core/src/cmd/nsfs.js

Feb 12 01:06:47 rkomandu-ip-cls-x-worker1.fyre.ibm.com node[2930523]: [nsfs/2930523]   [LOG] CONSOLE:: { timestamp: 1707728807888, group_name: 'default-endpo>
Feb 12 01:06:47 rkomandu-ip-cls-x-worker1.fyre.ibm.com node[2930523]: Feb-12 1:06:47.888 [nsfs/2930523]   [LOG] CONSOLE:: { timestamp: 1707728807888, group_n>
Feb 12 01:16:47 rkomandu-ip-cls-x-worker1.fyre.ibm.com node[2930524]: [nsfs/2930524]    [L0] core.endpoint.endpoint:: Sending endpoint report: { group_name: >
Feb 12 01:16:47 rkomandu-ip-cls-x-worker1.fyre.ibm.com node[2930524]: Feb-12 1:16:47.371 [nsfs/2930524]    [L0] core.endpoint.endpoint:: Sending endpoint rep>
Feb 12 01:16:47 rkomandu-ip-cls-x-worker1.fyre.ibm.com node[2930524]: [nsfs/2930524]   [LOG] CONSOLE:: { timestamp: 1707729407372, group_name: 'default-endpo>
Feb 12 01:16:47 rkomandu-ip-cls-x-worker1.fyre.ibm.com node[2930524]: Feb-12 1:16:47.373 [nsfs/2930524]   [LOG] CONSOLE:: { timestamp: 1707729407372, group_n>
Feb 12 01:16:47 rkomandu-ip-cls-x-worker1.fyre.ibm.com node[2930523]: [nsfs/2930523]    [L0] core.endpoint.endpoint:: Sending endpoint report: { group_name: >
Feb 12 01:16:47 rkomandu-ip-cls-x-worker1.fyre.ibm.com node[2930523]: Feb-12 1:16:47.890 [nsfs/2930523]    [L0] core.endpoint.endpoint:: Sending endpoint rep>
Feb 12 01:16:47 rkomandu-ip-cls-x-worker1.fyre.ibm.com node[2930523]: [nsfs/2930523]   [LOG] CONSOLE:: { timestamp: 1707729407890, group_name: 'default-endpo>
Feb 12 01:16:47 rkomandu-ip-cls-x-worker1.fyre.ibm.com node[2930523]: Feb-12 1:16:47.891 [nsfs/2930523]   [LOG] CONSOLE:: { timestamp: 1707729407890, group_n>
[root@rkomandu-ip-cls-x-worker1 log]# rpm -qa |grep noobaa
noobaa-core-5.15.0-20240125.el9.x86_64

[root@rkomandu-ip-cls-x-worker1 log]# rpm -qi noobaa-core-5.15.0-20240125.el9.x86_64
Name        : noobaa-core
Version     : 5.15.0
Release     : 20240125.el9
Architecture: x86_64
Install Date: Sun 28 Jan 2024 11:19:45 PM PST
Group       : Unspecified
Size        : 409313992
License     : Apache-2.0
Signature   : (none)
Source RPM  : noobaa-core-5.15.0-20240125.el9.src.rpm
Build Date  : Thu 25 Jan 2024 09:15:13 AM PST
Build Host  : x86-64-07.build.eng.rdu2.redhat.com
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
Vendor      : Red Hat, Inc.
URL         : https://www.noobaa.io/
Summary     : NooBaa RPM
Description :
NooBaa is a data service for cloud environments, providing S3 object-store interface with flexible tiering, mirroring, and spread placement policies, over any storage resource that allows GET/PUT including S3, GCS, Azure Blob, Filesystems, etc.

closing this out