SCT should not generate an error event for coredumpctl metadata but rather only for the scylla decoded backtrace

yarongilor commented 1 month ago

As can be seen in: https://argus.scylladb.com/test/98050732-dfe3-464c-a66a-f235bad30829/runs?additionalRuns%5B%5D=7872957f-b3ab-492a-b193-a6b3c4284aa5

an error event like the following is reported:

CoreDumpEvent
ERROR
no nemesis
2024-06-01 05:37:05.728
Received: 2024-06-01 05:12:43.000
one-time
Node longevity-tls-50gb-3d-master-db-node-7872957f-3 [52.211.141.233 | 10.4.8.49]
2024-06-01 05:37:05.728 <2024-06-01 05:12:43.000>: (CoreDumpEvent Severity.ERROR) period_type=one-time event_id=773c9f7a-3e27-4c42-a127-db3f79ef3107 node=Node longevity-tls-50gb-3d-master-db-node-7872957f-3 [52.211.141.233 | 10.4.8.49]
corefile_url=https://storage.cloud.google.com/upload.scylladb.com/core.scylla.112.d0159003556044638dcc3b4d53571938.7473.1717218763000000/core.scylla.112.d0159003556044638dcc3b4d53571938.7473.1717218763000000.gz
backtrace=           PID: 7473 (scylla)
UID: 112 (scylla)
GID: 119 (scylla)
Signal: 6 (ABRT)
Timestamp: Sat 2024-06-01 05:12:43 UTC (2min 14s ago)
Command Line: /usr/bin/scylla --blocked-reactor-notify-ms 25 --abort-on-lsa-bad-alloc 1 --abort-on-seastar-bad-alloc --abort-on-internal-error 1 --abort-on-ebadf 1 --enable-sstable-key-validation 1 --log-to-syslog 1 --log-to-stdout 0 --default-log-level info --network-stack posix --io-properties-file=/etc/scylla.d/io_properties.yaml --cpuset 1-7,9-15 --lock-memory=1
Executable: /opt/scylladb/libexec/scylla
Control Group: /scylla.slice/scylla-server.slice/scylla-server.service
Unit: scylla-server.service
Slice: scylla-server.slice
Boot ID: d0159003556044638dcc3b4d53571938
Machine ID: 4d6728f7a7bf40e4a36b386a211c4f98
Hostname: longevity-tls-50gb-3d-master-db-node-7872957f-3
Storage: /var/lib/systemd/coredump/core.scylla.112.d0159003556044638dcc3b4d53571938.7473.1717218763000000 (present)
Disk Size: 114.5G
Message: Process 7473 (scylla) of user 112 dumped core.
Stack trace of thread 7481:
#0  0x00007f054c45a884 __pthread_kill_implementation (libc.so.6 + 0x8e884)
#1  0x00007f054c409afe raise (libc.so.6 + 0x3dafe)
#2  0x00007f054c3f287f abort (libc.so.6 + 0x2687f)
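The event above mixes two things: the `coredumpctl` metadata header (PID, UID, signal, command line, storage, and so on) and the raw, still-undecoded stack trace. Below is a minimal sketch, in Python, of how that payload could be split into those two parts so they could be reported differently. `split_coredump_info` is a hypothetical helper, not an existing SCT function. It assumes the `Key: value` layout shown above and that the stack trace starts at the first `Stack trace of thread` line. Decoding the frames into a Scylla backtrace is a separate step and is not shown.

```python
import re

# Hypothetical helper (not part of SCT): matches "Key: value" header lines
# such as "PID: 7473 (scylla)" or "Command Line: /usr/bin/scylla ...".
# Leading whitespace is allowed because coredumpctl right-aligns its keys.
FIELD_RE = re.compile(r"^\s*([A-Za-z][A-Za-z ]*):\s?(.*)$")


def split_coredump_info(text):
    """Split coredumpctl-style output into (metadata dict, stack-trace lines)."""
    metadata = {}
    stack = []
    in_stack = False
    for line in text.splitlines():
        # Everything from the first "Stack trace of thread" line onward
        # is treated as the (undecoded) stack trace.
        if line.strip().startswith("Stack trace of thread"):
            in_stack = True
        if in_stack:
            if line.strip():
                stack.append(line.strip())
            continue
        match = FIELD_RE.match(line)
        if match:
            metadata[match.group(1).strip()] = match.group(2).strip()
    return metadata, stack
```

With such a split, the metadata could be attached as supporting detail while only the backtrace part drives the error event, which is what this issue proposes.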

soyacz commented 1 month ago

In my opinion, and based on recent feedback from developers, we should just put these details into a text file and upload it to S3 as an easy-to-download link. The SCT event should contain that link so we can copy-paste the event into issues. People still paste these backtraces into issues, which makes them harder to scroll through. The same applies to Argus.
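A minimal sketch of this suggestion, assuming the full details are written to a temporary text file, uploaded, and replaced in the event by a link plus a few inline lines. `build_coredump_event_text` and the `upload` callable are hypothetical: `upload` stands in for whatever S3 upload code is used (for example, a thin wrapper around boto3's `upload_file` that returns the object's URL), which keeps the sketch runnable without cloud credentials. The `details_url` field name and the number of inline lines are also assumptions.

```python
import os
import tempfile
from typing import Callable


def build_coredump_event_text(node_name: str, corefile_url: str, details: str,
                              upload: Callable[[str], str],
                              max_lines: int = 5) -> str:
    """Upload full coredump details and return a short, copy-paste friendly message.

    `upload` is a hypothetical caller-supplied function: it receives a local
    file path and returns a download URL (e.g. an S3 link).
    """
    # Write the full details to a local text file so they can be uploaded as one file.
    fd, path = tempfile.mkstemp(prefix="coredump-details-", suffix=".txt")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(details)
        details_url = upload(path)
    finally:
        # The local copy is no longer needed once uploaded (or if upload failed).
        os.remove(path)
    # Keep only the first few lines inline; the rest is available via the link.
    head = details.splitlines()[:max_lines]
    return "\n".join([
        f"node={node_name}",
        f"corefile_url={corefile_url}",
        f"details_url={details_url}",
        *head,
    ])
```

Injecting the uploader keeps the message format independent of the storage backend, so the same event text could point to S3 or to an Argus attachment.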

roydahan commented 3 days ago

I don't understand why you think that the coredump information shouldn't be an error event. This is important information and should always be included in the reported issue.

Regarding @soyacz's RFE to make copy-pasting easier: it can be added on top of the existing information. If needed, it should be opened as a new issue.

yarongilor commented 3 days ago

> I don't understand why you think that the coredump information shouldn't be an error event. This is important information and should always be included in the reported issue.
>
> Regarding @soyacz's RFE to make copy-pasting easier: it can be added on top of the existing information. If needed, it should be opened as a new issue.

@roydahan, @soyacz, were you ever asked for the coredump metadata by a core developer? I was never asked for it; I was only asked for the decoded backtrace. When I added this coredump metadata to an issue, developers usually complained that it was useless.

roydahan commented 3 days ago

It's not useless, but of course it's not enough.

scylladb/scylla-cluster-tests #7590