Closed juliayakovlev closed 1 year ago
@juliayakovlev sounds like a bug in manager. Why would it use tmpfs, i.e. memory and not disk? (Scylla snapshots are on disk, not in tmp.)
Also, any leftovers should be handled by manager itself.
From the SCT POV, it seems we have keyspaces that should have been cleared (all the keyspaces created during repair).
it was in the context of https://argus.scylladb.com/test/8831dfed-1945-4e7e-a0c6-6d3f848868b4/runs?additionalRuns%5B%5D=4a98679f-02ad-4c38-a717-833dd12453de
and IIUC, 2 backup nemeses failed, and in the end, we ran into ENOSPC... my suggestion was, in these cases, to remove the snapshots from the system to avoid filling the disk up...
probably the successful backups have their snapshots deleted by default, but the failed ones don't. @ShlomiBalalis, can you please describe here the behavior (for the snapshots) in both cases, success and failure?
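To make the cleanup suggestion concrete, here is a rough sketch of how leftover snapshots could be listed on a node before deciding what to remove. It assumes the usual Scylla layout of `<data_dir>/<keyspace>/<table>/snapshots/<tag>` and that manager-created snapshot tags start with `sm_`; both the prefix and the helper names (`find_snapshots`, `dir_size`) are assumptions for illustration, not something confirmed in this thread.

```python
import os


def dir_size(path):
    """Sum the apparent size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.lstat(os.path.join(root, name)).st_size
    return total


def find_snapshots(data_dir, tag_prefix="sm_"):
    """Return [(snapshot_dir, size_bytes)] for tags starting with tag_prefix.

    Assumed layout: <data_dir>/<keyspace>/<table>/snapshots/<tag>.
    The "sm_" default is an assumption about manager snapshot tags.
    """
    results = []
    for keyspace in sorted(os.listdir(data_dir)):
        ks_path = os.path.join(data_dir, keyspace)
        if not os.path.isdir(ks_path):
            continue
        for table in sorted(os.listdir(ks_path)):
            snap_root = os.path.join(ks_path, table, "snapshots")
            if not os.path.isdir(snap_root):
                continue
            for tag in sorted(os.listdir(snap_root)):
                if tag.startswith(tag_prefix):
                    snap_dir = os.path.join(snap_root, tag)
                    results.append((snap_dir, dir_size(snap_dir)))
    return results
```

Note that snapshot files are hard links to SSTables, so the size reported here is the apparent size and not necessarily the space that removal would free. The removal itself would more likely go through `nodetool clearsnapshot -t <tag>` than through deleting directories by hand.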
If we are doing snapshots into /tmp, it will always be a problem, since we have less memory than disk space.
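One quick way to confirm whether a path such as /tmp is really memory-backed is to look up its mount entry. A minimal sketch, assuming a Linux node where `/proc/mounts` is readable; the function names `fs_type_for` and `usage_percent` are hypothetical and are not part of SCT or manager.

```python
import os
import shutil


def fs_type_for(path, mounts_text):
    """Return (mount_point, fs_type) of the deepest mount containing path.

    mounts_text is the content of /proc/mounts (fields: device, mount
    point, fs type, options, ...). Escaped characters in mount points
    (e.g. \\040 for a space) are not decoded in this sketch.
    """
    path = os.path.normpath(path)
    best = ("", "unknown")
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later entry for the same mount point win.
            if len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type)
    return best


def usage_percent(path):
    """Return the used share of the filesystem holding path, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total if usage.total else 0.0
```

On a node, one would read `/proc/mounts`, call `fs_type_for("/tmp", text)` and compare it with the result for the Scylla data directory; a `tmpfs` type for /tmp together with a high `usage_percent("/tmp")` would match the ENOSPC described here.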
https://argus.scylladb.com/test/8831dfed-1945-4e7e-a0c6-6d3f848868b4/runs?additionalRuns%5B%5D=4a98679f-02ad-4c38-a717-833dd12453de
Installation details
Kernel Version: 5.15.0-1019-aws
Scylla version (or git commit hash): 2022.2.0~rc2-20220919.75d087a2b75a with build-id 463f1a57b82041a6c6b6441f0cbc26c8ad93091e
Cluster size: 4 nodes (i3.4xlarge)
Scylla Nodes used in this run:
OS / Image: ami-0b6ff8cdcbe0cb88a (aws: us-east-1)
Test: longevity-lwt-500G-3d-test
Test id: 4a98679f-02ad-4c38-a717-833dd12453de
Test name: enterprise-2022.2/longevity/longevity-lwt-500G-3d-test
Test config file(s):
Issue description
Management backup nemeses failed due to "no space left on device" on node1 (10.12.1.240).
The tmp fs is 100% used.
$ hydra investigate show-monitor 4a98679f-02ad-4c38-a717-833dd12453de
$ hydra investigate show-logs 4a98679f-02ad-4c38-a717-833dd12453de
Logs:
No logs captured during this run.
Jenkins job URL