nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io
Apache License 2.0

KV bucket broken after disk ran full #6141

Closed tehsphinx closed 3 days ago

tehsphinx commented 3 days ago

Observed behavior

What we observe when the stream / KV bucket is broken:

> nats kv put bucket key value
nats: error: nats: nats: API error: code=503 err_code=10077 description=maximum messages per subject exceeded

The same happens for del or purge operations (and, I assume, for any write operation):

> nats kv purge bucket key
[server] ? Purge key bucket > key? Yes
nats: error: nats: nats: API error: code=503 err_code=10077 description=maximum messages per subject exceeded

Expected behavior

Ideally, a restart of the NATS server would restore the KV bucket.

Alternatively we'd need a way to restore the stream manually.

We already tried to back up, delete, and restore the underlying KV_bucket stream, but that didn't fix the issue.

Server and client version

Client: nats-cli (devel based off commit 4da881fdf5c87d6b064d6f8d2eb936f3802f4f73 from 8th Oct 2024)
Server: nats-server v2.6.6

Host environment

No response

Steps to reproduce

After the disk ran full (a few times) in our development environment, we observed that some streams were broken. In this ticket, the affected stream is the one backing a KV bucket. (The disk filled up because of another application, not NATS JetStream.)

Reproduction steps:

tehsphinx commented 3 days ago

If simply updating nats-server should fix the issue, kindly let us know (I'll try that next in any case).

neilalexander commented 3 days ago

Since the 2.6.6 release is nearly three years old at this point, we would indeed normally ask if it's still reproducible on the latest version.

tehsphinx commented 3 days ago

I'll look into that and report back. Thank you!

tehsphinx commented 3 days ago

Updated to v2.10.22-alpine3.20.

Unfortunately, that development environment was still running a setup where the JetStream data folder was not mounted, so updating the Docker image wiped all data.

We'll have to get back to you on this once the disk runs full again. Until then I'll close the issue.
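
For anyone hitting the same data loss on an image update: a minimal Docker Compose sketch that keeps the JetStream store on a named volume might look like the following. The service name `nats`, the volume name `nats-data`, and the `/data` path are hypothetical choices for illustration. The image tag is assumed from the version mentioned above. `-js` enables JetStream and `-sd` sets the storage directory.

```yaml
# Sketch only: service name, volume name, and /data path are assumptions.
services:
  nats:
    image: nats:2.10.22-alpine3.20
    # Enable JetStream and point its store directory at the mounted path.
    command: ["-js", "-sd", "/data"]
    volumes:
      # A named volume survives container re-creation when the image changes.
      - nats-data:/data

volumes:
  nats-data:
```

With a setup along these lines, replacing the container should leave the stream data in place, so a broken bucket would still be available for inspection after an upgrade.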