zaibon closed this issue 5 years ago
Updates on the issue:
In scenario one, we caused a failure in minio by remounting the meta directory filesystem in read-only (ro) mode. Minio behaved as expected and logged the expected message, which was visible in the stream.
In scenario two, we caused a failure in minio by breaking the zdb instance that serves the disk. The expected message was NOT detected in the stream; we got the reload config message instead, which means the robot was trying to reconfigure minio. After a few minutes the machine had self-healed!
We will have to try different (minimalist) setups to figure out exactly what is going on. I am working on that as we speak.
We had a minio running for a while; then, over a weekend, we could no longer write files to it. The error was about the filesystem being in read-only mode. My guess is that minio could no longer write the file metadata, so every upload failed.
The question is why the self-healing was not able to fix it. From what I've seen, minio never sent any error about this read-only filesystem, so the higher-level monitoring was not aware of the issue.
See https://gist.github.com/zaibon/d7a3ab87f185cee47bebc786dee5db51 for the container logs
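Since the read-only state was never reported in the second case, one possible workaround is an independent write probe on the metadata directory. The snippet below is only a minimal sketch of that idea: `is_writable` is a hypothetical helper, not part of minio or the robot, and the directory to probe is whatever path the deployment actually uses for metadata.

```python
import errno
import os
import tempfile


def is_writable(directory: str) -> bool:
    """Return True if a small probe file can be created in `directory`.

    Hypothetical helper for illustration only. A read-only mount makes
    the file creation fail with EROFS; permission errors are treated
    the same way. Any other error (e.g. missing directory) is re-raised.
    """
    try:
        fd, path = tempfile.mkstemp(prefix=".probe-", dir=directory)
    except OSError as exc:
        if exc.errno in (errno.EROFS, errno.EACCES, errno.EPERM):
            return False
        raise
    # Clean up the probe file so the check leaves no trace behind.
    os.close(fd)
    os.unlink(path)
    return True
```

A monitoring loop could call this periodically and raise its own alert when it returns `False`, instead of relying on a specific message appearing in the minio log stream.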