Open rinx opened 4 years ago
related to #503, #556
Describe the issue:
currently, vald-agent-ngt pods have these containers:
agent-sidecar in initContainer mode may fail to finish downloading the backup data and still return status code 0 (an RST stream from the remote host can cause this). in this case, fragments of the backup data may remain in the volume and block NGT startup (#503). the ideal behavior for the pods in this state is to retry downloading the backup data. however, a failed container status doesn't trigger a pod restart.
if the pods had a liveness probe server, it could trigger pod restarts. however, agent-NGT has a postStop phase (executed after the liveness probe kills the container) to save the index, and agent-sidecar has a postStop phase to upload the index. so internal/servers/server needs to be improved to handle these problems.