Currently, the step metrics map gets huge because we add one step per retry of each container. Once it grows this big, it is not of much use. So this change records just one healthcheck step per container, no matter how many retries it takes.
Also, if a container never comes up healthy, the status of that container's healthcheck step is set to error.
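A minimal Go sketch of the intended behaviour, assuming a metrics map keyed by step name. `StepMetric`, `StepStatus`, `waitHealthy` and the `healthcheck:<container>` key are hypothetical names for illustration, not the actual types in this codebase. Retries only update one step's attempt count, and the status starts as error so it stays that way if the container never reports healthy.

```go
package main

import (
	"fmt"
	"time"
)

// StepStatus is a hypothetical status value for a step metric.
type StepStatus string

const (
	StatusSuccess StepStatus = "success"
	StatusError   StepStatus = "error"
)

// StepMetric is a hypothetical record for a single healthcheck step.
type StepMetric struct {
	Name     string
	Status   StepStatus
	Attempts int
	Duration time.Duration
}

// waitHealthy polls check up to maxRetries times and returns ONE StepMetric
// for the container, however many retries were needed. The status defaults
// to error and only flips to success if a check passes.
func waitHealthy(container string, maxRetries int, interval time.Duration, check func() bool) StepMetric {
	start := time.Now()
	step := StepMetric{Name: "healthcheck:" + container, Status: StatusError}
	for attempt := 1; attempt <= maxRetries; attempt++ {
		step.Attempts = attempt
		if check() {
			step.Status = StatusSuccess
			break
		}
		if attempt < maxRetries {
			time.Sleep(interval)
		}
	}
	step.Duration = time.Since(start)
	return step
}

func main() {
	metrics := map[string]StepMetric{}

	// "web" becomes healthy on its third check; "db" never does.
	webCalls := 0
	containers := []struct {
		name  string
		check func() bool
	}{
		{"web", func() bool { webCalls++; return webCalls >= 3 }},
		{"db", func() bool { return false }},
	}

	for _, c := range containers {
		m := waitHealthy(c.name, 5, 0, c.check)
		metrics[m.Name] = m // one entry per container, not per retry
		fmt.Printf("%s %s %d\n", m.Name, m.Status, m.Attempts)
	}
	fmt.Println("steps recorded:", len(metrics))
}
```

Running it prints `healthcheck:web success 3`, `healthcheck:db error 5` and `steps recorded: 2`: two containers and eight checks in total, but only two entries in the map.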