According to /opt/conda/envs/dlrover/lib/python3.10/site-packages/dlrover/trainer/torch/flash_checkpoint/deepspeed_engine.py
def save_to_storage(self, step, state_dict, paths):
    """
    Asynchronously saves the state dict into the storage. It synchronously
    saves the state dict into the shared memory and puts the path
    into a shared queue. The agent in the main process waits on the queue
    to save the state dict from the shared memory into the storage.
    Only rank 0 saves the state dict into the storage.

    Args:
        step (int): the global iteration step.
        state_dict (dict): the state dict of model and optimizer to save.
        paths (dict): the key is a category in
            ["model_states", "optim_states"] of the state dict and
            the value is the path of storage to save.
    """
    success = True
    if step > self._cached_step:
        success = self.save_to_memory(step, state_dict, paths)
    if dist.is_initialized():
        dist.barrier()
    # Only local rank 0 notifies the agent of the saving event.
    if self._local_rank == 0 and success:
        event = CheckpointEvent(type=CheckpointEventType.SAVE, step=step)
        self._event_queue.put(event)
    return success
The check self._local_rank == 0 means the checkpoint is only saved on rank 0, but I am using DeepSpeed ZeRO-3, where the model state dict is partitioned across multiple GPUs. With ZeRO-3 the model is sharded onto multiple cards, so when the checkpoint is saved, is only the shard held by rank 0 saved asynchronously? Thanks for helping to clarify!
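For context, here is a minimal sketch (not DLRover's flash-checkpoint code path) of how a ZeRO-3 checkpoint is normally written with plain DeepSpeed. It assumes model_engine is an already-initialized deepspeed.DeepSpeedEngine running ZeRO stage 3, and ckpt_dir / tag are placeholder names. The point it illustrates is that save_checkpoint is a collective call: every data-parallel rank writes its own shard, rather than rank 0 writing everything.

    # Minimal sketch, NOT DLRover's save_to_storage: with ZeRO stage 3,
    # DeepSpeed's own save_checkpoint must be called on every rank because
    # each rank only holds its own parameter/optimizer partition.
    import deepspeed  # model_engine is assumed to come from deepspeed.initialize()

    def save_zero3_checkpoint(model_engine, ckpt_dir, tag):
        # Collective call: each rank writes its own shard file
        # (e.g. zero_pp_rank_<N>_mp_rank_00_optim_states.pt); calling it
        # on rank 0 alone would hang or drop the other ranks' shards.
        model_engine.save_checkpoint(ckpt_dir, tag=tag)

So the question is whether DLRover's asynchronous path preserves the same per-rank sharding, or whether only the rank-0 shard reaches storage.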