nebuly-ai / optimate

A collection of libraries to optimise AI model performances
https://www.nebuly.com/
Apache License 2.0
8.37k stars 639 forks

[chatllama] Do I need to split the llama model manually? #322

Open balcklive opened 1 year ago

balcklive commented 1 year ago

I downloaded a LLaMA 7B model, which comes as a single model file ending in .pth. But the model-loading code in llama_model.py (shown below) asserts that the number of checkpoint files matches the world size. If I want to train the model on multiple GPUs, do I need to split the model into as many pieces as there are graphics cards? May I ask how I should do that, or is there something I have misunderstood?

```python
import json
from pathlib import Path
from typing import Tuple

import torch


def load_checkpoints(
    ckpt_dir: str, local_rank: int, world_size: int
) -> Tuple[dict, dict]:
    checkpoints = sorted(Path(ckpt_dir).glob("*.pth"))
    assert world_size == len(checkpoints), (
        f"Loading a checkpoint for MP={len(checkpoints)} but world "  # world size means number of GPUs used, right?
        f"size is {world_size}"
    )
    ckpt_path = checkpoints[local_rank]
    print("Loading")
    checkpoint = torch.load(ckpt_path, map_location="cpu")
    with open(Path(ckpt_dir) / "params.json", "r") as f:
        params = json.loads(f.read())
    return checkpoint, params
```
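For reference, the assert above ties the number of `.pth` shards in `ckpt_dir` to the number of model-parallel processes: the 7B release ships a single `consolidated.00.pth`, so it loads as-is only with `world_size == 1`. Below is a minimal, hypothetical sketch of how a single-file checkpoint could be re-split into N model-parallel shards. It assumes the parameter layout of the original LLaMA `model.py` (column-parallel weights such as `wq`/`wk`/`wv`/`w1`/`w3`/`output` split along dim 0, row-parallel weights such as `wo`/`w2` and the token embedding split along dim 1, 1-D tensors replicated); verify those split dimensions against the fairscale layers your model actually uses before relying on it.

```python
from pathlib import Path

import torch

# Hypothetical helper: split one consolidated checkpoint into mp_size shards.
# The name -> split-dimension mapping below is an assumption based on the
# original LLaMA model definition and must be checked against your model code.
ROW_PARALLEL_SUFFIXES = ("wo.weight", "w2.weight", "tok_embeddings.weight")


def shard_checkpoint(ckpt_path: str, out_dir: str, mp_size: int) -> None:
    state_dict = torch.load(ckpt_path, map_location="cpu")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for rank in range(mp_size):
        shard = {}
        for name, tensor in state_dict.items():
            if tensor.ndim == 1:
                # norms, biases, rope frequencies: replicated on every rank
                shard[name] = tensor.clone()
            elif name.endswith(ROW_PARALLEL_SUFFIXES):
                # row-parallel linears / parallel embedding: split along dim 1
                shard[name] = tensor.chunk(mp_size, dim=1)[rank].clone()
            else:
                # column-parallel linears (wq, wk, wv, w1, w3, output): split along dim 0
                shard[name] = tensor.chunk(mp_size, dim=0)[rank].clone()
        torch.save(shard, out / f"consolidated.{rank:02d}.pth")


# Illustrative call: shard_checkpoint("7B/consolidated.00.pth", "7B-mp2", mp_size=2)
```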

sharlec commented 1 year ago

I am wondering about this as well. I want to serve the 7B model on two servers, and I am not sure what needs to be done to the model architecture.

PierpaoloSorbellini commented 1 year ago

Hi @balcklive, you may have to enable FairScale and set the MP degree as stated in the llama documentation.
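For anyone hitting the same assert: the setup roughly looks like the sketch below, which mirrors the model-parallel initialization in Meta's llama example code (treat the exact environment-variable names and entry point as an assumption for chatllama). Each spawned process initializes torch.distributed and fairscale model parallelism with the same world size as the number of checkpoint shards, then loads its own shard via `load_checkpoints`.

```python
import os
from typing import Tuple

import torch
from fairscale.nn.model_parallel.initialize import initialize_model_parallel


def setup_model_parallel() -> Tuple[int, int]:
    # LOCAL_RANK / WORLD_SIZE are set by torchrun (or torch.distributed.launch)
    local_rank = int(os.environ.get("LOCAL_RANK", -1))
    world_size = int(os.environ.get("WORLD_SIZE", -1))

    torch.distributed.init_process_group("nccl")
    # The model-parallel degree must equal the number of .pth shards,
    # otherwise the assert in load_checkpoints fires.
    initialize_model_parallel(world_size)
    torch.cuda.set_device(local_rank)
    return local_rank, world_size
```

Launched, for example, with `torchrun --nproc_per_node 2 your_training_script.py` once the checkpoint has been re-sharded into two files (script name is illustrative).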