A bug about saving model

WindMarx commented 12 months ago

def safe_save_model_for_hf_trainer(trainer: transformers.Trainer, output_dir: str):
    """Collects the state dict and dump to disk."""
    state_dict = trainer.model.state_dict()
    if trainer.args.should_save:
        cpu_state_dict = {
            key: value.cpu()
            for key, value in state_dict.items()
        }
        del state_dict
        trainer._save(output_dir, state_dict=cpu_state_dict)  # noqa

This code has a serious bug: when the model is saved, CUDA memory fills up and the save then fails.

[Screenshot from 2023-12-03 22-30-04]

My experimental environment is an A6000 with 49140M of memory.
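To narrow down where the memory spike happens, it may help to record CUDA memory before collecting the state dict, after collecting it, and after copying it to CPU. The following is only a diagnostic sketch under assumptions: `cuda_mem_mib` and `state_dict_to_cpu` are hypothetical helper names that are not part of LLaVA-Med or `transformers`, and the thread does not confirm which step actually exhausts memory.

```python
import torch


def cuda_mem_mib(device=None):
    """Return (allocated, reserved) CUDA memory in MiB, or (0.0, 0.0) if CUDA is unavailable."""
    if not torch.cuda.is_available():
        return 0.0, 0.0
    mib = 1024 ** 2
    return (
        torch.cuda.memory_allocated(device) / mib,
        torch.cuda.memory_reserved(device) / mib,
    )


def state_dict_to_cpu(model):
    """Hypothetical helper: copy a model's state dict to CPU and report memory at each step."""
    report = {"before": cuda_mem_mib()}

    # Step 1: collect the state dict (same call as in the function quoted above).
    state_dict = model.state_dict()
    report["after_state_dict"] = cuda_mem_mib()

    # Step 2: copy each tensor to CPU; detach() drops any autograd history.
    cpu_state_dict = {key: value.detach().cpu() for key, value in state_dict.items()}
    del state_dict
    report["after_cpu_copy"] = cuda_mem_mib()

    return cpu_state_dict, report
```

Calling `state_dict_to_cpu(trainer.model)` just before the save and printing the report would show whether allocated memory jumps at the `state_dict()` call or at the copy. If the jump is at `state_dict()`, moving tensors to CPU afterwards cannot prevent it.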

M3Dade commented 12 months ago

I also ran into this problem. My experimental environment is 4 × A6000, each with 49140M of memory.

zsxm1998 commented 11 months ago

I ran into the same problem.

microsoft/LLaVA-Med, issue #32