zjunlp / MolGen

[ICLR 2024] Domain-Agnostic Molecular Generation with Chemical Feedback
https://huggingface.co/spaces/zjunlp/MolGen
MIT License

About loading the checkpoint file for generative task #2

Closed · songyinys closed this issue 1 year ago

songyinys commented 1 year ago

Hi,

After running preprocessing and fine-tuning, I found that two checkpoint files were generated: one called 'mp_rank_00_model_states.pt' and the other called 'zero_pp_rank_0_mp_rank_00_optim_states.pt'. It seems that either of them can be passed as the checkpoint_path for the subsequent generative task, and they give me similar generative results. I would appreciate more details or an explanation of the difference between them.

Many thanks in advance!

Song

ZJU-Fangyin commented 1 year ago

Dear Song,

I'm happy to hear that you successfully ran our project. If you open the model directory, you will find a zero_to_fp32.py script. This script extracts consolidated fp32 weights from ZeRO stage 2 and 3 DeepSpeed checkpoints. Running it produces a single model file appropriate to the ZeRO stage you used during fine-tuning. For example:

> python zero_to_fp32.py "../model" "global_step0/0model.pkl"
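The output is a plain fp32 state_dict written with torch.save, so once it is generated you can load it with vanilla PyTorch, without DeepSpeed. A minimal sketch (the path matches the command above; how you construct the model is up to your project code):

```python
import torch

# zero_to_fp32.py writes a plain fp32 state_dict, loadable without DeepSpeed.
state_dict = torch.load("global_step0/0model.pkl", map_location="cpu")

# Load it into your model however you normally build it for this project:
# model.load_state_dict(state_dict)
```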

Furthermore, when you fine-tune the model on multiple GPUs, you will get multiple checkpoint files starting with zero_pp_rank, one per GPU. In that case, if you want to use the model for inference or other downstream tasks, you need to run zero_to_fp32.py to merge these shards into a single usable file. The details concern how DeepSpeed saves model state; see https://github.com/microsoft/DeepSpeed for more information.
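If you prefer to skip the intermediate file, DeepSpeed also exposes the same consolidation logic programmatically. A sketch, assuming the standard deepspeed.utils.zero_to_fp32 helpers; "../model" stands for your checkpoint directory:

```python
from deepspeed.utils.zero_to_fp32 import (
    get_fp32_state_dict_from_zero_checkpoint,
    load_state_dict_from_zero_checkpoint,
)

# Merge all zero_pp_rank_* shards under the checkpoint directory in memory.
state_dict = get_fp32_state_dict_from_zero_checkpoint("../model")

# Or consolidate and load directly into an existing model instance:
# model = load_state_dict_from_zero_checkpoint(model, "../model")
```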

Best, Yin Fang