yxuansu / PandaGPT

[TLLM'23] PandaGPT: One Model To Instruction-Follow Them All
https://panda-gpt.github.io/
Apache License 2.0

Saving Error #16

Open shuxiaobo opened 1 year ago

shuxiaobo commented 1 year ago
Time to load utils op: 0.000990152359008789 seconds
  0%|          | 0/908 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train_sft.py", line 110, in <module>
    main(**args)
  File "train_sft.py", line 92, in main
    agent.save_model(args['save_path'], 0)
  File "alpaca/model/agent.py", line 69, in save_model
    torch.save(checkpoint, f'{path}/pytorch_model.pt')
  File "/usr/local/python/lib/python3.8/site-packages/torch/serialization.py", line 441, in save
    _save(obj, opened_zipfile, pickle_module, pickle_protocol)
  File "/usr/local/python/lib/python3.8/site-packages/torch/serialization.py", line 653, in _save
    pickler.dump(obj)
TypeError: cannot pickle 'torch._C._distributed_c10d.ProcessGroup' object
  0%|          | 0/908 [00:00<?, ?it/s]
shuxiaobo commented 1 year ago

Has anyone else encountered the same error?
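
The traceback shows `torch.save` failing inside `pickler.dump`, so something reachable from `checkpoint` holds a `ProcessGroup` handle. `alpaca/model/agent.py` is not shown here, so the contents of `checkpoint` are unknown. The sketch below assumes it is a plain dict and uses a hypothetical helper, `find_unpicklable`, to report which top-level entries fail to pickle. A `threading.Lock` stands in for the process group so the example runs without PyTorch or a distributed setup.

```python
import pickle
import threading


def find_unpicklable(checkpoint):
    """Return the keys of `checkpoint` whose values cannot be pickled.

    Hypothetical diagnostic helper, not part of PandaGPT or PyTorch.
    """
    bad_keys = []
    for key, value in checkpoint.items():
        try:
            pickle.dumps(value)
        except Exception:
            # Any pickling failure (TypeError, PicklingError, ...) marks the key.
            bad_keys.append(key)
    return bad_keys


# Stand-in checkpoint (assumption): the lock plays the role of the
# ProcessGroup object named in the traceback.
checkpoint = {
    "weights": [0.1, 0.2, 0.3],
    "step": 0,
    "group": threading.Lock(),
}

bad_keys = find_unpicklable(checkpoint)

# Keep only the entries that can be serialized.
clean_checkpoint = {k: v for k, v in checkpoint.items() if k not in bad_keys}
print("unpicklable keys:", bad_keys)
```

Running the same check on the real `checkpoint` just before the `torch.save` call in `save_model` should show which entry drags in the process group. Whether dropping that entry is safe depends on what it is, so treat the filtering step as a starting point, not a confirmed fix.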