what is the version of python

learn01one commented 9 months ago

Hello, may I ask which Python version you used? I ran into this error: ValueError: mutable default <class 'trainer.accelerators.base_accelerator.DebugConfig'> for field debug is not allowed: use default_factory

yuvalkirstain commented 9 months ago

Seems like the issue is the hydra version. I'm not sure which version it was, but try using an older one.

learn01one commented 9 months ago

Hello, my environment is torch=2.2.0 and python=3.11; everything else is installed at the provided versions. Many similar errors come up, such as ValueError: mutable default <class 'trainer.configs.configs.DebugConfig'> for field debug is not allowed: use default_factory, and ValueError: mutable default <class 'trainer.accelerators.deepspeed_accelerator.DeepSpeedConfig'> for field deepspeed is not allowed: use default_factory

It looks like an environment problem. Could you describe your environment in more detail? Thanks.

yuvalkirstain commented 9 months ago

I was running with python 3.8
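Some context on why the Python version matters here: starting with Python 3.11, the dataclasses module rejects any unhashable default value, and an instance of a regular (non-frozen) dataclass is unhashable, so a field default such as DebugConfig() raises the "mutable default ... use default_factory" error quoted above. Older versions only rejected list, dict, and set defaults. Below is a minimal sketch of the default_factory form the message suggests. The class and field names are hypothetical stand-ins, not the repository's actual definitions, and whether this change alone is enough for the PickScore configs is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class DebugConfig:
    # Hypothetical stand-in for the repository's DebugConfig.
    activate: bool = False


@dataclass
class TrainerConfig:
    # On Python 3.11+, writing `debug: DebugConfig = DebugConfig()` here
    # raises "ValueError: mutable default ... use default_factory".
    # A factory avoids the error on every Python version and also gives
    # each TrainerConfig its own DebugConfig instead of a shared one.
    debug: DebugConfig = field(default_factory=DebugConfig)


first = TrainerConfig()
second = TrainerConfig()
first.debug.activate = True
print(second.debug.activate)  # False: the two configs do not share state
```

Pinning Python to 3.10 or earlier, as suggested in this thread, sidesteps the check without touching the code.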

yuvalkirstain commented 9 months ago

I don't have access to the env I ran with at the moment. If you continue to have this issue I'll help you debug it.

learn01one commented 9 months ago

Hello, switching to python=3.8 solved all the earlier problems. Training now almost runs; I just hit one small problem:

fp16: MixedPrecisionConfig = MixedPrecisionConfig(enabled=True)
bf16: MixedPrecisionConfig = MixedPrecisionConfig(enabled=False)
optimizer: dict = field(default_factory=lambda: {
    "type": "AdamW",
TypeError: MixedPrecisionConfig() takes no arguments

Are any additional parameters required at run time? Thanks.

yuvalkirstain commented 9 months ago

Can you please post the entire error trace here? Also, what command are you running?

learn01one commented 9 months ago

OK, I am running it with: accelerate launch --dynamo_backend no --gpu_ids all --num_processes 8 --num_machines 1 --use_deepspeed trainer/scripts/train.py +experiment=clip_h output_dir=output

The entire error trace:

/PickScore/trainer/scripts/train.py:13 in <module>

   10 from torch import nn
   11 import sys
   12
❱  13 from trainer.accelerators.base_accelerator import BaseAccelerator
   14 from trainer.configs.configs import TrainerConfig, instantiate_with_cfg
   15
   16 import pdb

/PickScore/trainer/accelerators/__init__.py:4 in <module>

    1 from hydra.core.config_store import ConfigStore
    2
    3 from trainer.accelerators.debug_accelerator import DebugAcceleratorConfig
❱   4 from trainer.accelerators.deepspeed_accelerator import DeepSpeedAcceleratorConfig
    5
    6 ACCELERATOR_GROUP_NAME = "accelerator"
    7

/PickScore/trainer/accelerators/deepspeed_accelerator.py:20 in <module>

   17
   18
   19 @dataclass
❱  20 class DeepSpeedConfig:
   21     #pdb.set_trace()
   22     fp16: MixedPrecisionConfig = MixedPrecisionConfig(enabled=True)
   23     bf16: MixedPrecisionConfig = MixedPrecisionConfig(enabled=False)

/PickScore/trainer/accelerators/deepspeed_accelerator.py:22 in DeepSpeedConfig

   19 @dataclass
   20 class DeepSpeedConfig:
   21     #pdb.set_trace()
❱  22     fp16: MixedPrecisionConfig = MixedPrecisionConfig(enabled=True)
   23     bf16: MixedPrecisionConfig = MixedPrecisionConfig(enabled=False)
   24     optimizer: dict = field(default_factory=lambda: {
   25         "type": "AdamW",

TypeError: MixedPrecisionConfig() takes no arguments

yuvalkirstain commented 9 months ago

Really strange.. I am able to run this code with no problem:

>>> from omegaconf import OmegaConf, MISSING, II
>>>
>>> from dataclasses import dataclass, field
>>>
>>> @dataclass
... class MixedPrecisionConfig:
...     enabled: bool = MISSING
...
>>> @dataclass
... class DeepSpeedConfig:
...     fp16: MixedPrecisionConfig = MixedPrecisionConfig(enabled=False)
...     bf16: MixedPrecisionConfig = MixedPrecisionConfig(enabled=False)
...
>>>

With Python 3.10.13. Perhaps try to use 3.10?

learn01one commented 9 months ago

Hello, with python=3.10 the same problem still occurs. Can you share your deepspeed default_config.yaml? Thanks.

yuvalkirstain commented 9 months ago

What does the deepspeed config have to do with it? Can you run the failing code locally?

from omegaconf import OmegaConf, MISSING, II
from dataclasses import dataclass, field

@dataclass
class MixedPrecisionConfig:
    enabled: bool = MISSING

@dataclass
class DeepSpeedConfig:
    fp16: MixedPrecisionConfig = MixedPrecisionConfig(enabled=False)
    bf16: MixedPrecisionConfig = MixedPrecisionConfig(enabled=False)
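
One possible cause worth ruling out, offered as a guess rather than a confirmed diagnosis: "takes no arguments" is the message Python gives when a class has no generated __init__, which is what happens if the @dataclass decorator is missing from MixedPrecisionConfig in the local copy (the trace shows the file was edited locally, e.g. the commented-out pdb line). The sketch below uses hypothetical class names and only the standard library to show the difference and a quick is_dataclass check.

```python
from dataclasses import dataclass, is_dataclass


class PlainConfig:
    # No @dataclass decorator: the annotation alone creates no __init__,
    # so keyword arguments are rejected with a TypeError.
    enabled: bool = False


@dataclass
class DecoratedConfig:
    # With the decorator, __init__(self, enabled=False) is generated.
    enabled: bool = False


def try_construct(cls):
    """Return the constructed object, or the TypeError message on failure."""
    try:
        return cls(enabled=True)
    except TypeError as exc:
        return str(exc)


for cls in (PlainConfig, DecoratedConfig):
    print(cls.__name__, is_dataclass(cls), try_construct(cls))
```

Running is_dataclass(MixedPrecisionConfig) right before the DeepSpeedConfig definition in the local checkout would show whether this applies.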

yuvalkirstain / PickScore

what is the version of python #17