kohya-ss / sd-scripts

Apache License 2.0

loss and lr not record on wandb #494

Open kf1111 opened 1 year ago

kf1111 commented 1 year ago


I tried to log the loss and learning rate of my LoRA training run to wandb, but only GPU information was recorded. My config.toml contains the following settings:

    log_with = "wandb"
    log_tracker_name = "lora_0511"
    wandb_api_key = "apikey"

    pretrained_model_name_or_path = "....ckpt"
    train_data_dir = "..."

    shuffle_caption = true
    caption_extension = ".txt"
    keep_tokens = 20
    resolution = "768"
    vae_batch_size = 4
    enable_bucket = true
    output_dir = "..."
    output_name = "..."
    save_precision = "fp16"
    save_every_n_epochs = 10

    train_batch_size = 2
    gradient_checkpointing = true
    gradient_accumulation_steps = 64

    max_token_length = 150
    xformers = true
    max_train_epochs = 50
    persistent_data_loader_workers = true
    seed = 42
    mixed_precision = "bf16"
    clip_skip = 2

    multires_noise_iterations = 6
    multires_noise_discount = 0.1

    flip_aug = true
    use_8bit_adam = true
    lr_scheduler = "cosine_with_restarts"
    lr_warmup_steps = 12
    lr_scheduler_num_cycles = 10
    unet_lr = 0.0004
    text_encoder_lr = 0.0002
    network_module = "networks.lora"
    network_dim = 64
    network_alpha = 32.0

I read https://github.com/kohya-ss/sd-scripts/pull/428 and understood from it that it's OK to leave out "logging_dir".
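One way to test the assumption that `logging_dir` can be left out is a small pre-flight check over the parsed config. The sketch below is hypothetical (`check_logging_config` is not part of sd-scripts) and encodes an assumption taken from the `train_network.py` link at the end of this thread: step metrics such as loss and learning rate are only logged when `logging_dir` is set.

```python
# Hypothetical pre-flight check, not part of sd-scripts. It encodes one
# assumption: step metrics (loss, lr) are only sent to the tracker when
# logging_dir is set, even if log_with = "wandb" is configured.
def check_logging_config(cfg: dict) -> list:
    problems = []
    if cfg.get("log_with") == "wandb" and not cfg.get("logging_dir"):
        problems.append(
            'log_with = "wandb" but logging_dir is not set; '
            "loss and learning rate may not be logged"
        )
    return problems


# The logging-related keys from the config.toml above (API key omitted).
cfg = {"log_with": "wandb", "log_tracker_name": "lora_0511"}
for problem in check_logging_config(cfg):
    print(problem)

# Adding a logging_dir (placeholder path) clears the warning.
cfg["logging_dir"] = "logs"
print(check_logging_config(cfg))  # prints []
```

To check a real file, load it with `tomllib.load` (standard library in Python 3.11+, file opened in binary mode) and pass the resulting dict.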

rockerBOO commented 1 year ago

Just tried it, and it's recording for me.

[Screenshot 2023-05-11: Weights & Biases]

kf1111 commented 1 year ago

Could there be a problem with my Python environment?

    Package Version
    absl-py 1.4.0
    accelerate 0.15.0
    aiohttp 3.8.4
    aiosignal 1.3.1
    albumentations 1.3.0
    altair 4.2.2
    appdirs 1.4.4
    astunparse 1.6.3
    async-timeout 4.0.2
    attrs 23.1.0
    bitsandbytes 0.38.1
    cachetools 5.3.0
    certifi 2022.12.7
    charset-normalizer 2.1.1
    click 8.1.3
    colorama 0.4.6
    diffusers 0.10.2
    docker-pycreds 0.4.0
    easygui 0.98.3
    einops 0.6.0
    entrypoints 0.4
    fairscale 0.4.13
    filelock 3.9.0
    flatbuffers 23.5.8
    frozenlist 1.3.3
    fsspec 2023.5.0
    ftfy 6.1.1
    gast 0.4.0
    gitdb 4.0.10
    GitPython 3.1.31
    google-auth 2.18.0
    google-auth-oauthlib 0.4.6
    google-pasta 0.2.0
    grpcio 1.54.0
    h5py 3.8.0
    huggingface-hub 0.13.3
    idna 3.4
    imageio 2.28.1
    importlib-metadata 6.6.0
    Jinja2 3.1.2
    joblib 1.2.0
    jsonschema 4.17.3
    keras 2.10.0
    Keras-Preprocessing 1.1.2
    lazy_loader 0.2
    libclang 16.0.0
    library 0.0.0
    lightning-utilities 0.8.0
    Markdown 3.4.3
    MarkupSafe 2.1.2
    mpmath 1.2.1
    multidict 6.0.4
    mypy-extensions 1.0.0
    networkx 3.0
    numpy 1.24.1
    oauthlib 3.2.2
    opencv-python 4.7.0.68
    opencv-python-headless 4.7.0.72
    opt-einsum 3.3.0
    packaging 23.1
    pandas 2.0.1
    pathtools 0.1.2
    Pillow 9.3.0
    pip 23.0.1
    protobuf 3.19.6
    psutil 5.9.5
    pyasn1 0.5.0
    pyasn1-modules 0.3.0
    pyre-extensions 0.0.29
    pyrsistent 0.19.3
    python-dateutil 2.8.2
    pytorch-lightning 1.9.0
    pytz 2023.3
    PyWavelets 1.4.1
    PyYAML 6.0
    qudida 0.0.4
    regex 2023.5.5
    requests 2.28.1
    requests-oauthlib 1.3.1
    rsa 4.9
    safetensors 0.2.6
    scikit-image 0.20.0
    scikit-learn 1.2.2
    scipy 1.10.1
    sentry-sdk 1.22.2
    setproctitle 1.3.2
    setuptools 65.5.0
    six 1.16.0
    smmap 5.0.0
    sympy 1.11.1
    tensorboard 2.10.1
    tensorboard-data-server 0.6.1
    tensorboard-plugin-wit 1.8.1
    tensorflow 2.10.1
    tensorflow-estimator 2.10.0
    tensorflow-io-gcs-filesystem 0.31.0
    termcolor 2.3.0
    threadpoolctl 3.1.0
    tifffile 2023.4.12
    timm 0.6.12
    tokenizers 0.13.3
    toml 0.10.2
    toolz 0.12.0
    torch 2.0.0+cu118
    torchmetrics 0.11.4
    torchvision 0.15.1+cu118
    tqdm 4.65.0
    transformers 4.26.0
    typing_extensions 4.4.0
    typing-inspect 0.8.0
    tzdata 2023.3
    urllib3 1.26.13
    voluptuous 0.13.1
    wandb 0.15.2
    wcwidth 0.2.6
    Werkzeug 2.3.4
    wheel 0.40.0
    wrapt 1.15.0
    xformers 0.0.19
    yarl 1.9.2
    zipp 3.15.0
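Rather than comparing full `pip list` dumps, it can be quicker to report only the packages on the logging path. A minimal sketch using the standard-library `importlib.metadata`; the selection of packages below is an assumption about what matters for tracker logging, not a known cause of this issue.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages assumed (not confirmed) to matter for tracker logging in sd-scripts.
RELEVANT = ("accelerate", "wandb", "torch", "transformers", "diffusers")


def report_versions(names=RELEVANT) -> dict:
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in report_versions().items():
    print(f"{name:<14} {ver or 'not installed'}")
```

For comparison, the environment listed above has accelerate 0.15.0 and wandb 0.15.2.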

rockerBOO commented 1 year ago

Which commit of sd-scripts are you currently on?

kf1111 commented 1 year ago

3b1af3f1a63b858af8c12662cbae70654229e327

kf1111 commented 1 year ago

The same issue occurs with the latest commit, c924c47f374ac1b6e33e71f82948eb1853e2243f

cian0 commented 1 year ago

Same here. Has this been resolved?

axel578 commented 1 year ago

Same issue!

1099271 commented 1 year ago

Same issue!

rockerBOO commented 1 year ago

I'm looking into this issue. Could anyone who is seeing it check whether there are any wandb warnings in their terminal/command/bat output?

kf1111 commented 1 year ago

I haven't used sd-scripts for a long time, but I have some old wandb logs that might help.

    epoch 1/100
    F:\sd-scripts\venv\lib\site-packages\torch\utils\checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
      warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
    F:\sd-scripts\venv\lib\site-packages\xformers\ops\fmha\flash.py:338: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
      and inp.query.storage().data_ptr() == inp.key.storage().data_ptr()
    epoch 2/100
    epoch 3/100
    epoch 4/100
    epoch 5/100
    epoch 6/100
    epoch 7/100
    epoch 8/100
    epoch 9/100
    epoch 10/100

emcmanus commented 2 months ago

Per https://github.com/kohya-ss/sd-scripts/blob/f8f5b1695842cce15ba14e7edfacbeee41e71a75/train_network.py#L952

Metric logging (like loss) is only enabled when you provide a --logging_dir parameter.
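If that reading is right, it would explain the original symptom: the wandb run is still created (W&B records system metrics such as GPU usage on its own once a run exists), but loss and learning rate are never sent to it. The following is a simplified, hypothetical model of that gating, not the sd-scripts source; `Args`, `FakeTracker`, and `train_steps` are made up for illustration, and only the `log_with`/`logging_dir` names come from sd-scripts.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Args:
    # Only these two option names come from sd-scripts; the rest is illustrative.
    log_with: Optional[str] = None
    logging_dir: Optional[str] = None


@dataclass
class FakeTracker:
    """Stand-in for a W&B run: records what would be sent to it."""
    metrics: list = field(default_factory=list)

    def log(self, values: dict, step: int) -> None:
        self.metrics.append((step, values))


def train_steps(args: Args, losses: list, lr: float) -> Optional[FakeTracker]:
    # The tracker (and with it W&B's automatic system/GPU metrics) is
    # created whenever log_with is set...
    tracker = FakeTracker() if args.log_with else None
    for step, loss in enumerate(losses):
        # ...but per-step metrics are only logged when logging_dir is set,
        # a simplified version of the check the comment above points to.
        if tracker is not None and args.logging_dir is not None:
            tracker.log({"loss": loss, "lr": lr}, step=step)
    return tracker


losses = [0.12, 0.11, 0.10]
without_dir = train_steps(Args(log_with="wandb"), losses, lr=4e-4)
with_dir = train_steps(Args(log_with="wandb", logging_dir="logs"), losses, lr=4e-4)
print(len(without_dir.metrics), len(with_dir.metrics))  # prints 0 3
```

Under this reading, the practical fix is to add a `logging_dir` entry (a writable directory) to config.toml next to `log_with = "wandb"`.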