Open kf1111 opened 1 year ago
Just tried it and it's recording for me.
Could there be a problem with my Python environment?
Package Version
absl-py 1.4.0
accelerate 0.15.0
aiohttp 3.8.4
aiosignal 1.3.1
albumentations 1.3.0
altair 4.2.2
appdirs 1.4.4
astunparse 1.6.3
async-timeout 4.0.2
attrs 23.1.0
bitsandbytes 0.38.1
cachetools 5.3.0
certifi 2022.12.7
charset-normalizer 2.1.1
click 8.1.3
colorama 0.4.6
diffusers 0.10.2
docker-pycreds 0.4.0
easygui 0.98.3
einops 0.6.0
entrypoints 0.4
fairscale 0.4.13
filelock 3.9.0
flatbuffers 23.5.8
frozenlist 1.3.3
fsspec 2023.5.0
ftfy 6.1.1
gast 0.4.0
gitdb 4.0.10
GitPython 3.1.31
google-auth 2.18.0
google-auth-oauthlib 0.4.6
google-pasta 0.2.0
grpcio 1.54.0
h5py 3.8.0
huggingface-hub 0.13.3
idna 3.4
imageio 2.28.1
importlib-metadata 6.6.0
Jinja2 3.1.2
joblib 1.2.0
jsonschema 4.17.3
keras 2.10.0
Keras-Preprocessing 1.1.2
lazy_loader 0.2
libclang 16.0.0
library 0.0.0
lightning-utilities 0.8.0
Markdown 3.4.3
MarkupSafe 2.1.2
mpmath 1.2.1
multidict 6.0.4
mypy-extensions 1.0.0
networkx 3.0
numpy 1.24.1
oauthlib 3.2.2
opencv-python 4.7.0.68
opencv-python-headless 4.7.0.72
opt-einsum 3.3.0
packaging 23.1
pandas 2.0.1
pathtools 0.1.2
Pillow 9.3.0
pip 23.0.1
protobuf 3.19.6
psutil 5.9.5
pyasn1 0.5.0
pyasn1-modules 0.3.0
pyre-extensions 0.0.29
pyrsistent 0.19.3
python-dateutil 2.8.2
pytorch-lightning 1.9.0
pytz 2023.3
PyWavelets 1.4.1
PyYAML 6.0
qudida 0.0.4
regex 2023.5.5
requests 2.28.1
requests-oauthlib 1.3.1
rsa 4.9
safetensors 0.2.6
scikit-image 0.20.0
scikit-learn 1.2.2
scipy 1.10.1
sentry-sdk 1.22.2
setproctitle 1.3.2
setuptools 65.5.0
six 1.16.0
smmap 5.0.0
sympy 1.11.1
tensorboard 2.10.1
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorflow 2.10.1
tensorflow-estimator 2.10.0
tensorflow-io-gcs-filesystem 0.31.0
termcolor 2.3.0
threadpoolctl 3.1.0
tifffile 2023.4.12
timm 0.6.12
tokenizers 0.13.3
toml 0.10.2
toolz 0.12.0
torch 2.0.0+cu118
torchmetrics 0.11.4
torchvision 0.15.1+cu118
tqdm 4.65.0
transformers 4.26.0
typing_extensions 4.4.0
typing-inspect 0.8.0
tzdata 2023.3
urllib3 1.26.13
voluptuous 0.13.1
wandb 0.15.2
wcwidth 0.2.6
Werkzeug 2.3.4
wheel 0.40.0
wrapt 1.15.0
xformers 0.0.19
yarl 1.9.2
zipp 3.15.0
Which commit of sd-scripts are you currently on?
3b1af3f1a63b858af8c12662cbae70654229e327
The same issue occurs with the latest commit, c924c47f374ac1b6e33e71f82948eb1853e2243f
Same here. Has this been resolved?
Same issue!
Same issue!
I am looking into this issue. Can anyone who is experiencing it confirm whether any wandb warnings appear in their terminal/command prompt/.bat output?
Haven't used sd-scripts for a long time, but I have some old wandb logs that might help.
epoch 1/100
F:\sd-scripts\venv\lib\site-packages\torch\utils\checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
F:\sd-scripts\venv\lib\site-packages\xformers\ops\fmha\flash.py:338: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  and inp.query.storage().data_ptr() == inp.key.storage().data_ptr()
epoch 2/100
epoch 3/100
epoch 4/100
epoch 5/100
epoch 6/100
epoch 7/100
epoch 8/100
epoch 9/100
epoch 10/100
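For anyone who wants to check their own console output for wandb messages, here is a minimal sketch that filters captured output for lines mentioning wandb. The helper name find_wandb_lines is hypothetical, and the sample lines are copied from the log quoted above. It is just a plain text filter, not part of sd-scripts or wandb.

```python
from typing import List


def find_wandb_lines(output: str) -> List[str]:
    """Return every line of captured console output that mentions wandb.

    Hypothetical helper for illustration only.
    """
    return [line for line in output.splitlines() if "wandb" in line.lower()]


# A few lines taken from the log quoted above.
sample_output = "\n".join([
    "epoch 1/100",
    r"F:\sd-scripts\venv\lib\site-packages\torch\utils\checkpoint.py:31: "
    "UserWarning: None of the inputs have requires_grad=True. Gradients will be None",
    "epoch 2/100",
])

matches = find_wandb_lines(sample_output)
print(matches if matches else "no wandb-related lines found")
```

On these sample lines the filter finds nothing, so this old log does not appear to contain any wandb warnings.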
Metric logging (like loss) is only enabled when you provide a --logging_dir parameter.
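To illustrate the behavior described here, a minimal sketch follows. It is not the actual sd-scripts code: InMemoryTracker and train are hypothetical stand-ins, and the gate on logging_dir is an assumption that mirrors this comment. It would also be consistent with the reported symptom, since wandb collects system metrics (such as GPU usage) on its own once a run is started, while loss only appears if the training script sends it.

```python
from typing import Dict, List, Optional, Tuple


class InMemoryTracker:
    """Hypothetical stand-in for a real tracker; it only stores what it is given."""

    def __init__(self) -> None:
        self.records: List[Tuple[int, Dict[str, float]]] = []

    def log(self, values: Dict[str, float], step: int) -> None:
        self.records.append((step, dict(values)))


def train(num_steps: int, logging_dir: Optional[str], tracker: InMemoryTracker) -> None:
    """Toy loop: metrics are forwarded only when logging_dir is set (assumed gate)."""
    for step in range(1, num_steps + 1):
        loss = 1.0 / step  # placeholder value, not a real training loss
        if logging_dir is not None:
            tracker.log({"loss": loss}, step=step)


without_dir = InMemoryTracker()
train(3, logging_dir=None, tracker=without_dir)

with_dir = InMemoryTracker()
train(3, logging_dir="logs", tracker=with_dir)

print(len(without_dir.records), len(with_dir.records))
```

With the gate in place, the run without a logging directory records no loss values at all, while the run with one records a value per step.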
I tried to record the loss and learning rate of my LoRA training, but only GPU information was recorded. My config.toml file contains the following settings:
log_with = "wandb"
log_tracker_name = "lora_0511"
wandb_api_key = "apikey"

pretrained_model_name_or_path = "....ckpt"
train_data_dir = "..."

shuffle_caption = true
caption_extension = ".txt"
keep_tokens = 20
resolution = "768"
vae_batch_size = 4
enable_bucket = true
output_dir = "..."
output_name = "..."
save_precision = "fp16"
save_every_n_epochs = 10

train_batch_size = 2
gradient_checkpointing = true
gradient_accumulation_steps = 64

max_token_length = 150
xformers = true
max_train_epochs = 50
persistent_data_loader_workers = true
seed = 42
mixed_precision = "bf16"
clip_skip = 2

multires_noise_iterations = 6
multires_noise_discount = 0.1

flip_aug = true
use_8bit_adam = true
lr_scheduler = "cosine_with_restarts"
lr_warmup_steps = 12
lr_scheduler_num_cycles = 10
unet_lr = 0.0004
text_encoder_lr = 0.0002
network_module = "networks.lora"
network_dim = 64
network_alpha = 32.0
I read https://github.com/kohya-ss/sd-scripts/pull/428 and, as I understand it, it is OK to leave out "logging_dir".
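As a quick way to see which logging-related options a config like this actually sets, here is a small sketch. The function summarize_logging_options is hypothetical, and the config is passed as a plain dict rather than parsed from the TOML file. It only reports which keys are present. Whether logging_dir is actually required depends on the sd-scripts version in use, which this sketch does not decide.

```python
from typing import Any, Dict

# Logging-related option names that appear in this thread.
LOGGING_KEYS = ("log_with", "log_tracker_name", "wandb_api_key", "logging_dir")


def summarize_logging_options(config: Dict[str, Any]) -> Dict[str, bool]:
    """Report, for each logging-related key, whether the config sets a non-empty value."""
    return {key: bool(config.get(key)) for key in LOGGING_KEYS}


# The logging-related part of the config.toml quoted above, written as a dict.
config = {
    "log_with": "wandb",
    "log_tracker_name": "lora_0511",
    "wandb_api_key": "apikey",
}

summary = summarize_logging_options(config)
unset = [key for key, is_set in summary.items() if not is_set]
print(unset)  # -> ['logging_dir']
```

For this config the only unset logging option is logging_dir, which is the option the comments in this thread disagree about.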