zideliu / StyleDrop-PyTorch

Unofficial implementation of [StyleDrop](https://arxiv.org/abs/2306.00983)
MIT License
567 stars 27 forks

OOM at eval running with `slow compile` warning #24

Closed parnurzeal closed 11 months ago

parnurzeal commented 11 months ago

Hi,

I am able to run training on an A100 GPU with the command you suggested:

#!/bin/bash
unset EVAL_CKPT
unset ADAPTER
export OUTPUT_DIR="output_dir/for/this/experiment"
accelerate launch --num_processes 8 --mixed_precision fp16 train_t2i_custom_v2.py --config=configs/custom.py

This is without xformers installed.

The training part runs fine, but once it starts the eval it becomes extremely slow. It no longer seems to use the GPU and appears to run only on the CPUs. Memory usage keeps growing until the process hits OOM and crashes.

The only warning I got is the following:

2022-06-27 14:06:50.915791: E external/org_tensorflow/tensorflow/compiler/xla/service/slow_operation_alarm.cc:55]

Very slow compile? If you want to file a bug, run with envvar XLA_FLAGS=--xla_dump_to=/tmp/foo and attach the results.
Compiling module jit_train_step.503
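
The warning above suggests rerunning with `XLA_FLAGS=--xla_dump_to=/tmp/foo` and attaching the results. A minimal sketch of how that could be done, assuming the same launch command as in this report and that `OUTPUT_DIR` is already exported as in the script above. The helper names `xla_dump_flags` and `rerun_with_xla_dump` are hypothetical and not part of the repository:

```shell
#!/bin/bash
# Sketch only: both helper names are hypothetical, not part of StyleDrop-PyTorch.

# Build the XLA_FLAGS value named in the warning. The dump directory defaults
# to the /tmp/foo path quoted there; pass another writable path to override it.
xla_dump_flags() {
  local dump_dir="${1:-/tmp/foo}"
  echo "--xla_dump_to=${dump_dir}"
}

# Rerun the launch command from this report with the dump flag set for that
# one process only, so the rest of the shell session is unaffected.
rerun_with_xla_dump() {
  env XLA_FLAGS="$(xla_dump_flags "$1")" \
    accelerate launch --num_processes 8 --mixed_precision fp16 \
    train_t2i_custom_v2.py --config=configs/custom.py
}
```

Calling `rerun_with_xla_dump` (optionally with a different directory as its argument) should leave the dump files in that directory, which can then be attached to a bug report as the warning asks.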

I would really appreciate it if you could advise how I can resolve this issue.

parnurzeal commented 11 months ago

Never mind. After upgrading the dependencies, it works now :)