Hi,
Sorry for the late reply. I don't think training for all 200 epochs is necessary. Since I had access to 4 GPUs, I trained for the full 200 epochs, which I believe took about 24-48 hours.
Increasing the learning rate, reducing the batch size (so that fewer gradients are accumulated), and narrowing the range of truncated_backprop_minmax from (0, 50) to (49, 50) would be clear ways of reducing the compute time.
I have also open-sourced the checkpoints for HPS and Aesthetic, in case they are helpful.
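For concreteness, here is a minimal sketch of what those overrides might look like as an ml_collections config. Only truncated_backprop_minmax = (49, 50) comes directly from the suggestion above; the other field names (train.learning_rate, train.batch_size, num_epochs) and all the specific values are assumptions about how config/align_prop.py might be laid out, so check the repo's actual config for the real names and defaults.

```python
import ml_collections


def fast_hps_overrides() -> ml_collections.ConfigDict:
    """Speed-oriented settings following the suggestions above.

    Everything except truncated_backprop_minmax is an illustrative
    assumption -- see config/align_prop.py for the real field names.
    """
    config = ml_collections.ConfigDict()
    config.train = ml_collections.ConfigDict()
    config.train.learning_rate = 3e-3   # hypothetical: raise the LR above the default
    config.train.batch_size = 2         # hypothetical: smaller batches, less gradient accumulation
    config.truncated_backprop_minmax = (49, 50)  # from the reply: backprop through only the last denoising step
    config.num_epochs = 50              # hypothetical: stop well before 200 epochs
    return config
```

If main.py parses the config with ml_collections' config_flags (which the `--config config/align_prop.py:hps` pattern suggests), simple numeric overrides like these can also be passed on the command line with dotted flags, e.g. --config.num_epochs=50.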
I found that training with this code is quick for the aesthetic reward (about 3 hours on one A100), but it seems to take several days for HPS. I started with the command CUDA_VISIBLE_DEVICES=0 python main.py --config config/align_prop.py:hps, and it took half a day for 5 epochs out of 200 in total. I know this is caused by the dataset sizes (752 HPS prompts versus 45 aesthetic prompts), but are there any suggestions for speeding it up?