Closed: haofanwang closed this issue 2 years ago
@haofanwang Have you solved this problem? I'm running into the same issue.
@lisiyao21 @aleeyang
Are you using multiple GPUs for training?
If so, restricting the run to a single GPU by specifying its index might help:
CUDA_VISIBLE_DEVICES=0 python -u main.py --config configs/sep_vqvae.yaml --train
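If changing the launch command is inconvenient, the same restriction can usually be applied from inside Python. This is only a sketch: it assumes the variable is set before CUDA is initialized (in practice, before `torch` is imported), and where exactly to place it in `main.py` is a guess, not something taken from the repository.

```python
import os

# Must be set before CUDA is initialized, so put it above any `import torch`.
# "0" exposes only the first physical GPU to this process.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

print(os.environ["CUDA_VISIBLE_DEVICES"])
```

With only one device visible, a multi-GPU wrapper has nothing to split the batch across, so the loss should come back as a single value.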
I hit an error at Step 1 when running
python -u main.py --config configs/sep_vqvae.yaml --train
When I print the loss, it is a tensor with one value per GPU rather than a scalar:
tensor([0.2667, 0.2735, 0.2687, 0.2584, 0.2701, 0.2697, 0.2571, 0.2658], device='cuda:0', grad_fn=<GatherBackward>)
So do I need to apply a mean or sum to it? However, even if I take the mean, training still seems problematic: the loss decreases normally, but in the eval stage the output quants are all zero. Any suggestions?
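For reference, here is a minimal sketch of the reduction in question. It assumes the eight values are per-GPU losses gathered by `torch.nn.DataParallel`, which is what `grad_fn=<GatherBackward>` suggests; the tensor below is a CPU stand-in built from the printed numbers, not the real training loss.

```python
import torch

# Stand-in for the per-GPU losses (values copied from the printout above).
per_device_loss = torch.tensor(
    [0.2667, 0.2735, 0.2687, 0.2584, 0.2701, 0.2697, 0.2571, 0.2658],
    requires_grad=True,
)

# backward() without arguments requires a scalar, so reduce first.
# mean() keeps the loss scale independent of the number of GPUs.
loss = per_device_loss.mean()
loss.backward()

# Each per-GPU loss receives an equal share (1/8) of the gradient.
print(loss.dim(), per_device_loss.grad)
```

Taking the mean only addresses the non-scalar `backward()` error. It would not by itself explain the all-zero quants at eval time.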
The training log is attached for reference.
log.txt
@lisiyao21