nv-tlabs / LION

Latent Point Diffusion Models for 3D Shape Generation

training VAE #23

Closed MariemOualha closed 1 year ago

MariemOualha commented 1 year ago

Hello @ZENGXH, I'm trying to train the VAE, but I get errors when I run train_vae.sh. Can you help me? (Screenshot attached: Capture)

ZENGXH commented 1 year ago

Hi, I think a $ symbol is missing on line 13 of your script. With the $ prefix, the shell substitutes the value of the variable defined earlier; without it, the name is treated as a literal word. Here ENT stands for python train_dist.py --num_process_per_node $NGPU, as in the original file.

If this is not clear, please post your full train_vae.sh here.
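To illustrate the point about the missing $, here is a minimal sketch of how the shell treats the variable with and without the prefix. The value of NGPU is an assumption chosen for the example, and the snippet only echoes strings rather than launching training; the real train_vae.sh may define things differently.

```shell
#!/bin/bash
# Assumed value for illustration only; set this to your own GPU count.
NGPU=4

# ENT as described above: the launcher command, stored in a shell variable.
ENT="python train_dist.py --num_process_per_node $NGPU"

# With the $ prefix, the shell substitutes the stored command string.
expanded="$ENT"

# Without the $ prefix, the shell sees only the literal word ENT.
literal="ENT"

echo "with \$:    $expanded"
echo "without \$: $literal"
```

In a script, a line that starts with the bare word ENT would make the shell look for a command named ENT, which typically fails with a "command not found" error.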

albertotono commented 1 year ago

@ZENGXH How long did you train it? till epoch 8000?

ZENGXH commented 1 year ago

On 4 A100 GPUs, it takes ~10 hours to train on the airplane category for 8000 epochs, with a batch size of 32 per GPU.
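As a rough back-of-the-envelope check on those numbers, the sketch below derives the time per epoch and the total GPU-hours. The variable names are made up for the example, the 10 hours is the approximate figure quoted above, and the global batch size assumes the per-GPU batches simply add up across the 4 GPUs.

```shell
#!/bin/bash
# Figures quoted in the comment above; HOURS is approximate (~10).
NUM_GPUS=4
BATCH_PER_GPU=32
EPOCHS=8000
HOURS=10

# Assumption: per-GPU batches add up to one global batch per step.
global_batch=$((NUM_GPUS * BATCH_PER_GPU))

# Average wall-clock time per epoch, in milliseconds (integer arithmetic).
ms_per_epoch=$((HOURS * 3600 * 1000 / EPOCHS))

# Total compute budget in GPU-hours.
gpu_hours=$((NUM_GPUS * HOURS))

echo "global batch size: $global_batch"
echo "time per epoch:    ${ms_per_epoch} ms"
echo "GPU-hours:         $gpu_hours"
```

On different hardware or with a different batch size, these estimates would not carry over directly.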

albertotono commented 1 year ago

@MariemOualha For reference, see https://github.com/nv-tlabs/LION/issues/31