nbardy closed this issue 1 year ago
Thanks for all the great work.
I'm happy to take the distributed stuff from here. Was hoping to have a distributed run going today on the cluster, but just got a single chip running. I have a couple of different training scripts on my fork; one of them uses Ray and Accelerate.
Just got a webdataset script working with the upsampler on the TPU chip. Debugging webdataset pipe errors and setting up credentials was a surprising pain.
@nbardy yea no problem, i know how it is. things are never straightforward in software
@CerebralSeed pull requested the sampler script and validated that the upsampler works! that should unblock you for your work
i'm going to give accelerate integration (sans ray, since i'm not familiar with it) a try today
Learning on the accelerated chips finally! Remarkably good results for 40 steps in. Last time I trained a GAN was a very long time ago.
Losses look stable.
Looking at the XLA docs, trying to figure out the best way to network this with TPUs. Might just drop Ray 🤔 I'm already checkpointing and tracking runs with W&B.
https://wandb.ai/nbardy-facet/gigagan/runs/zv9004dr?workspace=user-nbardy-facet
Got started on XMP today. It's getting stuck on step 1. Most likely more device errors.
Accelerate was giving bad crashes. Probably incompatible.
I will talk more with Google tomorrow. They will most likely be able to help me sort this out by end of day tomorrow.
@nbardy good to see some progress on your end!
for me, i was stuck on a bug in the base generator architecture, but finally got it working before bedtime
i'm going to wire up accelerate this morning (this time for real lol) and try out that vision aided discriminator loss
Training across 16 chips with XLA/XMP.
Logs (currently very slow because XLA is compiling the first steps and debug mode is on)
And they all crash at 30 minutes :(
haha yea, expected this to be not that mature
they are basically exchanging free compute for free QA
today was much smoother sailing for me; accelerate and mixed precision are working for multi-gpu on my one machine!
Hi Phil,
I have been using your implementation and noticed that sub-pixel upsampling gives me lower generative performance.
It introduces checkerboard artifacts that negatively affect the quality of the generated images. To address this, I experimented with replacing sub-pixel convolution with bilinear upsampling, and it yielded better results.
Also, the StyleGAN generator relies on maintaining unit variance in its feature activations for effective style mixing. It is unclear whether sub-pixel upsampling still yields unit-variance activations.
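For context, sub-pixel (pixel-shuffle) upsampling rearranges r² channel groups into an r×r spatial grid, so each output sub-position is filled by a different set of conv filters; when those filters disagree, the result is the r-periodic checkerboard pattern described above. A minimal pure-Python sketch of the rearrangement (same semantics as PyTorch's `nn.PixelShuffle`, no framework deps):

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested-list tensor into (C, H*r, W*r).

    out[c][h*r + i][w*r + j] == x[c*r*r + i*r + j][h][w]
    """
    c_in = len(x)
    h, w = len(x[0]), len(x[0][0])
    c_out = c_in // (r * r)
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]  # channel feeding sub-position (i, j)
                for y in range(h):
                    for z in range(w):
                        out[c][y * r + i][z * r + j] = src[y][z]
    return out

# four 1x1 channels interleave into one 2x2 map
print(pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], 2))  # [[[1, 2], [3, 4]]]
```

Because each input channel above fills one fixed sub-position of every 2×2 output block, any systematic mismatch between those channels shows up as a period-2 checkerboard.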
hey yup! i was actually going to offer this as an option as i noticed the same
defaulted it to bilinear upsample for now, controllable with this option
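For reference, a minimal pure-Python sketch of what 2D bilinear upsampling computes, using the `align_corners=True` convention; in PyTorch this corresponds to `nn.Upsample(mode='bilinear', align_corners=True)`. This is an illustration of the interpolation itself, not the repo's implementation:

```python
def bilinear_upsample(img, out_h, out_w):
    """Bilinearly resize a 2D nested list to (out_h, out_w), align_corners=True style."""
    in_h, in_w = len(img), len(img[0])

    def src(i, n_out, n_in):
        # map output index to a fractional source coordinate
        return i * (n_in - 1) / (n_out - 1) if n_out > 1 else 0.0

    out = []
    for oy in range(out_h):
        y = src(oy, out_h, in_h)
        y0 = int(y); y1 = min(y0 + 1, in_h - 1); wy = y - y0
        row = []
        for ox in range(out_w):
            x = src(ox, out_w, in_w)
            x0 = int(x); x1 = min(x0 + 1, in_w - 1); wx = x - x0
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bot = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            row.append(top * (1 - wy) + bot * wy)
        out.append(row)
    return out

print(bilinear_upsample([[0, 2], [4, 6]], 3, 3))
# [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
```

Unlike pixel shuffle, every output value here is a smooth blend of its spatial neighbors, which is why swapping it in removes the checkerboard pattern.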
@randintgenr are you a computer vision researcher?
almost done with the entire training code
ok, i think it is done, save for a few edge cases and cleanup
going to wind down work on this repo next week and move back to video gen
closing, as code is there, and I know of a group moving forward with training already
Hey @lucidrains, have you heard anything about a timeline for the group that's currently training GigaGAN? I'd appreciate any information you have. Thank you!
@anandbhattad yea they have proceeded, but this group will not be open-sourcing it
@lucidrains, I appreciate your response. I was wondering if you knew the computing power necessary for training on the LAION-5B dataset. The paper lacks clear information on compute and time requirements for training the model (Table A2 is ambiguous). As I only have academic compute access, I am interested in exploring whether GigaGAN utilizes familiar rendering elements such as normals and depth, like we demonstrated for StyleGAN-2. Here's the link for more information: https://arxiv.org/abs/2306.00987
@nbardy would greatly appreciate if you're able to share what image size and other settings you use, if you get anything that works at a size larger than 128px. TIA
@lucidrains I'm pretty sure that group is this one: https://magnific.ai/
Or at least it seems so. If I had the money and anything more than 24 GB of VRAM I would train this, but it's impossible for me, haha.
@nbardy Hi Nicholas! Do you still plan to train this model on LAION, or have any updates regarding it?
I've got a bunch of compute for the next couple weeks and am thinking of training this on LAION.
Wondering if there is any other training going on right now. Would hate to duplicate efforts too much.