Rearchitect base model to make training faster / use less RAM

johndpope commented 5 months ago

so slow to train...

johndpope commented 5 months ago

Epoch [1/200000] completed in 269.48 seconds

[Screenshot from 2024-05-30 14-50-19]

johndpope commented 5 months ago

On a 3090, 275 seconds per epoch * 200,000 epochs = 55,000,000 seconds, or roughly 637 days to train the model.
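The estimate is just seconds per epoch multiplied by the epoch count, converted to days. A minimal sketch of that arithmetic follows; the function name `estimate_training_days` is hypothetical, and it assumes every epoch takes as long as the first one, which may not hold once caching or a larger dataset comes into play.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_training_days(seconds_per_epoch: float, total_epochs: int) -> float:
    """Project wall-clock training time in days, assuming a constant epoch time."""
    return seconds_per_epoch * total_epochs / SECONDS_PER_DAY


if __name__ == "__main__":
    # 275 s is the rounded figure from the thread; 269.48 s is the logged first epoch.
    for label, seconds in (("rounded", 275.0), ("measured", 269.48)):
        days = estimate_training_days(seconds, 200_000)
        print(f"{label}: {days:.0f} days ({days / 365:.2f} years)")
```

Both inputs land at a bit over 620 days, so the run is closer to 1.7 years than to 600 days.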

hazard-10 commented 5 months ago

> so slow to train...

Are you training at 256 or 512 resolution? Did you use the complete VoxCeleb set, or did you filter the clips by quality as the original paper proposed?

johndpope commented 5 months ago

The training code is very similar to the paper. I'm training at 512 x 512. Training is running on 2 videos at the moment, using the overfitting config.

hazard-10 commented 5 months ago

> The training code is very similar to the paper. I'm training at 512 x 512. Training is running on 2 videos at the moment, using the overfitting config.

Only on two videos? Is that for a sanity check?

hazard-10 commented 5 months ago

I downloaded and cropped about 9k 512x512 clips from VoxCeleb2. I'm wondering how much compute is required to run 200k epochs on those.

johndpope commented 5 months ago

About 2 years on a 3090 = roughly 2 months on a single H100.

https://github.com/johndpope/VASA-1-hack/issues/5

The NVIDIA RTX 3090, A100, and H100 are different generations and categories of GPUs, each with distinct performance characteristics. Here’s a comparison to show how they differ and how much faster a training run that takes two years on a 3090 might be on an A100 or H100.

NVIDIA RTX 3090
Architecture: Ampere
Release Year: 2020
CUDA Cores: 10,496
Memory: 24 GB GDDR6X
Memory Bandwidth: 936 GB/s
FP32 Performance: 35.6 TFLOPS
Target Use: High-end consumer gaming, some professional AI/deep learning workloads.

NVIDIA A100
Architecture: Ampere
Release Year: 2020
CUDA Cores: 6,912 (per chip; A100 is available in multi-chip configurations)
Memory: 40 GB or 80 GB HBM2e
Memory Bandwidth: 1,555 GB/s (40 GB) or 2,039 GB/s (80 GB)
FP32 Performance: 19.5 TFLOPS (Single A100); with mixed precision, up to 312 TFLOPS
Target Use: Professional AI/deep learning, data analytics, HPC.

NVIDIA H100
Architecture: Hopper
Release Year: 2022
CUDA Cores: 16,896
Memory: 80 GB HBM3
Memory Bandwidth: 3,200 GB/s
FP32 Performance: 60 TFLOPS (Single H100); with mixed precision, up to 1,000 TFLOPS
Target Use: Cutting-edge AI/deep learning, data analytics, HPC.

Performance Comparison
Raw Compute Power: The H100 is significantly more powerful than the RTX 3090 and even the A100 in terms of raw compute power, especially in mixed precision operations which are common in AI and deep learning.
Memory Bandwidth: The H100's memory bandwidth is more than three times that of the RTX 3090 and significantly higher than the A100.
Architecture Improvements: The Hopper architecture in the H100 introduces several optimizations and new features specifically targeted at accelerating AI workloads.

Expected Training Time Improvement
While it’s difficult to provide exact numbers without specific benchmarks, here's a general expectation:

A100 vs. RTX 3090: You can expect the A100 to be 3-4 times faster than the RTX 3090 for deep learning workloads due to its architecture, higher memory bandwidth, and optimizations for AI.
H100 vs. RTX 3090: The H100 might be 5-10 times faster than the RTX 3090, again depending on the workload, because of its significantly higher compute power and memory bandwidth, along with architectural improvements.

Practical Consideration
If your model training takes 2 years on an RTX 3090, switching to an A100 could potentially reduce the training time to 6-8 months. With an H100, you might reduce it further to approximately 2.5-5 months.
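The month figures follow from dividing the 3090 baseline by each speedup range. A minimal sketch, assuming the rough 3-4x and 5-10x factors quoted above (they are estimates, not measured benchmarks for this model) and using a hypothetical helper named `scaled_months`:

```python
from typing import Tuple


def scaled_months(baseline_months: float,
                  speedup_low: float,
                  speedup_high: float) -> Tuple[float, float]:
    """Return (best_case, worst_case) months after applying a speedup range."""
    if speedup_low <= 0 or speedup_high < speedup_low:
        raise ValueError("expected 0 < speedup_low <= speedup_high")
    # The larger speedup gives the shorter (best-case) duration.
    return baseline_months / speedup_high, baseline_months / speedup_low


if __name__ == "__main__":
    baseline = 24.0  # about 2 years on an RTX 3090, per the thread
    # Speedup ranges are the assumed figures from the comparison, not benchmarks.
    for gpu, (low, high) in {"A100": (3.0, 4.0), "H100": (5.0, 10.0)}.items():
        best, worst = scaled_months(baseline, low, high)
        print(f"{gpu}: {best:.1f} to {worst:.1f} months")
```

With a 24-month baseline this gives 6 to 8 months for the A100 range and 2.4 to 4.8 months for the H100 range, so the "2 months" figure corresponds to the optimistic end of the H100 estimate.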

Conclusion
Upgrading from an RTX 3090 to an A100 or H100 will result in substantial reductions in training time, with the H100 offering the most significant improvement. If you need precise numbers, it's best to look for specific benchmarks or perform your own tests on these GPUs.

2 epochs - I may run it over the weekend... hopefully it doesn't cook the GPU.

[Screenshot from 2024-05-30 16-26-30]

johndpope commented 2 months ago

waiting on emoportrait

johndpope / MegaPortrait-hack

Rearchitect base model to make training faster / use less RAM #26