hongyi-zhao opened this issue 2 weeks ago
Hi,
I wonder what's the GPU-based hardware requirement to run this package. Any tips will be helpful.
Regards, Zhao
For training MatterGPT models, a GPT-2 configuration with 12 layers, 12 attention heads, and a hidden size of 768 requires several hours of computation on an NVIDIA RTX 4090 GPU when trained on a bandgap dataset containing 280,000 data points. Training on a small bandgap dataset (mp20_nonmetal in the Jupyter notebook tutorial) with 13,000 data points takes around 15 minutes on the same 4090. However, running SLICES itself to encode and decode crystal structures requires only a CPU.
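For a rough sense of the model size that configuration implies, the parameter count can be estimated with a few lines of arithmetic. This is a generic GPT-2-style count, not MatterGPT's exact code; the vocabulary and context sizes below are placeholders, since MatterGPT tokenizes SLICES strings rather than using GPT-2's BPE vocabulary:

```python
# Rough parameter count for the GPT-2 configuration mentioned above
# (12 layers, 12 heads, hidden size 768). The vocab and context sizes
# are illustrative placeholders, not MatterGPT's actual values.

def transformer_params(n_layer=12, d=768, vocab=512, n_ctx=1024):
    """Count parameters of a GPT-2-style decoder block stack + embeddings."""
    per_layer = (
        3 * d * d + 3 * d      # fused Q/K/V projection (weights + biases)
        + d * d + d            # attention output projection
        + d * 4 * d + 4 * d    # MLP up-projection (4x expansion)
        + 4 * d * d + d        # MLP down-projection
        + 2 * (2 * d)          # two LayerNorms (scale + shift each)
    )
    blocks = n_layer * per_layer + 2 * d   # blocks + final LayerNorm
    embeddings = vocab * d + n_ctx * d     # token + positional embeddings
    return blocks, embeddings

core, emb = transformer_params()
print(f"transformer blocks: {core / 1e6:.1f}M parameters")
```

The block stack alone comes out around 85M parameters, which is why a single 4090 handles this workload comfortably; the embedding count depends on the actual SLICES vocabulary size.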
Thank you for sharing your model-training experience. BTW, as I understand it, the 4090 is a gaming card, not a computing card. On the other hand, according to benchmark data from a research group at Westlake University, the 4090 is far from the most cost-effective configuration:
Actually, some of the latest consumer-grade GPUs are surprisingly cost-effective for training large language models (at FP16 precision): certain high-end consumer GPUs (e.g. the 4090) offer FP16 compute competitive with some professional-grade GPUs (e.g. the A100) that cost several times more.
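To make that cost-effectiveness argument concrete, here is a back-of-the-envelope comparison. The TFLOPS and price figures are rough, publicly quoted ballpark numbers assumed for illustration, not authoritative benchmarks:

```python
# Illustrative FP16 perf-per-dollar comparison between a consumer and a
# datacenter GPU. Throughput and street-price figures are rough
# assumptions (dense FP16 tensor-core peak, approximate USD prices).
gpus = {
    #             dense FP16 TFLOPS, approx. price (USD)
    "RTX 4090":  (330, 1600),
    "A100 80GB": (312, 10000),
}
for name, (tflops, price) in gpus.items():
    print(f"{name}: {tflops / price * 1000:.0f} TFLOPS per $1000")
```

Under these assumed numbers the consumer card delivers several times the FP16 throughput per dollar, which matches the benchmark conclusion quoted above.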
The price of the 4090 is still relatively high. I am currently considering the previously mentioned NVIDIA TESLA V100 SXM2 server (in 8- or 4-card configurations) for the GPU machine, as shown below:
For large-scale MD calculations, the V100 is indeed a better choice than the 4090.
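The main reason is double precision: consumer cards ship with heavily cut-down FP64 units, while datacenter parts run FP64 at full rate. The peak-throughput figures below are rough numbers from vendor spec sheets, quoted as assumptions for illustration:

```python
# Why a V100 can beat a 4090 for MD: FP64 throughput. Figures are
# approximate peak TFLOPS from vendor spec sheets (assumptions).
fp64_tflops = {
    "Tesla V100 SXM2": 7.8,   # full-rate FP64 datacenter part
    "RTX 4090": 1.3,          # consumer part; FP64 runs at 1/64 of FP32
}
ratio = fp64_tflops["Tesla V100 SXM2"] / fp64_tflops["RTX 4090"]
print(f"V100 offers roughly {ratio:.0f}x the FP64 throughput of a 4090")
```

For FP16 LLM training the comparison flips, which is why the best choice depends on the workload.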
According to the description in the official brochure above, this configuration is suitable for various scenarios and performs well. So far, I have not had a chance to test it.
The Blackwell-based product line has similar issues: the FP32 and FP64 vector performance of a single B100 is 11% lower than that of an H100 SXM5, while a single B200 is 20% higher. The price of the B200 is estimated to be 3-4 times that of the H100, so its cost-effectiveness for scientific computing is self-evident.
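Putting those figures together shows how poor the implied value is. The calculation below uses only the numbers quoted above (a B200 at ~1.20x H100 vector throughput and an estimated 3-4x H100 price):

```python
# Perf-per-dollar implied by the figures above: B200 vector throughput
# ~1.20x an H100 SXM5, at an estimated 3-4x the H100's price.
h100_perf, h100_price = 1.0, 1.0   # normalized baseline
b200_perf = 1.20
for price_multiple in (3, 4):
    ratio = (b200_perf / price_multiple) / (h100_perf / h100_price)
    print(f"B200 at {price_multiple}x H100 price: "
          f"{ratio:.2f}x the perf-per-dollar of an H100")
```

Even at the low end of the price estimate, the B200 delivers well under half the vector perf-per-dollar of an H100 for these workloads.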
Thanks a lot for the valuable info. :)
The fundamental reason for this phenomenon is profit: high-end chip investment is concentrated in the commercial sector rather than in HPC scientific computing. HPC is a relatively niche field, since scientific research is an activity that only a very small number of people engage in, and its actual output is unpredictable.
For a company like NVIDIA, if it truly wants to support HPC, it only needs to design specialized high-end HPC computing cards for specific world-class research institutions. That strategy is clearly the wisest and offers the highest return.
This mirrors the situation we see today with CPUs in most research institutions: the so-called high-end CPUs used in scientific research are often products passed over by the major commercial players.
See [GPU加速] 锐评Blackwell GPU ("[GPU Acceleration] A Critical Review of the Blackwell GPU") for the related discussion.