xiaohang007 / SLICES

SLICES: An Invertible, Invariant, and String-based Crystal Representation [2023, Nature Communications]
https://www.nature.com/articles/s41467-023-42870-7
GNU Lesser General Public License v2.1

About the GPU-based hardware requirement to run this package. #11

Open hongyi-zhao opened 2 weeks ago

hongyi-zhao commented 2 weeks ago

Hi,

I wonder what's the GPU-based hardware requirement to run this package. Any tips will be helpful.

Regards, Zhao

xiaohang007 commented 2 weeks ago

For training MatterGPT models: a GPT-2 configuration with 12 layers, 12 attention heads, and a hidden size of 768, trained on a bandgap dataset of 280,000 data points, needs several hours of computation on an NVIDIA RTX 4090 GPU. Trained on the small bandgap dataset (mp20_nonmetal in the Jupyter notebook tutorial) with 13,000 data points, it needs around 15 minutes on the same 4090. For running SLICES itself to encode and decode crystal structures, however, only a CPU is needed.
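
For a rough sense of why a single consumer GPU is enough here, below is a back-of-the-envelope sizing of the quoted GPT-2 configuration. This is a minimal sketch: the vocabulary size and context length are assumptions for illustration, not values taken from the MatterGPT code.

```python
# Parameter count for a GPT-2-style model with 12 layers, 12 heads, hidden size 768
# (the configuration quoted above), ignoring biases and layer norms.
n_layer, n_embd = 12, 768
vocab_size, block_size = 512, 512  # assumed SLICES-token vocab / context length, NOT from the repo

embed = (vocab_size + block_size) * n_embd        # token + position embeddings
attn_per_block = 4 * n_embd * n_embd              # Q, K, V and output projections
mlp_per_block = 2 * 4 * n_embd * n_embd           # two linear layers with 4x expansion
params = embed + n_layer * (attn_per_block + mlp_per_block)

print(f"~{params / 1e6:.0f}M parameters")         # ~86M with this small vocabulary
# Adam in fp32 keeps weights + gradients + two moments (~16 bytes/param),
# so the model/optimizer state alone is only about 1.4 GB; a 24 GB RTX 4090
# leaves plenty of headroom for activations and batching.
print(f"~{params * 16 / 1e9:.1f} GB of model + optimizer state (activations excluded)")
```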

hongyi-zhao commented 2 weeks ago

Thank you for sharing your model-training experience. BTW, as far as I know, the 4090 is a gaming card, not a compute card. On the other hand, according to benchmark data from a research group at Westlake University, the 4090 is far from the most cost-effective configuration:

[screenshot: Westlake University GPU benchmark comparison]

xiaohang007 commented 2 weeks ago

Actually, some of the latest consumer-grade GPUs are surprisingly cost-effective for training large language models (at FP16 precision): certain high-end consumer GPUs (e.g. the 4090) offer FP16 compute competitive with professional-grade GPUs (e.g. the A100) that cost several times more.
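
To make "cost-effective" concrete, here is a small throughput-per-dollar sketch. The TFLOPS and price figures are ballpark assumptions for illustration only (not taken from this thread or from datasheets I have verified); plug in your own specs and local quotes.

```python
# Illustrative FP16-throughput-per-dollar comparison with assumed numbers.
gpus = {
    # name: (dense FP16 tensor TFLOPS, price in USD) -- assumed values
    "RTX 4090": (165.0, 1_800),
    "A100 80GB": (312.0, 12_000),
}

for name, (tflops, price) in gpus.items():
    print(f"{name:>10}: {tflops:6.1f} TFLOPS @ ${price:>6,} "
          f"-> {1000 * tflops / price:5.1f} TFLOPS per $1k")
```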

hongyi-zhao commented 2 weeks ago

The price of the 4090 is still relatively high. I am currently considering the previously mentioned solution, a server with 8 or 4 NVIDIA Tesla V100 SXM2 cards, as the configuration for the GPU machine; the specification sheet and price quotation are attached below:

高性能计算HPC_DTR-GS系列产品规格书_Rev1.1.pdf 达腾瑞HPC报价单_客户版.xlsx.zip

xiaohang007 commented 2 weeks ago

For large-scale MD calculations, the V100 is indeed a better choice than the 4090.

hongyi-zhao commented 2 weeks ago

According to the description in the official brochure above, this configuration is suitable for various scenarios and performs well. So far, I have not had a chance to test it.

hongyi-zhao commented 2 weeks ago

Actually, some of the latest consumer-grade GPUs are surprisingly cost-effective for training large language models (at FP16 precision): certain high-end consumer GPUs (e.g. the 4090) offer FP16 compute competitive with professional-grade GPUs (e.g. the A100) that cost several times more.

The Blackwell-based product line has a similar issue: the FP32 and FP64 vector performance of a single B100 is 11% lower than that of the H100 SXM5, while a single B200 is 20% higher. The B200 is estimated to cost 3-4 times as much as the H100, so its cost-effectiveness for scientific computing speaks for itself.
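
To spell out the arithmetic, using only the figures quoted above (roughly 1.2x the H100 SXM5's FP32/FP64 vector throughput at an estimated 3-4x the price):

```python
# Per-dollar throughput of a B200 relative to an H100 SXM5, from the figures above.
relative_perf = 1.20                      # B200 vs H100 SXM5, FP32/FP64 vector
for relative_price in (3.0, 4.0):         # estimated price ratio range
    per_dollar = relative_perf / relative_price
    print(f"At {relative_price:.0f}x the price: {per_dollar:.2f}x the per-dollar "
          f"throughput, i.e. {(1 - per_dollar) * 100:.0f}% worse than the H100")
```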

xiaohang007 commented 2 weeks ago

Thanks a lot for the valuable info. :)

hongyi-zhao commented 2 weeks ago

The fundamental reason for this phenomenon lies in profit motives: investment in high-end chips is concentrated in the commercial sector rather than in HPC scientific computing. HPC is a relatively niche field, since scientific research is an activity that only a very small number of people can engage in, and its actual output is unpredictable.

For companies like NVIDIA, truly supporting HPC would only require designing specialized high-end compute cards for a handful of world-class research institutions. That strategy is clearly the wisest and offers the highest return.

This is exactly like the situation we see today with CPUs in most research institutions: the so-called high-end CPUs used in scientific research are often products that the major commercial players have already discarded.

See "[GPU加速] 锐评Blackwell GPU" ("[GPU acceleration] A pointed review of the Blackwell GPUs", in Chinese) for the related discussion.