openmm / openmm-org

Content of https://openmm.org

Clarify exactly which models of GPUs were used in benchmarks #87

Open jchodera opened 2 years ago

jchodera commented 2 years ago

There seems to be significant variation in the performance of different models/variants of the same GPU (e.g. the multiple variants of A100 available), so we should provide more details in our benchmarks about exactly which model(s) were used.

peastman commented 2 years ago

The A100s are on Perlmutter. They're 40 GB, 1410 MHz versions.

jchodera commented 2 years ago

Maybe we should capture the output of nvidia-smi -q?

The datasheet says there's a bunch of flavors of A100: [image]
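One way to capture that output alongside each benchmark run is sketched below. This is only an illustration: the function names `capture_gpu_report` and `save_gpu_report` and the output path are hypothetical, and the sketch assumes `nvidia-smi` is on the PATH of the node that actually runs the benchmark.

```python
import shutil
import subprocess


def capture_gpu_report(cmd=("nvidia-smi", "-q")):
    """Run the command and return its stdout as text.

    Returns None if the executable cannot be found, so a benchmark
    script can continue on machines without NVIDIA tools installed.
    """
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(list(cmd), capture_output=True, text=True, check=True)
    return result.stdout


def save_gpu_report(path, cmd=("nvidia-smi", "-q")):
    """Write the report to `path`; return True if a report was written."""
    report = capture_gpu_report(cmd)
    if report is None:
        return False
    with open(path, "w") as f:
        f.write(report)
    return True
```

Calling `save_gpu_report("gpu_report.txt")` (a hypothetical file name) from the benchmark job itself, rather than from a login node, would record the hardware the numbers were actually measured on.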

peastman commented 2 years ago

The only differences between them are the amount of memory (40 or 80 GB) and the form factor (PCIe or SXM). Neither of those should make any difference in speed.

Here's what nvidia-smi reports on the login node with the GPU idle.

    Clocks
        Graphics                          : 210 MHz
        SM                                : 210 MHz
        Memory                            : 1215 MHz
        Video                             : 585 MHz
    Applications Clocks
        Graphics                          : 765 MHz
        Memory                            : 1215 MHz
    Default Applications Clocks
        Graphics                          : 765 MHz
        Memory                            : 1215 MHz
    Max Clocks
        Graphics                          : 1410 MHz
        SM                                : 1410 MHz
        Memory                            : 1215 MHz
        Video                             : 1290 MHz
    Max Customer Boost Clocks
        Graphics                          : 1410 MHz

Comparing to what you posted in https://github.com/openmm/openmm-org/pull/86#issuecomment-1007171890, the max clock rates for graphics, SM, and video are the same, but the memory clock is slightly lower. Other factors that can affect performance are the type of bus (PCIe or NVLink, and the particular version of either one), the cooling system (which influences whether the GPU can actually sustain the maximum clock rate), bus topology (mainly for multi-GPU benchmarks), and CPU type (it's not a huge effect for GPU benchmarks, but it does make a difference).
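A comparison like the one above could be done mechanically rather than by eye. The following is a rough sketch, not part of any OpenMM tooling: `parse_clock_sections` and `diff_clocks` are hypothetical helpers, and the parser assumes the indented "Section / Key : N MHz" layout shown in the excerpt above (other parts of the full nvidia-smi -q report may be laid out differently).

```python
import re

# Excerpt of the report quoted in this thread.
SAMPLE = """\
    Clocks
        Graphics                          : 210 MHz
        SM                                : 210 MHz
        Memory                            : 1215 MHz
        Video                             : 585 MHz
    Max Clocks
        Graphics                          : 1410 MHz
        SM                                : 1410 MHz
        Memory                            : 1215 MHz
        Video                             : 1290 MHz
"""


def parse_clock_sections(text):
    """Parse 'Section' headers and 'Key : N MHz' lines into nested dicts."""
    sections = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if ":" in line:
            key, value = line.split(":", 1)
            match = re.match(r"\s*(\d+)\s*MHz", value)
            # Only keep MHz values that belong to a known section.
            if current is not None and match:
                sections[current][key.strip()] = int(match.group(1))
        else:
            # A line without a colon starts a new section.
            current = line.strip()
            sections[current] = {}
    return sections


def diff_clocks(a, b):
    """Return {(section, key): (value_a, value_b)} for entries that differ."""
    diffs = {}
    for section in a.keys() & b.keys():
        for key in a[section].keys() & b[section].keys():
            if a[section][key] != b[section][key]:
                diffs[(section, key)] = (a[section][key], b[section][key])
    return diffs


max_clocks = parse_clock_sections(SAMPLE)["Max Clocks"]
print(max_clocks)
```

Running `diff_clocks` on the parsed reports from two machines would list only the clock entries that differ, which makes it easy to state in the benchmark notes exactly how two GPUs of the same nominal model were configured.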