pleroy closed this 1 month ago.
Note that FMA decreases both the latency (good) and the throughput (bad) a bit.
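For context, here is a minimal sketch of what an FMA change amounts to, assuming the kernel ends with a polynomial evaluated by Horner's scheme. The coefficients are hypothetical (the leading Taylor terms of sin, not the ones used in these experiments), and so are the function names. `std::fma(a, b, c)` computes `a * b + c` with a single rounding. Because each Horner step depends on the previous one, replacing a dependent multiply and add with one fused operation shortens the critical path, which is what the latency benchmark measures. `std::fma` is only fast when the compiler targets hardware FMA (e.g. `-mfma` on x86-64 with GCC or Clang); otherwise the library may fall back to a slow software emulation.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical coefficients: sin(x) ~ x * (1 - x^2/6 + x^4/120 - x^6/5040),
// stored as a polynomial in x^2, lowest degree first.
constexpr std::array<double, 4> kCoefficients = {1.0, -1.0 / 6.0, 1.0 / 120.0,
                                                 -1.0 / 5040.0};

// Horner's scheme with a separate multiply and add: two roundings and two
// dependent operations per step (unless the compiler contracts them).
double SinHornerMulAdd(double x) {
  assert(std::abs(x) <= 1.0);  // The truncated series is only meant for small |x|.
  double const x2 = x * x;
  double p = kCoefficients[3];
  for (int i = 2; i >= 0; --i) {
    p = p * x2 + kCoefficients[i];
  }
  return x * p;
}

// Same scheme with std::fma: one rounding and one fused operation per step,
// so the dependency chain is shorter.
double SinHornerFma(double x) {
  assert(std::abs(x) <= 1.0);
  double const x2 = x * x;
  double p = kCoefficients[3];
  for (int i = 2; i >= 0; --i) {
    p = std::fma(p, x2, kCoefficients[i]);
  }
  return x * p;
}
```

With contraction enabled, GCC and Clang may fuse `p * x2 + kCoefficients[i]` into an FMA on their own; compiling with `-ffp-contract=off` keeps the two variants distinct when comparing them.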
Run on (48 X 3793 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x24)
  L1 Instruction 32 KiB (x24)
  L2 Unified 512 KiB (x24)
  L3 Unified 32768 KiB (x4)
--------------------------------------------------------------------------------------------
Benchmark                                                          Time      CPU  Iterations
--------------------------------------------------------------------------------------------
BM_ExperimentSinTableSpacing<Metric::Latency, 2.0 / 256.0>      11.6 ns  11.7 ns    64000000
BM_ExperimentSinTableSpacing<Metric::Throughput, 2.0 / 256.0>   2.59 ns  2.62 ns   280000000
BM_ExperimentSinTableSpacing<Metric::Latency, 2.0 / 1024.0>     11.1 ns  11.2 ns    64000000
BM_ExperimentSinTableSpacing<Metric::Throughput, 2.0 / 1024.0>  2.47 ns  2.46 ns   298667000
BM_ExperimentCosTableSpacing<Metric::Latency, 2.0 / 256.0>      11.4 ns  11.2 ns    56000000
BM_ExperimentCosTableSpacing<Metric::Throughput, 2.0 / 256.0>   2.56 ns  2.57 ns   280000000
BM_ExperimentCosTableSpacing<Metric::Latency, 2.0 / 1024.0>     11.1 ns  11.0 ns    64000000
BM_ExperimentCosTableSpacing<Metric::Throughput, 2.0 / 1024.0>  2.39 ns  2.41 ns   298667000
BM_ExperimentSinMultiTable<Metric::Latency>                     12.1 ns  12.0 ns    56000000
BM_ExperimentSinMultiTable<Metric::Throughput>                  3.54 ns  3.53 ns   203637000
BM_ExperimentCosMultiTable<Metric::Latency>                     12.1 ns  12.0 ns    56000000
BM_ExperimentCosMultiTable<Metric::Throughput>                  3.49 ns  3.53 ns   203637000
BM_ExperimentSinSingleTable<Metric::Latency>                    11.7 ns  11.7 ns    64000000
BM_ExperimentSinSingleTable<Metric::Throughput>                 2.94 ns  2.92 ns   235790000
BM_ExperimentCosSingleTable<Metric::Latency>                    11.8 ns  11.7 ns    56000000
BM_ExperimentCosSingleTable<Metric::Throughput>                 3.01 ns  3.00 ns   224000000
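The `Metric::Latency` and `Metric::Throughput` variants in these results presumably differ in how consecutive calls are chained. The sketch below shows the usual way to measure each, using `std::sin` as a stand-in for the function under test; `MeasureLatency`, `MeasureThroughput` and `NanosecondsPerCall` are hypothetical names, and this is not the harness that produced the numbers above. In the latency loop each argument depends on the previous result, so calls cannot overlap; in the throughput loop the arguments are independent, so an out-of-order core can pipeline them, which would explain why the throughput figures are roughly four times smaller than the latency ones.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical harness: std::sin stands in for the function under test.
using Clock = std::chrono::steady_clock;

// Average nanoseconds per call between `start` and `stop`.
double NanosecondsPerCall(Clock::time_point start, Clock::time_point stop,
                          std::size_t calls) {
  return std::chrono::duration<double, std::nano>(stop - start).count() /
         static_cast<double>(calls);
}

// Latency: each argument is derived from the previous result, forming a
// serial dependency chain. The final value is written to `sink` so the
// compiler cannot discard the loop.
double MeasureLatency(double x, std::size_t n, double& sink) {
  assert(n > 0);
  auto const start = Clock::now();
  for (std::size_t i = 0; i < n; ++i) {
    x = std::sin(x) + 0.5;  // Stays in [-0.5, 1.5].
  }
  auto const stop = Clock::now();
  sink = x;
  return NanosecondsPerCall(start, stop, n);
}

// Throughput: the arguments are independent of each other, so successive
// calls may overlap in the pipeline. Results go to `out` for the same reason
// as `sink` above.
double MeasureThroughput(std::vector<double> const& xs,
                         std::vector<double>& out) {
  assert(!xs.empty());
  out.resize(xs.size());
  auto const start = Clock::now();
  for (std::size_t i = 0; i < xs.size(); ++i) {
    out[i] = std::sin(xs[i]);
  }
  auto const stop = Clock::now();
  return NanosecondsPerCall(start, stop, xs.size());
}
```

In Google Benchmark, `benchmark::DoNotOptimize` plays the role of the `sink` and `out` parameters here: it keeps the compiler from discarding the computed values.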
1760.