mockingbirdnest / Principia

𝑛-Body and Extended Body Gravitation for Kerbal Space Program
MIT License

Use polynomial evaluators to compute the polynomials for sin and cos #4018

Closed pleroy closed 1 month ago

pleroy commented 1 month ago

Note that using FMA slightly decreases both the latency (good) and the throughput (bad).

Run on (48 X 3793 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x24)
  L1 Instruction 32 KiB (x24)
  L2 Unified 512 KiB (x24)
  L3 Unified 32768 KiB (x4)
---------------------------------------------------------------------------------------------------------
Benchmark                                                               Time             CPU   Iterations
---------------------------------------------------------------------------------------------------------
BM_ExperimentSinTableSpacing<Metric::Latency, 2.0 / 256.0>           11.6 ns         11.7 ns     64000000
BM_ExperimentSinTableSpacing<Metric::Throughput, 2.0 / 256.0>        2.59 ns         2.62 ns    280000000
BM_ExperimentSinTableSpacing<Metric::Latency, 2.0 / 1024.0>          11.1 ns         11.2 ns     64000000
BM_ExperimentSinTableSpacing<Metric::Throughput, 2.0 / 1024.0>       2.47 ns         2.46 ns    298667000
BM_ExperimentCosTableSpacing<Metric::Latency, 2.0 / 256.0>           11.4 ns         11.2 ns     56000000
BM_ExperimentCosTableSpacing<Metric::Throughput, 2.0 / 256.0>        2.56 ns         2.57 ns    280000000
BM_ExperimentCosTableSpacing<Metric::Latency, 2.0 / 1024.0>          11.1 ns         11.0 ns     64000000
BM_ExperimentCosTableSpacing<Metric::Throughput, 2.0 / 1024.0>       2.39 ns         2.41 ns    298667000
BM_ExperimentSinMultiTable<Metric::Latency>                          12.1 ns         12.0 ns     56000000
BM_ExperimentSinMultiTable<Metric::Throughput>                       3.54 ns         3.53 ns    203637000
BM_ExperimentCosMultiTable<Metric::Latency>                          12.1 ns         12.0 ns     56000000
BM_ExperimentCosMultiTable<Metric::Throughput>                       3.49 ns         3.53 ns    203637000
BM_ExperimentSinSingleTable<Metric::Latency>                         11.7 ns         11.7 ns     64000000
BM_ExperimentSinSingleTable<Metric::Throughput>                      2.94 ns         2.92 ns    235790000
BM_ExperimentCosSingleTable<Metric::Latency>                         11.8 ns         11.7 ns     56000000
BM_ExperimentCosSingleTable<Metric::Throughput>                      3.01 ns         3.00 ns    224000000
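
To illustrate the FMA remark, here is a minimal sketch of Horner evaluation written with and without an explicit fused multiply-add. The names `HornerWithoutFma` and `HornerWithFma` are hypothetical and are not Principia's polynomial evaluators; the coefficients in the usage note below are plain Taylor coefficients chosen for illustration, not the ones used in this pull request.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical helper, not Principia's evaluator.  Evaluates
// a[0] + a[1] * x + ... + a[n - 1] * x^(n - 1) by Horner's scheme with a
// separate multiply and add at each step.  Depending on compiler flags, the
// compiler may still contract each step into an FMA on its own.
template<std::size_t n>
double HornerWithoutFma(std::array<double, n> const& a, double const x) {
  static_assert(n > 0, "need at least one coefficient");
  double result = a[n - 1];
  for (std::size_t i = n - 1; i > 0; --i) {
    result = result * x + a[i - 1];
  }
  return result;
}

// Same scheme, but each step is a single std::fma, which rounds once instead
// of twice.  Every step depends on the previous one, so the cost of this
// dependency chain is what a latency measurement would see.
template<std::size_t n>
double HornerWithFma(std::array<double, n> const& a, double const x) {
  static_assert(n > 0, "need at least one coefficient");
  double result = a[n - 1];
  for (std::size_t i = n - 1; i > 0; --i) {
    result = std::fma(result, x, a[i - 1]);
  }
  return result;
}
```

For example, with `std::array<double, 3> const s{1.0, -1.0 / 6.0, 1.0 / 120.0}`, the expression `x * HornerWithFma(s, x * x)` approximates `std::sin(x)` for small `x`. Whether `std::fma` compiles to a hardware instruction depends on the target architecture and compiler options, so any timing difference has to be measured rather than assumed.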
