[ ] Generate data and graph it, rather than generating large, difficult-to-interpret tables.
[ ] Generate data for all functions.
[ ] Allow generating data for specific functions.
[ ] Generate graphs of relative performance for related functions (e.g. how much faster is squaring than multiplication at various sizes?)
[ ] For each operation, work out the appropriate amount of data to use (currently we use a tiny amount for small fixnums and a largish amount for bigger fixnums -- this is inconsistent).
[ ] Understand what the optimal performance could be in important cases (e.g. mulmod, modexp).
[ ] Normalise the data gathered relative to some appropriately chosen device parameters (e.g. #SMs, #cores, core frequency, ...)
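
A rough sketch of how the relative-performance and normalisation items might be computed once timing data exists. Everything here is an assumption made for illustration: the `Sample` and `Device` records, their field names, and the choice to normalise by `sm_count * clock_hz` are hypothetical and not taken from the benchmark code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    # Hypothetical device parameters; a real run would query them from the GPU.
    sm_count: int
    clock_hz: float


@dataclass(frozen=True)
class Sample:
    # One benchmark measurement (hypothetical layout).
    func: str       # e.g. "sqr" or "mul"
    bits: int       # fixnum size in bits
    n_ops: int      # number of operations performed
    seconds: float  # wall-clock time taken


def throughput(s: Sample) -> float:
    """Operations per second for a single measurement."""
    return s.n_ops / s.seconds


def normalised_throughput(s: Sample, dev: Device) -> float:
    """Operations per SM per clock cycle (one possible normalisation)."""
    return throughput(s) / (dev.sm_count * dev.clock_hz)


def relative_speed(samples, fast: str, slow: str) -> dict:
    """Map each size to throughput(fast) / throughput(slow).

    Sizes measured for only one of the two functions are skipped.
    """
    by_key = {(s.func, s.bits): throughput(s) for s in samples}
    return {
        bits: by_key[(fast, bits)] / by_key[(slow, bits)]
        for (func, bits) in by_key
        if func == fast and (slow, bits) in by_key
    }
```

The dictionary returned by `relative_speed(samples, "sqr", "mul")` can be plotted directly against fixnum size, and `normalised_throughput` gives figures that are at least roughly comparable across devices. Which device parameters are the right ones to divide by is still the open question in the item above.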
From https://github.com/data61/cuda-fixnum/issues/50: