nim-lang / RFCs

A repository for your Nim proposals.
136 stars 23 forks source link

Noinline and/or Addressable consts #257

Open mratsim opened 4 years ago

mratsim commented 4 years ago

This feature is similar to the discussion here: https://github.com/nim-lang/Nim/issues/12216#issuecomment-533617340 which was for original use-case https://github.com/nim-lang/Nim/issues/12216#issue-495529376

However the solution merged in https://github.com/nim-lang/Nim/commit/801fbe7e948ee786ceeb497ecad97607a21a663c and https://github.com/nim-lang/Nim/pull/12870/files doesn't address the following use cases:

Overview

  1. Low-level programs often work with look-up tables and some may work with very large constants. This is the case for high performance computing, for example:

    MantissaBits = 23'i32 MantissaBitsMask = 1'i32 shl 23

    ln2 = ln(2'f32)

    ExpMax = 88'i32 ExpMin = -88'i32 ExpA = ExpBitsMask.float32 / ln2 ExpB = ln2 / ExpBitsMask.float32

func initExpLUT(): array[ExpBitsMask, int32] = for i, val in result.mpairs: let y = pow(2'f32, i.float32 / ExpBitsMask.float32) val = castint32 and (MantissaBitsMask - 1)

We need ExpLUT in the BSS so that we can take it's address so it can't be const

let ExpLUT* = initExpLUT()


- For cryptography I use big integers of large size and may also use look-up tables of size up to 32KB https://github.com/catid/snowshoe/blob/8ba3f575/src/recode.inc#L319-L361
  For example, currently my library Constantine has already large:
  - Prime moduli that are used in every files: https://github.com/mratsim/constantine/blob/51586c72/constantine/config/curves_declaration.nim#L125-L145
  - constants (4300+ bits numbers): https://github.com/mratsim/constantine/blob/51586c72/constantine/curves/bls12_381_pairing.nim#L26
  - Lattices: https://github.com/mratsim/constantine/blob/51586c72/constantine/curves/bls12_381_glv.nim#L16-L69
  - 20+ large coefficients per elliptic curves: https://github.com/mratsim/constantine/blob/51586c72/constantine/curves/bls12_381_glv.nim#L16-L69

2. Currently consts in particular const arrays are copied in each compilation unit that access them at runtime

3. Const are considered side-effect-free for the purpose of `func`, neither `let` or `let {.compiletime.}` are

4. Both scientific computing, big integer and cryptographic libraries benefit from this side-effect analysis, for cryptography it's actually one of the best features of Nim regarding code quality insurance.

5. The duplication of consts may greatly increase code size, this is problematic in resource constraint devices which actually benefit the most from LUT to accelerate operations. They also require strong cryptography hence larger numbers/arrays to communicate securely. Another target that is concerned with code size is WASM which is growingly popular for cryptography (https://community.zkproof.org/t/zksnarks-in-webassembly-running-demo-and-discussion/30)

6. The duplication also impacts data eviction/hotness from the CPU cache, as an example if we have
- Module "definitions.nim" holds const primes big integer for cryptography
- Module "bigints.nim" defines bigints procedure, and has to inline some const primes
- Module "protocols.nim" defines some protocols, it also has to inline some const primes and it will pass a pointer to its own inlined consts while calling "bigints.nim" even though that module already has the data.

## Proposal

I'd like a way to enforce that some consts are compiled into `extern const MyType MyVar` instead of being compiled to `static NIM_CONST MyType MyVar` in each compilation unit.
This should also retain their "side-effect-free" characteristic and the fact that they can still be used in the VM.

### Summary
In summary a solution to be considered must have the following characteristics:
1. Ensure no duplication in the codegen of consts
2. Accessing the const should be possible in a `func` or `{.noSideEffect.}` pragma
3. Accessing the const must be possible in the VM

This is out-of-scope though related and likely touches the same codepath
4. The address of the const can be taken at runtime.

### Potential solutions

1. Make the `{.noinline.}` pragma applicable to const. This would directly compile to `extern const`
2. Create a new pragma `{.addressable.}` for consts. This is augmenting https://github.com/nim-lang/Nim/commit/801fbe7e948ee786ceeb497ecad97607a21a663c by allowing const "progmem" variable to be considered side-effect free and usable in the VM.
3. Have `let MyVar {.compileTime.}` be considered side-effect free and produce `extern const`

### Opinion

I am not fan of solution 3 as I was already opposed to have `{.compileTime.}` be able to produce any runtime code.

Solution 1 covers my use-case for cryptography but not for LUT table for scientific computing.
For scientific computing, LUT loading is often done via SIMD which require an address, hence solution 2 is required.
Araq commented 4 years ago

What about this potential solution:

mratsim commented 4 years ago

This is OK for me too in principle. Regarding the size of N, I expect it's more platform dependent so if we use an heuristic it should properly handle AVR for example.

For cryptography, we have at minimum elements of size 32 bytes (256-bit) due to security constraints and even in that case, inlining isn't useful because I would take a pointer to those and pass them to an assembly procedure since the C compiler wouldn't be able to produce decent code anyway so duplicating 32 bytes is useless.

Hence, this can be a good default but some applications might require an escape hatch.

Araq commented 4 years ago

I checked the produced code and it does produce extern NIM_CONST ... very much like we want it to work. The compiler also has the notion of "dontInlineConstant". The only thing that should be done is to allow unsafeAddr for symbolic constants.

mratsim commented 4 years ago

Way to reproduce with Constantine:

git clone https://github.com/mratsim/constantine
cd constantine

# Either (Linux, Windows, Mac with Xcode from Catalina 10.15.2)
nimble bench_pairing_bls12_381
# Or (older Mac)
nimble bench_pairing_bls12_381_clang_noasm

# This will create a nimcache/bench_pairing_bls12_381* folder
# Then grep the lowest significant limb of the prime modulus
grep 13402431016077863595ULL nimcache/bench_pairing_bls12_381*/*

image

$ grep 13402431016077863595ULL nimcache/bench_pairing_bls12_381*/*
nimcache/bench_pairing_bls12_381__useASM/@m..@sconstantine@scurves@sbls12_381_inversion.nim.c:static NIM_CONST tyObject_BigInt__qOSYw6o1xYdytrcConbXkw TM__wujVZ14MxHaQnGHXA9bkQXA_2 = {{13402431016077863595ULL,
nimcache/bench_pairing_bls12_381__useASM/@m..@sconstantine@selliptic@sec_shortweierstrass_affine.nim.c:static NIM_CONST tyObject_BigInt__qOSYw6o1xYdytrcConbXkw TM__xbdALFoMj9aRAmPLgdaBTGA_2 = {{13402431016077863595ULL,
nimcache/bench_pairing_bls12_381__useASM/@m..@sconstantine@selliptic@sec_shortweierstrass_projective.nim.c:static NIM_CONST tyObject_BigInt__qOSYw6o1xYdytrcConbXkw TM__9c8KUxqg9atExBrU4DyLZZFw_2 = {{13402431016077863595ULL,
nimcache/bench_pairing_bls12_381__useASM/@m..@sconstantine@spairing@scyclotomic_fp12.nim.c:static NIM_CONST tyObject_BigInt__qOSYw6o1xYdytrcConbXkw TM__RRiXpC4aKp7oJioYBjKrAw_3 = {{13402431016077863595ULL,
nimcache/bench_pairing_bls12_381__useASM/@m..@sconstantine@spairing@slines_projective.nim.c:static NIM_CONST tyObject_BigInt__qOSYw6o1xYdytrcConbXkw TM__BFm7A8UjJ4SisXUKjwk0ng_2 = {{13402431016077863595ULL,
nimcache/bench_pairing_bls12_381__useASM/@m..@sconstantine@spairing@smul_fp12_by_lines.nim.c:static NIM_CONST tyObject_BigInt__qOSYw6o1xYdytrcConbXkw TM__O2K6KMm6ba2GGMh1uJw9bVg_2 = {{13402431016077863595ULL,
nimcache/bench_pairing_bls12_381__useASM/@m..@sconstantine@spairing@spairing_bls12.nim.c:static NIM_CONST tyObject_BigInt__qOSYw6o1xYdytrcConbXkw TM__HcXop4jf09by36BOmOuV9cUA_2 = {{13402431016077863595ULL,
nimcache/bench_pairing_bls12_381__useASM/@m..@sconstantine@stower_field_extensions@scubic_extensions.nim.c:static NIM_CONST tyObject_BigInt__qOSYw6o1xYdytrcConbXkw TM__PvnDy6QYOtiZAKPZlZu0Sg_2 = {{13402431016077863595ULL,
nimcache/bench_pairing_bls12_381__useASM/@m..@sconstantine@stower_field_extensions@sexponentiations.nim.c:static NIM_CONST tyObject_BigInt__qOSYw6o1xYdytrcConbXkw TM__YMcgH5JNZ7YDO2jls3j6GQ_2 = {{13402431016077863595ULL,
nimcache/bench_pairing_bls12_381__useASM/@m..@sconstantine@stower_field_extensions@squadratic_extensions.nim.c:static NIM_CONST tyObject_BigInt__qOSYw6o1xYdytrcConbXkw TM__SjayJ9bxAfdYuWY22RzUJ5g_2 = {{13402431016077863595ULL,
nimcache/bench_pairing_bls12_381__useASM/@m..@sconstantine@stower_field_extensions@stower_common.nim.c:static NIM_CONST tyObject_BigInt__qOSYw6o1xYdytrcConbXkw TM__HPjMPMGA9cEi73NVS3uz0DA_2 = {{13402431016077863595ULL,
nimcache/bench_pairing_bls12_381__useASM/@m..@shelpers@sprng_unsafe.nim.c:static NIM_CONST tyObject_BigInt__qOSYw6o1xYdytrcConbXkw TM__uCMRYreJLUF8qzvamp8QIg_2 = {{13402431016077863595ULL,
nimcache/bench_pairing_bls12_381_clang_noASM/@m..@sconstantine@scurves@sbls12_381_inversion.nim.c:static NIM_CONST tyObject_BigInt__qOSYw6o1xYdytrcConbXkw TM__wujVZ14MxHaQnGHXA9bkQXA_2 = {{13402431016077863595ULL,
nimcache/bench_pairing_bls12_381_clang_noASM/@m..@sconstantine@selliptic@sec_shortweierstrass_affine.nim.c:static NIM_CONST tyObject_BigInt__qOSYw6o1xYdytrcConbXkw TM__xbdALFoMj9aRAmPLgdaBTGA_2 = {{13402431016077863595ULL,
nimcache/bench_pairing_bls12_381_clang_noASM/@m..@sconstantine@selliptic@sec_shortweierstrass_projective.nim.c:static NIM_CONST tyObject_BigInt__qOSYw6o1xYdytrcConbXkw TM__9c8KUxqg9atExBrU4DyLZZFw_2 = {{13402431016077863595ULL,
nimcache/bench_pairing_bls12_381_clang_noASM/@m..@sconstantine@spairing@scyclotomic_fp12.nim.c:static NIM_CONST tyObject_BigInt__qOSYw6o1xYdytrcConbXkw TM__RRiXpC4aKp7oJioYBjKrAw_3 = {{13402431016077863595ULL,
nimcache/bench_pairing_bls12_381_clang_noASM/@m..@sconstantine@spairing@slines_projective.nim.c:static NIM_CONST tyObject_BigInt__qOSYw6o1xYdytrcConbXkw TM__BFm7A8UjJ4SisXUKjwk0ng_2 = {{13402431016077863595ULL,
nimcache/bench_pairing_bls12_381_clang_noASM/@m..@sconstantine@spairing@smul_fp12_by_lines.nim.c:static NIM_CONST tyObject_BigInt__qOSYw6o1xYdytrcConbXkw TM__O2K6KMm6ba2GGMh1uJw9bVg_2 = {{13402431016077863595ULL,
nimcache/bench_pairing_bls12_381_clang_noASM/@m..@sconstantine@spairing@spairing_bls12.nim.c:static NIM_CONST tyObject_BigInt__qOSYw6o1xYdytrcConbXkw TM__HcXop4jf09by36BOmOuV9cUA_2 = {{13402431016077863595ULL,
nimcache/bench_pairing_bls12_381_clang_noASM/@m..@sconstantine@stower_field_extensions@scubic_extensions.nim.c:static NIM_CONST tyObject_BigInt__qOSYw6o1xYdytrcConbXkw TM__PvnDy6QYOtiZAKPZlZu0Sg_2 = {{13402431016077863595ULL,
nimcache/bench_pairing_bls12_381_clang_noASM/@m..@sconstantine@stower_field_extensions@sexponentiations.nim.c:static NIM_CONST tyObject_BigInt__qOSYw6o1xYdytrcConbXkw TM__YMcgH5JNZ7YDO2jls3j6GQ_2 = {{13402431016077863595ULL,
nimcache/bench_pairing_bls12_381_clang_noASM/@m..@sconstantine@stower_field_extensions@squadratic_extensions.nim.c:static NIM_CONST tyObject_BigInt__qOSYw6o1xYdytrcConbXkw TM__SjayJ9bxAfdYuWY22RzUJ5g_2 = {{13402431016077863595ULL,
nimcache/bench_pairing_bls12_381_clang_noASM/@m..@sconstantine@stower_field_extensions@stower_common.nim.c:static NIM_CONST tyObject_BigInt__qOSYw6o1xYdytrcConbXkw TM__HPjMPMGA9cEi73NVS3uz0DA_2 = {{13402431016077863595ULL,
nimcache/bench_pairing_bls12_381_clang_noASM/@m..@shelpers@sprng_unsafe.nim.c:static NIM_CONST tyObject_BigInt__qOSYw6o1xYdytrcConbXkw TM__uCMRYreJLUF8qzvamp8QIg_2 = {{13402431016077863595ULL,

All calls are dispatched by this macro https://github.com/mratsim/constantine/blob/986245b5/constantine/config/curves.nim#L26-L28

macro Mod*(C: static Curve): untyped =
  ## Get the Modulus associated to a curve
  result = bindSym($C & "_Modulus")

And the prime modulus is instantiated as

const BLS12_381_Modulus = [...] # 0x1a0111ea397fe69a4b1ba7b6434bacd764774b84f38512bf6730d2a0f6b0f6241eabfffeb153ffffb9feffffffffaaab

In general I always put my big constants in consts and use identifier construction to access the proper constant because I was afraid of forcing the VM to recompute/carry those constants in cache if I used static. So the VM only has an enum and then this enum is bindSym($value + "_suffix") to a const.

I also ensured to have it exported from the DSL that instantiate everything https://github.com/mratsim/constantine/blob/986245b5/constantine/config/curves_parser.nim#L254-L262

proc exported(id: string): NimNode =
  nnkPostfix.newTree(
    ident"*",
    ident(id)
  )

# .......

    # const BN254_Snarks_Modulus = fromHex(BigInt[254], "0x30644e72e131a029b85045b68181585d97816a916871ca8d3c208c16d87cfd47")
    curveModStmts.add newConstStmt(
      exported($curve & "_Modulus"),
      newCall(
        bindSym"fromHex",
        nnkBracketExpr.newTree(bindSym"BigInt", bitWidth),
        modulus
      )
    )
Araq commented 4 years ago

I don't understand your code, you need to reduce it to an example that I can understand. From the looks of it, it seems to me that you generate the same 'const' multiple times.

mratsim commented 4 years ago

I have reduced the example here https://github.com/mratsim/addressable-consts

In particular this commit triggers the wrong behaviour (changing from an array to an object that wraps an array): https://github.com/mratsim/addressable-consts/commit/4c39c689506a6bb9ce23c561ff70a6e11ed2bd04

The full copy-pastable command is:

git clone https://github.com/mratsim/addressable-consts
cd addressable-consts

git checkout 1b1b344
nim c -r --outdir:build --nimcache:nimcache/main_array zzz_main

git checkout 4c39c68
nim c -r --outdir:build --nimcache:nimcache/main_object zzz_main

grep 13402431016077863595ULL nimcache/*/*

git checkout master

Output

$ grep 13402431016077863595ULL nimcache/*/*

# Correct case with array
nimcache/main_array/@mcurves_declaration.nim.c:N_LIB_PRIVATE NIM_CONST tyArray__4RmROn7lE6QlXejY1MVycQ BLS12381_Modulus__9caa3bJ0oqpK8to9aCzWSQnA = {13402431016077863595ULL,

# Incorrect case with object wrapping array
nimcache/main_object/@mzzz_comp2.nim.c:static NIM_CONST tyObject_BigInt__mC9bHrFeVsA9aeZ9arf46A9cfQ TM__5uoc5HUmvYTOqhiP4MNhUA_2 = {{13402431016077863595ULL,
nimcache/main_object/@mzzz_main.nim.c:static NIM_CONST tyObject_BigInt__mC9bHrFeVsA9aeZ9arf46A9cfQ TM__NuuMz9bC9bP6F9bixtzVphI9aQ_2 = {{13402431016077863595ULL,
exelotl commented 3 years ago

Following discussion on IRC I'd like to bump my use cases for addressable constants involving objects and pointers, which I'd really like to see become possible. Using @timotheecour 's let {.rom.} syntax as an example:

Animation data for spritesheets

type AnimData = object
  loop: bool
  frames: ptr UncheckedArray[int]
  len: int

let walkFrames {.rom.} = [4,5,6,7,8,9]

let walkAnimation {.rom.} = AnimData(
  loop: true,
  frames: cast[ptr UncheckedArray[int]](unsafeAddr walkFrames),
  len: walkFrames.len
)

# pointer to the currently playing animation
var currentAnim = unsafeAddr(walkAnimation)

Dispatch tables for scenes, entity behaviors, etc.

type Scene = object
  show: proc () {.nimcall.}
  hide: proc () {.nimcall.}
  update: proc () {.nimcall.}
  draw: proc () {.nimcall.}

proc onShow() =
  echo "enter main menu"
proc onHide() =
  echo "leave main menu"
proc onUpdate() =
  echo "updating main menu"
proc onDraw() =
  echo "drawing main menu"

let mainMenu {.rom.} = Scene(
  show: onShow,
  hide: onHide,
  update: onUpdate,
  draw: onDraw,
)

# pointer to the current scene
var currentScene = unsafeAddr(mainMenu)