nim-lang / RFCs

Avoiding explicit NimMain - Improving Nim libraries experience #538

Open mratsim opened 1 year ago

mratsim commented 1 year ago

Follow-up to the discussion at: https://discord.com/channels/371759389889003530/768367394547957761/1130053134727782410

Currently, using a Nim library usually requires calling NimMain to initialize global variables and the Nim runtime.

This is extra friction, especially when we want to replicate C libraries that don't require this.

Motivating example

For example, many scientific libraries autodetect CPU feature support and dispatch to the best implementation. One way to do this is through the compiler, by reusing the same function name with different target features.

There are several ways to implement this. According to Agner Fog, https://www.agner.org/optimize/optimizing_cpp.pdf section 13.5, there are at least: [image]

And GCC function multiversioning: https://gcc.gnu.org/wiki/FunctionMultiVersioning

#include <assert.h>

__attribute__ ((target ("default")))
int foo ()
{
  // The default version of foo.
  return 0;
}

__attribute__ ((target ("sse4.2")))
int foo ()
{
  // foo version for SSE4.2
  return 1;
}

__attribute__ ((target ("arch=atom")))
int foo ()
{
  // foo version for the Intel ATOM processor
  return 2;
}

__attribute__ ((target ("arch=amdfam10")))
int foo ()
{
  // foo version for the AMD Family 0x10 processors.
  return 3;
}

int main ()
{
  int (*p)() = &foo;
  assert ((*p) () == foo ());
  return 0;
}

Current situation

We assume that we only want to ask for CPU capabilities once and not at each function call. Hence we need to:

  1. Call the CPU feature detection function once.
  2. Then either store the detected features in a global variable, or store the correct functions to call, depending on the features detected.
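As a rough illustration of the second option, here is a minimal C sketch that runs detection once and caches a function pointer. All names (`mylib_init`, `mylib_sum`, `sum_generic`, `sum_fast`, `cpu_has_fast_path`) are hypothetical, and `sum_fast` is only a stand-in for a real SIMD kernel. The probe assumes a GCC-compatible compiler on x86 and falls back to the generic path elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical baseline kernel. */
static int sum_generic(const int *xs, size_t n) {
    int acc = 0;
    for (size_t i = 0; i < n; i++) acc += xs[i];
    return acc;
}

/* Stand-in for an accelerated variant; it must return the same result. */
static int sum_fast(const int *xs, size_t n) {
    int acc = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) acc += xs[i] + xs[i + 1] + xs[i + 2] + xs[i + 3];
    for (; i < n; i++) acc += xs[i];
    return acc;
}

/* Hypothetical probe: uses GCC/Clang builtins on x86, else reports "no". */
static int cpu_has_fast_path(void) {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse4.2") ? 1 : 0;
#else
    return 0;
#endif
}

/* Global dispatch slot, filled exactly once. */
static int (*sum_impl)(const int *, size_t) = NULL;
static int detect_calls = 0; /* counts how often detection actually ran */

/* Explicit initialization: this is the call the library user must not forget. */
void mylib_init(void) {
    if (sum_impl != NULL) return; /* already initialized */
    detect_calls++;
    sum_impl = cpu_has_fast_path() ? sum_fast : sum_generic;
}

/* Public entry point: dispatches through the cached pointer. */
int mylib_sum(const int *xs, size_t n) {
    assert(sum_impl != NULL && "mylib_init must be called first");
    return sum_impl(xs, n);
}
```

The explicit `mylib_init` call plays the role that NimMain plays for a Nim library: forgetting it leaves the dispatch slot empty.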

But as a library provider, this backend part should ideally be hidden, with only the functions the user cares about exposed, such as compute_matrix_multiplication or verify_cryptographic_signature.

Due to Nim globals being initialized in NimMain, this is currently not supported. Furthermore, function multi-versioning will not work IIRC, even with codegendecl for target attributes, as Nim will not compile functions with colliding C names.

A workaround is to use an __attribute__((constructor)) function, possibly __attribute__((constructor,used)) (in case of zealous dead-code elimination by LTO), for each global a library needs to initialize. However, this is limited to globals that don't require the Nim runtime (so seqs, strings and refs are excluded).
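In plain C, the workaround looks roughly like the sketch below. It assumes a GCC-compatible compiler, and the names `mylib_autoinit`, `mylib_is_ready` and `mylib_double` are made up for illustration. The constructor only touches a plain integer, which matches the restriction to globals that need no runtime.

```c
#include <assert.h>

/* Plain global with no runtime dependency (no allocator needed). */
static int features_ready = 0;

/* Runs when the program or shared library is loaded, before main().
 * `used` keeps the function alive even if nothing references it. */
__attribute__((constructor, used))
static void mylib_autoinit(void) {
    features_ready = 1;
}

/* Lets callers observe that initialization already happened. */
int mylib_is_ready(void) {
    return features_ready;
}

/* Example API function that relies on the load-time initialization. */
int mylib_double(int x) {
    assert(features_ready && "constructor should have run at load time");
    return 2 * x;
}
```

The caller never invokes an init function: by the time `main` runs, `mylib_is_ready()` already returns 1.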

Low-level - Unix

Looking at my library: https://github.com/mratsim/constantine/blob/67fbd8c/constantine/ethereum_bls_signatures.nim, compiled with --mm:arc and -d:UseMalloc --panics:on -d:noSignalHandler to ensure there is no runtime (allocator, exceptions, which all need an allocator, signals, ...), the NimMain-related functions are:

// @methereum_bls_signatures.nim.c
N_LIB_PRIVATE void PreMainInner(void) {
    // This is my CPU detection routine that fills my global variables
    atmplatformsatsisaatscpuinfo_x86dotnim_Init000();
}

N_LIB_PRIVATE int cmdCount;
N_LIB_PRIVATE char** cmdLine;
N_LIB_PRIVATE char** gEnv;
N_LIB_PRIVATE void PreMain(void) {
    atmdotdotatsdotdotatsdotdotatsdotdotatsdotchoosenimatstoolchainsatsnimminus1dot6dot12atslibatssystemdotnim_Init000();
    PreMainInner();
}

N_LIB_PRIVATE N_CDECL(void, NimMainInner)(void) {
    NimMainModule();
}

N_LIB_EXPORT N_CDECL(void, ctt_eth_bls_init_NimMain)(void) {
    void (*volatile inner)(void);
    PreMain();
    inner = NimMainInner;
    (*inner)();
}

N_LIB_PRIVATE N_NIMCALL(void, NimMainModule)(void) {
{
}
}
// @m..@s..@s..@s..@s.choosenim@stoolchains@snim-1.6.12@slib@ssystem.nim.c

static N_INLINE(void, initStackBottom)(void) {
}

N_LIB_PRIVATE N_NIMCALL(void, atmdotdotatsdotdotatsdotdotatsdotdotatsdotchoosenimatstoolchainsatsnimminus1dot6dot12atslibatssystemdotnim_Init000)(void) {
{
    initStackBottom();
}
}

As mentioned in https://discord.com/channels/371759389889003530/768367394547957761/1130212409496322098, one of the motivations for the explicit call was to let the old GCs determine the stack size, I assume for stack scanning of pointers. There are apparently other initialization routines as well (which ones?).

It's also interesting to note that nimbase.h defines

// https://github.com/nim-lang/Nim/blob/v2.0.0/lib/nimbase.h#L513
#define NIM_POSIX_INIT  __attribute__((constructor))

And it's supposed to be used in cgen for PosixCDllMain / NimMainInit: [image]

but NimMainInit doesn't appear anywhere in my generated C code.

Low-level - Windows

MSVC provides a similar mechanism: https://github.com/supranational/blst/blob/f8af94a/src/cpuid.c#L47
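A hedged sketch of how both mechanisms could be wrapped behind one macro follows. The macro name `MYLIB_LOAD_TIME` and the functions are hypothetical. The MSVC branch follows the commonly used `.CRT$XCU` idiom and has not been tested here; MSVC may also need a linker directive to keep the pointer from being discarded in optimized builds.

```c
#include <assert.h>

#if defined(_MSC_VER)
  /* MSVC: register a pointer to the function in the CRT initializer table. */
  #pragma section(".CRT$XCU", read)
  #define MYLIB_LOAD_TIME(fn) \
      static void fn(void); \
      __declspec(allocate(".CRT$XCU")) void (*fn##_ptr)(void) = fn; \
      static void fn(void)
#else
  /* GCC-compatible compilers: use the constructor attribute. */
  #define MYLIB_LOAD_TIME(fn) \
      static void fn(void) __attribute__((constructor, used)); \
      static void fn(void)
#endif

/* -1 means "detection has not run yet". */
static int cpu_features = -1;

/* Load-time hook; a real library would query CPUID here. */
MYLIB_LOAD_TIME(mylib_detect_cpu) {
    cpu_features = 0; /* placeholder: no optional features detected */
}

/* Accessor used by the rest of the library. */
int mylib_cpu_features(void) {
    assert(cpu_features >= 0 && "load-time hook did not run");
    return cpu_features;
}
```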

Questions

  1. Now that arc/orc are default, should we at least have the globals auto-initialized when they are built with ARC/ORC?
  2. In which scenario is NimMainInit built into a library, as this would solve 1?

mratsim commented 1 year ago

Without passing the --noMain flag we have the following result:

Shared library


N_LIB_PRIVATE void PreMainInner(void) {
    atmdotdotatsconstantineatsplatformsatsisaatscpuinfo_x86dotnim_Init000();
}

N_LIB_PRIVATE int cmdCount;
N_LIB_PRIVATE char** cmdLine;
N_LIB_PRIVATE char** gEnv;
N_LIB_PRIVATE void PreMain(void) {
    atmdotdotatsdotdotatsdotdotatsdotdotatsdotchoosenimatstoolchainsatsnimminus1dot6dot12atslibatssystemdotnim_Init000();
    PreMainInner();
}

N_LIB_PRIVATE N_CDECL(void, NimMainInner)(void) {
    NimMainModule();
}

N_LIB_EXPORT N_CDECL(void, ctt_init_NimMain)(void) {
    void (*volatile inner)(void);
    PreMain();
    inner = NimMainInner;
    (*inner)();
}

N_LIB_PRIVATE void NIM_POSIX_INIT NimMainInit(void) {
    ctt_init_NimMain();
}

N_LIB_PRIVATE N_NIMCALL(void, NimMainModule)(void) {
{
}
}

This is almost the wanted result. Tested and confirmed that N_LIB_PRIVATE void NIM_POSIX_INIT NimMainInit does the right thing™.

The only issue is that NimMain is tagged N_LIB_EXPORT, which I don't think it should be.

Static library

N_LIB_PRIVATE void PreMainInner(void) {
    atmdotdotatsconstantineatsplatformsatsisaatscpuinfo_x86dotnim_Init000();
}

N_LIB_PRIVATE int cmdCount;
N_LIB_PRIVATE char** cmdLine;
N_LIB_PRIVATE char** gEnv;
N_LIB_PRIVATE void PreMain(void) {
    atmdotdotatsdotdotatsdotdotatsdotdotatsdotchoosenimatstoolchainsatsnimminus1dot6dot12atslibatssystemdotnim_Init000();
    PreMainInner();
}

N_LIB_PRIVATE N_CDECL(void, NimMainInner)(void) {
    NimMainModule();
}

N_CDECL(void, ctt_init_NimMain)(void) {
    PreMain();
    NimMainInner();
}

int main(int argc, char** args, char** env) {
    cmdLine = args;
    cmdCount = argc;
    gEnv = env;
    ctt_init_NimMain();
    return nim_program_result;
}

N_LIB_PRIVATE N_NIMCALL(void, NimMainModule)(void) {
{
}
}

That's not what we want: the static library build generates a main function, and no NimMainInit constructor is emitted.

Araq commented 1 year ago

Somewhat related, a name like atmdotdotatsdotdotatsdotdotatsdotdotatsdotchoosenimatstoolchainsatsnimminus1dot6dot12atslibatssystemdotnim_Init000 is a bug.

mratsim commented 1 year ago

For my use case, I have created a loadTime macro pragma that allows a proc to be called at program or library load time. It works whether the code is compiled to an application, a dynamic library or a static library.

Note: MSVC/VCC support is still to be confirmed, and I'm unsure about TCC.

https://github.com/mratsim/constantine/blob/40643f0/constantine/platforms/loadtime_functions.nim#L18-L51

import std/macros

const GCC_Compatible* = defined(gcc) or defined(clang) or
                        defined(llvm_gcc) or defined(icc)

macro loadTime*(procAst: untyped): untyped =
  ## This allows a function to be called at program or library load time
  ## Note: such a function cannot be dead-code eliminated.

  procAst.addPragma(ident"used")     # Remove unused warning
  procAst.addPragma(ident"exportc")  # Prevent the proc from being dead-code eliminated

  if GCC_Compatible:
    # {.pragma: gcc_constructor, codegenDecl: "__attribute__((constructor)) $# $#$#".}
    let gcc_constructor =
        nnkExprColonExpr.newTree(
          ident"codegenDecl",
          newLit"__attribute__((constructor)) $# $#$#"
        )
    procAst.addPragma(gcc_constructor) # Implement load-time functionality

    result = procAst

  elif defined(vcc):
    warning "CPU feature autodetection at Constantine load time has not been tested with MSVC"

    template msvcInitSection(procDef: untyped): untyped =
      let procName = astToStr(def)
      procDef
      {.emit:["""
      #pragma section(".CRT$XCU",read)
      __declspec(allocate(".CRT$XCU")) static int (*p)(void) = """, procName, ";"].}

    result = getAst(msvcInitSection(procAst))

  else:
    error "Compiler not supported."

Araq wrote: "Somewhat related, a name like atmdotdotatsdotdotatsdotdotatsdotdotatsdotchoosenimatstoolchainsatsnimminus1dot6dot12atslibatssystemdotnim_Init000 is a bug."

It seems that two things create this kind of proc name: