Interface to LLVM::DataLayout:: methods

jfaure commented 3 years ago

It is realistically necessary to be able to use getPointerSize and getTypeAllocSize (at least) from https://llvm.org/doxygen/classllvm_1_1DataLayout.html

The hack of computing sizeof as a gep to the end of a null pointer has the disadvantage that the result cannot influence code generation.

getTypeAllocSize in particular requires us to pre-convert llvm-hs types to C++ so this may be tricky.

andrew-wja commented 3 years ago

This is actually very straightforward to do, because LLVM exposes this through the C API: https://llvm.org/doxygen/group__LLVMCTarget.html

However, I don't quite understand exactly what you mean by "not being able to influence code generation". Do you have an example?

jfaure commented 3 years ago

Allocating a tagged union: you want to allocate enough space for the biggest member. This cannot reliably be achieved by inspecting the llvm-hs types, which don't know the pointer size and alignment details.

andrew-wja commented 3 years ago

Ah, I see. I was only thinking about datatypes that LLVM provides, but a tagged union is indeed tricky. You would need to iterate over the union types and statically determine which is largest. Using gep only works if you can define the type using the builtin types already provided by LLVM!

Sounds like the use case is maximumBy (comparing getTypeAllocSize), which we can add to the tests to make sure that that functionality continues to work. I'll see if I can add getPointerSize and getTypeAllocSize to the FFI!
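To make the maximumBy (comparing getTypeAllocSize) idea concrete, here is a minimal sketch in plain Haskell. It does not use llvm-hs: Ty, Layout, alignOf, allocSize and largestMember are hypothetical names for a toy model that assumes natural alignment and only tracks the pointer width. The real numbers should come from LLVM's DataLayout.

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)
import Data.Word (Word64)

-- Toy stand-in for an IR type (hypothetical, NOT LLVM.AST.Type).
data Ty
  = IntTy Word64   -- integer that is the given number of bytes wide
  | PtrTy          -- pointer; its width depends on the target
  | StructTy [Ty]  -- non-packed struct
  deriving (Show, Eq)

-- Hypothetical miniature data layout: only the pointer width in bytes.
newtype Layout = Layout { pointerBytes :: Word64 }

-- Assumption: every scalar is naturally aligned (alignment == size >= 1).
alignOf :: Layout -> Ty -> Word64
alignOf _  (IntTy n)     = n
alignOf dl PtrTy         = pointerBytes dl
alignOf dl (StructTy fs) = maximum (1 : map (alignOf dl) fs)

-- Round n up to the next multiple of a.
roundUp :: Word64 -> Word64 -> Word64
roundUp a n = ((n + a - 1) `div` a) * a

-- Size including tail padding, in the spirit of getTypeAllocSize.
allocSize :: Layout -> Ty -> Word64
allocSize _  (IntTy n)       = n
allocSize dl PtrTy           = pointerBytes dl
allocSize dl t@(StructTy fs) = roundUp (alignOf dl t) (foldl place 0 fs)
  where
    -- Pad to the field's alignment, then append the field.
    place off f = roundUp (alignOf dl f) off + allocSize dl f

-- Pick the union member that needs the most storage.
-- Partial: fails on an empty member list.
largestMember :: Layout -> [Ty] -> Ty
largestMember dl = maximumBy (comparing (allocSize dl))
```

With members StructTy [IntTy 4, IntTy 4, IntTy 4] and StructTy [PtrTy, PtrTy], the first is largest under Layout 4 and the second under Layout 8, which is why the choice cannot be made from the AST types alone.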

jfaure commented 3 years ago

A couple other use-cases:

Checking if a struct field containing a pointer is large enough for some other operand.
Checking if a previous alloca or struct field is reusable (to avoid making LLVM figure it out with llvm.lifetime.(start|end) intrinsics).
Checking if a struct is small enough to be returned by value (the System V ABI allows using two 64-bit registers for this, and sadly it is up to the front end to figure this out before LLVM), or if it needs to be written to an sret pointer.
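Once sizes and alignments are available, the checks listed above reduce to small predicates. The sketch below is plain Haskell and assumes the numbers were already obtained from a DataLayout query. Extent, fitsIn, ReturnStyle and returnStyle are hypothetical names, and the 16-byte threshold only encodes the "two 64-bit registers" rule mentioned above, not the full System V classification.

```haskell
import Data.Word (Word64)

-- Size and alignment in bytes, as a DataLayout query would report them.
-- Assumption: alignBytes is always at least 1.
data Extent = Extent { sizeBytes :: Word64, alignBytes :: Word64 }
  deriving (Show, Eq)

-- Can a value that needs `need` be stored in existing storage `slot`
-- (a struct field or an earlier alloca)?
fitsIn :: Extent -> Extent -> Bool
fitsIn slot need =
  sizeBytes need <= sizeBytes slot
    && alignBytes slot `mod` alignBytes need == 0

data ReturnStyle = InRegisters | ViaSRet
  deriving (Show, Eq)

-- Size-only approximation: up to two 64-bit registers (16 bytes)
-- can be returned by value, anything larger goes through sret.
returnStyle :: Word64 -> ReturnStyle
returnStyle size
  | size <= 16 = InRegisters
  | otherwise  = ViaSRet
```

For example, fitsIn (Extent 8 8) (Extent 4 4) holds, while fitsIn (Extent 8 4) (Extent 8 8) does not, because the slot is not aligned strictly enough.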

luc-tielen commented 3 years ago

I also just ran into the issue of needing DataLayout functionality. Here's the C code I'm trying to port:

#define DESIRED_NUM_KEYS \
    (((BLOCK_SIZE > sizeof(struct node_data)) \
        ? BLOCK_SIZE - sizeof(struct node_data) \
        : 0) / sizeof(value))

#define NUM_KEYS (DESIRED_NUM_KEYS > 3 ? DESIRED_NUM_KEYS : 3)

typedef struct node
{
    node_type type;
    struct node_data meta;
    value values[NUM_KEYS];
} node;

Basically: I need "sizeof" so that I can use its result during codegen to determine the length of an array in some other type.
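For reference, the arithmetic of the two macros can be written as a pure Haskell function once the sizes are known as plain numbers. This is only a sketch: numKeys and its parameter names are hypothetical, and the three arguments stand in for BLOCK_SIZE, sizeof(struct node_data) and sizeof(value), which would have to come from the data layout.

```haskell
import Data.Word (Word64)

-- Mirrors DESIRED_NUM_KEYS and NUM_KEYS from the C code.
-- Assumption: valueSize is non-zero (otherwise `div` fails).
numKeys :: Word64 -> Word64 -> Word64 -> Word64
numKeys blockSize nodeDataSize valueSize = max 3 desired
  where
    -- Guard the subtraction so the unsigned value cannot wrap around.
    available
      | blockSize > nodeDataSize = blockSize - nodeDataSize
      | otherwise                = 0
    desired = available `div` valueSize
```

With illustrative numbers, numKeys 256 16 4 gives 60 and numKeys 8 16 4 falls back to 3. The result is a Word64, which is the kind of value an array length needs.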

jfaure commented 3 years ago

I added this as a sort of hack some time back https://hackage.haskell.org/package/llvm-hs-pure-9.0.0/docs/LLVM-AST-Constant.html#v:sizeof

luc-tielen commented 3 years ago

@jfaure I don't think that works? ArrayType requires a Word64 for size: https://hackage.haskell.org/package/llvm-hs-pure-9.0.0/docs/LLVM-AST-Type.html#t:Type

jfaure commented 3 years ago

That's no problem: it wraps the type in some LLVM instructions. The size won't be available to you the way it is with DataLayout, but you can use it in the emitted LLVM, where it will hopefully be constant-folded.

luc-tielen commented 3 years ago

@jfaure How then? It just doesn't typecheck..? Also there's no function to go from Constant to Word64.. and the other functions in that module are partial and would error out if I tried converting that way. I would prefer defining my types all using the typedef function which uses the LLVM.AST.Type I mentioned earlier. For this I think the only way is with DataLayout..

BTW: here's what I tried:

experiment :: ModuleBuilder ()
experiment = do
  s <- typedef "struct_t" $ Just $ StructureType False [i8, i64]
  let x = Constant.sizeof s
  let a = ArrayType x i32 -- Couldn't match expected type 'Word64' with 'Constant'
  -- ...

@andrew-wja I'm not familiar with the codebase but if you give me some high level pointers on how to best approach this, I can try giving it a shot..

andrew-wja commented 3 years ago

@luc-tielen I understand what you want to do, but I don't think it's possible with llvm-hs right now, so you're correct to post under this issue.

LLVM wants you to pass an integer to the ArrayType constructor, even in C++: https://llvm.org/doxygen/classllvm_1_1ArrayType.html#adf411edc4f135b570ab218079474ce77

So you really do need to ask libLLVM through an IO operation what the size of the laid-out struct type is.

Right now, it isn't possible using llvm-hs to construct any type that depends on IR-level values. It might be possible to work around this in your code generation. For example, you can use alloca to allocate an array with an IR-level Operand element count. In this case that's not very appealing, though.

luc-tielen commented 3 years ago

This got me a little further for my specific case (it looks like some datalayout functionality is exposed in internals?):

experiment :: ModuleBuilderT IO ()
experiment = do
  s <- typedef "struct_t" $ Just $ StructureType False [i8, i64]
  size <- liftIO $ do
    s' <- Context.withContext $ flip runEncodeAST $ encodeM s
    let dl = defaultDataLayout LittleEndian
    DL.withFFIDataLayout dl $ flip DL.getTypeAllocSize s'
  liftIO $ print ("size =", size)

This snippet works if you use i8 or any of the other builtin types instead of s in the encodeM function, but with my custom struct I get EncodeException "reference to undefined type: Name \"struct_t\""

If I could get an up-to-date DataLayout inside the ModuleBuilder monad (like for example a currentDatalayout helper function), my problem would be fixed?

andrew-wja commented 2 years ago

@luc-tielen mixing and matching between the high-level llvm-hs-pure and low-level llvm-hs FFI interface directly in this way is uncharted territory, but it makes sense that builtin types should always be visible.

I think what is happening is that the explicit runEncodeAST is blowing away the local encode state, so the type definition is no longer visible. If you look at what happens in the EncodeM instance for Type, specifically for NamedTypeReference, we end up calling lookupNamedType. However, the definition of runEncodeAST creates a new, empty encode state.

runEncodeAST is designed to be the top-level entry point to the encoding, but your code snippet is calling it inside a module builder context. I think if you add a runEncodeAST' which takes an existing encode state as a parameter and extends it, rather than running the AST encoding in a fresh encode state, that should solve your problem.

luc-tielen commented 2 years ago

@andrew-wja I tried my hand at it today, but a fix is non-obvious (at least to me). The IR / Module builder monad keeps the definitions hidden internally. You can extract them if you make a variant of runEncodeAST that basically runs in ModuleBuilderT IO a, but then I tried reusing some other functionality and got stuck with a cycle in my imports.

luc-tielen commented 2 years ago

Did another attempt today:

experiment :: ModuleBuilderT IO ()
experiment = do
  let n = "struct_t"
      ty = StructureType False [i8, i64]
  s <- typedef n $ Just ty
  size <- liftIO $ do
    withHostTargetMachine PIC JITDefault None $ \tm -> do
      dl <- getTargetMachineDataLayout tm

      Context.withContext $ flip runEncodeAST $ do
        createType n ty
        s' <- encodeM s
        liftIO $ DL.withFFIDataLayout dl $ flip DL.getTypeAllocSize s'
  liftIO $ print ("size = ", size)

createType :: Name -> Type -> EncodeAST ()
createType n ty = do
  (t', n') <- createNamedType n
  defineType n n' t'
  setNamedType t' ty

This prints out size = 16 for me. Not obvious at all, but it works. Now I need to refactor it and figure out a way to nicely integrate it in my compiler :sweat_smile:.

Interface to LLVM::DataLayout:: methods #360