modularml / mojo

The Mojo Programming Language
https://docs.modular.com/mojo/manual/

[Feature Request] Add mmap module #1134

Open tairov opened 1 year ago

tairov commented 1 year ago

What is your request?

Implement an mmap module natively in Mojo. Reference: https://man7.org/linux/man-pages/man2/mmap.2.html

What is your motivation for this change?

While implementing several solutions in Mojo, we reached a stage where huge files needed to be loaded into memory (for example, open-source LLM models).

For these purposes, it would be really great to have the mmap module implemented natively in Mojo.

Any other details?

No response

jackos commented 11 months ago

@abduld fyi

Relevant blog post: https://justine.lol/mmap/

tairov commented 11 months ago

> @abduld fyi
>
> Relevant blog post: https://justine.lol/mmap/

Epic story of how they got it working in C++. I didn't know such a simple and obvious thing could be relatively complicated to implement within C++/STL. In C, for example, mmap is trivial (https://github.com/karpathy/llama2.c/pull/50/files). In Mojo it should be straightforward, I hope, since we have full control over pointers. Currently I'm doing a "pseudo-mmap": the weights are loaded into memory once (with file I/O), and then I carefully set up pointers into that buffer.
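
For comparison with the "pseudo-mmap" approach described above, here is a minimal sketch of a real read-only mapping, written in Python because the Mojo FFI used later in this thread has changed between releases. The helper names `write_demo_weights` and `sum_mapped_floats` are hypothetical, and the tiny float32 file only stands in for a real weights file. It assumes the file holds float32 values in the machine's native byte order.

```python
import mmap
import os
import struct
import tempfile


def write_demo_weights(path, values):
    """Write float32 values in native byte order (stand-in for a weights file)."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"{len(values)}f", *values))


def sum_mapped_floats(path):
    """Map the file read-only and sum its float32 contents without calling read()."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; pages are faulted in lazily by the OS.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # memoryview gives a zero-copy window; cast() reinterprets bytes as floats.
            with memoryview(mapped) as raw:
                with raw.cast("f") as floats:
                    return sum(floats)


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "weights.bin")
    write_demo_weights(demo_path, [1.0, 2.0, 3.5])
    print(sum_mapped_floats(demo_path))
```

The views are released before the mapping is closed, since closing a mapping that still has exported buffers raises an error in Python.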

ihnorton commented 11 months ago

Calling mmap through a function pointer (can't use external_call yet, AFAICT) seems to crash: https://gist.github.com/ihnorton/a49c8165f86489def9031583f3ed1b20

(macOS arm64, mojo 0.5)

JoeLoser commented 11 months ago

Thanks for the links and discussion here, everyone. I should have a chance to look into this at the end of this week.

tairov commented 11 months ago

> Calling mmap through a function pointer (can't use external_call yet, AFAICT) seems to crash: https://gist.github.com/ihnorton/a49c8165f86489def9031583f3ed1b20

What a pity. I was betting high on this.

deepankarsharma commented 11 months ago

I was able to get mmap to work based on @ihnorton's excellent post above. In fact, I don't fully understand why @ihnorton is seeing a crash and I am not. [I am running mojo 0.5.0 (6e50a738) on Ubuntu amd64.]

```
from sys import ffi
from memory import unsafe

struct MapOpt:
    alias MAP_SHARED = 0x01
    alias MAP_PRIVATE = 0x02

struct Prot:
    alias PROT_NONE = 0x0
    alias PROT_READ = 0x1
    alias PROT_WRITE = 0x2
    alias PROT_EXEC = 0x4

alias c_void = UInt8

alias mmap_type = fn(addr: Pointer[c_void],
    len: Int64,
    prot: Int32,
    flags: Int32,
    fildes: Int32,
    offset: Int64) -> Pointer[c_void]

def main():
    let handle = ffi.DLHandle("")
    let c_mmap = handle.get_function[mmap_type]("mmap")
    let fnm = StringRef("data")
    let fd = external_call["open", Int, Pointer[Int8], Int](fnm.data._as_scalar_pointer(), 0x0)
    if (fd == -1):
        raise "Failed to open file"
    let NULL = unsafe.bitcast[c_void](0x0)
    let p = c_mmap(
        NULL, 16, Prot.PROT_READ, MapOpt.MAP_SHARED, fd, 0
    )
    for i in range(26):
        print(p[i])
```

ihnorton commented 11 months ago

Here's a version that works on mac; in order to get the function pointer, it is necessary to use DLHandle on a dylib which is linked into the process.

I don't see a way to write the equivalent of ctypes.CDLL(None) right now on macOS, whereas DLHandle("") does that on Linux (matching dlopen semantics).

```
from sys import ffi
from memory import unsafe

struct MapOpt:
    alias MAP_SHARED = 0x01
    alias MAP_PRIVATE = 0x02

struct Prot:
    alias PROT_NONE = 0x0
    alias PROT_READ = 0x1
    alias PROT_WRITE = 0x2
    alias PROT_EXEC = 0x4

alias c_void = UInt8

alias mmap_type = fn(addr: Pointer[c_void],
    len: Int64,
    prot: Int32,
    flags: Int32,
    fildes: Int32,
    offset: Int64) -> Pointer[c_void]

def main():
    let handle: ffi.DLHandle
    if ffi.os_is_linux():
        handle = ffi.DLHandle("")
    #elif ffi.os_is_windows():
    #    bug: if this section is un-commented, then `h`
    #    is considered uninitialized below
    #    raise "Not yet supported on Windows"
    else:
        # we just need _a_ dylib in the image
        handle = ffi.DLHandle("libate.dylib")

    let c_mmap = handle.get_function[mmap_type]("mmap")
    let fnm = StringRef("data")
    let fd = external_call["open", Int, Pointer[Int8], Int](fnm.data._as_scalar_pointer(), 0x0)
    if (fd == -1):
        raise "Failed to open file"
    let NULL = unsafe.bitcast[c_void](0x0)
    let p = c_mmap(
        NULL, 16, Prot.PROT_READ, MapOpt.MAP_SHARED, fd, 0
    )
    for i in range(26):
        let v = p[i].cast[DType.int64]()
        #print(chr(Int64(v)))
        print_no_newline(chr(v.to_int()))
```

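For reference, the `ctypes.CDLL(None)` pattern mentioned above can be sketched in Python as follows. It resolves `mmap` from the symbols already loaded into the process, which is what `DLHandle("")` does on Linux. The helper name `mmap_read` is hypothetical, the sketch is Unix-only, and it assumes a 64-bit `off_t`, which will not hold on every platform.

```python
import ctypes
import mmap
import os
import tempfile

# (void *)-1 as ctypes reports it for a c_void_p return value.
MAP_FAILED = ctypes.c_void_p(-1).value


def mmap_read(path, length):
    """Map the first `length` bytes of `path` via libc's mmap and return a copy."""
    # CDLL(None) behaves like dlopen(NULL): it searches the running process image.
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mmap.restype = ctypes.c_void_p
    libc.mmap.argtypes = [
        ctypes.c_void_p,  # addr
        ctypes.c_size_t,  # len
        ctypes.c_int,     # prot
        ctypes.c_int,     # flags
        ctypes.c_int,     # fildes
        ctypes.c_int64,   # offset (assumption: off_t is 64 bits wide)
    ]
    libc.munmap.restype = ctypes.c_int
    libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]

    fd = os.open(path, os.O_RDONLY)
    try:
        addr = libc.mmap(None, length, mmap.PROT_READ, mmap.MAP_SHARED, fd, 0)
        if addr == MAP_FAILED:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
        try:
            # Copy the bytes out so nothing refers to the mapping after munmap.
            return ctypes.string_at(addr, length)
        finally:
            libc.munmap(addr, length)
    finally:
        os.close(fd)


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "data")
    with open(demo_path, "wb") as f:
        f.write(b"abcdefghijklmnopqrstuvwxyz")
    print(mmap_read(demo_path, 26))
```

Unlike the snippets in this thread, the sketch checks the return value against `MAP_FAILED` and unmaps the region when done.
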
JoeLoser commented 4 months ago

I'd welcome a new mmap module. We should take a look at https://docs.rs/memmap/latest/memmap/ and come up with a proposal for getting started on that. Is anyone interested in driving this?
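
As a starting point for such a proposal, the rough shape of a small read-only mapping type could look like the sketch below. It is written in Python over the standard `mmap` module purely for illustration. `MappedFile` and its methods are hypothetical names, not an existing Mojo API and not the interface of the linked Rust crate.

```python
import mmap
import os
import tempfile


class MappedFile:
    """Hypothetical read-only file mapping with explicit lifetime management."""

    def __init__(self, path):
        self._file = open(path, "rb")
        try:
            # Length 0 maps the entire file; an empty file raises ValueError.
            self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        except Exception:
            self._file.close()
            raise

    def __len__(self):
        return len(self._map)

    def __getitem__(self, index):
        # An integer index yields one byte as an int; a slice yields a bytes copy.
        return self._map[index]

    def close(self):
        self._map.close()
        self._file.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.bin")
    with open(demo_path, "wb") as f:
        f.write(b"hello mmap")
    with MappedFile(demo_path) as mapped:
        print(len(mapped), mapped[6:10])
```

The points a real proposal would need to settle are the ones this sketch glosses over: writable and private mappings, offsets and partial mappings, and how the mapped bytes are exposed without copying.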

KCaverly commented 3 months ago

Hey @JoeLoser - I got a working implementation up for mmap in Mojo.

I wrote up some of my thoughts and provided a working example in the proposal here: #3218