libsdl-org / SDL

Simple Directmedia Layer
https://libsdl.org
zlib License

Proposal: SDL_MemoryMapFile #10940

Open kieselsteini opened 3 days ago

kieselsteini commented 3 days ago

Hello there,

I have a small proposal for SDL3. The well-known SDL_LoadFile function is very handy, and it would be just as handy to have a similar function that memory-maps a file. On most Unix systems this can be done with the mmap syscall; on Windows there is CreateFileMapping (used together with MapViewOfFile).

The signature of the function could look like:

const Uint8* SDL_MemoryMapFile(const char *filename, size_t *datasize)

Of course, this would map the requested file read-only.
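For illustration, here is a minimal POSIX sketch of what such a function might do internally, assuming a Unix-like system with `open`, `fstat` and `mmap`. The names `map_file_readonly` and `unmap_file` are hypothetical helpers, not SDL API. Note that `munmap()` needs the original length, so a real API would presumably also need a matching unmap call.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only.
   Returns NULL on failure; on success stores the file size in *datasize. */
static const uint8_t *map_file_readonly(const char *filename, size_t *datasize)
{
    int fd = open(filename, O_RDONLY);
    if (fd < 0) {
        return NULL;
    }

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        /* mmap() rejects a zero length, so empty files are treated as errors here. */
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    /* The mapping holds its own reference to the file, so the descriptor can be closed. */
    close(fd);
    if (p == MAP_FAILED) {
        return NULL;
    }

    *datasize = (size_t)st.st_size;
    return (const uint8_t *)p;
}

/* Hypothetical counterpart: munmap() must be given the same length that was mapped. */
static void unmap_file(const uint8_t *data, size_t datasize)
{
    munmap((void *)data, datasize);
}
```

The descriptor is closed right after mapping because the mapping stays valid on its own; only the pointer and size need to be kept.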

I can try to implement a first version of this feature, but I have no Windows machine to test it on.

slouken commented 3 days ago

You'd want an offset parameter as well, I believe, but this seems super useful!
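One wrinkle with an offset parameter: on POSIX systems `mmap()` requires the file offset to be a multiple of the page size (as reported by `sysconf(_SC_PAGESIZE)`), and on Windows `MapViewOfFile` has the analogous restriction with the system allocation granularity. A sketch of how an implementation might hide that on POSIX, assuming a hypothetical `mapped_range` handle and a hypothetical convention that `length == 0` means "to end of file":

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical handle: remembers the page-aligned base so it can be unmapped later. */
typedef struct {
    void *base;           /* address returned by mmap() */
    size_t maplen;        /* length passed to mmap() */
    const uint8_t *data;  /* points at the requested offset */
    size_t datasize;      /* bytes available from the requested offset */
} mapped_range;

/* Map `length` bytes starting at `offset`; length 0 means "to end of file".
   Returns 0 on success, -1 on failure. */
static int map_range(const char *filename, off_t offset, size_t length, mapped_range *out)
{
    int fd = open(filename, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || offset < 0 || offset >= st.st_size) {
        close(fd);
        return -1;
    }
    size_t avail = (size_t)(st.st_size - offset);
    if (length == 0 || length > avail) length = avail;

    /* mmap() needs a page-aligned file offset: round down, then skip the
       extra leading bytes when handing the pointer back. */
    off_t page = (off_t)sysconf(_SC_PAGESIZE);
    off_t aligned = offset - (offset % page);
    size_t delta = (size_t)(offset - aligned);

    void *p = mmap(NULL, length + delta, PROT_READ, MAP_PRIVATE, fd, aligned);
    close(fd);
    if (p == MAP_FAILED) return -1;

    out->base = p;
    out->maplen = length + delta;
    out->data = (const uint8_t *)p + delta;
    out->datasize = length;
    return 0;
}

static void unmap_range(mapped_range *r)
{
    munmap(r->base, r->maplen);
}
```

Returning a handle rather than a bare pointer is one way to keep the aligned base and mapped length around for the unmap call.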

icculus commented 3 days ago

Is it actually beneficial to mmap a file?

(Discounting mmap as a device interface on Unix and for loading shared libraries and stuff... I just mean regular file I/O.)

nfries88 commented 2 days ago

mmap is quite useful if you have one (or very few) frequently accessed files that fit easily into the virtual address space but are large enough that forcing the whole thing into physical memory at once (e.g. by simply reading the entire file) is undesirable.

Its use for general-purpose I/O is generally discouraged, but for the right access pattern it can be much more efficient than using I/O syscalls. Overuse can tank system performance through TLB pollution, and when an access actually causes a page fault, it performs worse than the equivalent I/O would have because of the overhead of maintaining page mappings. (For example, mmap will tend to outperform read() for small sequential accesses over a large file, but not for large or random accesses.)

kieselsteini commented 2 days ago

As @nfries88 explained, this can be very helpful when you have a few "large" asset archives that you can simply map into memory and then access directly. The file is not loaded into memory all at once; only the portions you access are, and the OS will also discard those pages automatically when it needs the memory.

A good example is the Chocolate Doom project. It uses mmap/CreateFileMapping on platforms that support them to "open" the DOOM.WAD archive (see w_file_posix.c; there is also one for Win32), even though the 14MB file fits easily in RAM on today's machines. That project is actually what inspired me to use a similar pattern and then to propose it here.
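To show the "access the data directly" part, here is a sketch that looks up a lump in a WAD image already in memory (for example, one mapped read-only) and returns a pointer into it without copying. It assumes the classic WAD layout: a 12-byte header (`IWAD`/`PWAD`, lump count, directory offset, all little-endian) followed by 16-byte directory entries (offset, size, 8-byte name). `wad_find_lump` is a hypothetical helper, not code from Chocolate Doom.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value without assuming alignment or host byte order. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: find a lump by name in an in-memory WAD image.
   Returns a pointer into the image (no copy) and its size, or NULL. */
static const uint8_t *wad_find_lump(const uint8_t *wad, size_t wadsize,
                                    const char *name, size_t *lumpsize)
{
    if (wadsize < 12) return NULL;
    if (memcmp(wad, "IWAD", 4) != 0 && memcmp(wad, "PWAD", 4) != 0) return NULL;

    uint32_t numlumps = read_le32(wad + 4);
    uint32_t dirofs = read_le32(wad + 8);
    /* Make sure the whole directory lies inside the image. */
    if (dirofs > wadsize || numlumps > (wadsize - dirofs) / 16) return NULL;

    for (uint32_t i = 0; i < numlumps; i++) {
        const uint8_t *entry = wad + dirofs + (size_t)i * 16;
        uint32_t filepos = read_le32(entry);
        uint32_t size = read_le32(entry + 4);
        /* Names are up to 8 bytes and NUL-padded, so strncmp stops at the padding. */
        if (strncmp((const char *)entry + 8, name, 8) == 0) {
            if (filepos > wadsize || size > wadsize - filepos) return NULL;
            *lumpsize = size;
            /* If `wad` is a file mapping, these pages are only read from disk when touched. */
            return wad + filepos;
        }
    }
    return NULL;
}
```

Every offset read from the file is bounds-checked against the image size, since a mapped archive is untrusted input just like a loaded one.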

madebr commented 2 days ago

I remember this llama.cpp change from last year. The commit documents three advantages.

nfries88 commented 1 day ago

Honestly, having portable support for shared memory objects would be convenient for custom high-performance tracing as well. While it would be less than ideal (because it becomes subject to the filesystem cache and doesn't conform to normal system-specific practices), it would be simple to implement that on top of this, modified to allow write access.
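A rough POSIX sketch of that write-enabled variant, assuming a file-backed `MAP_SHARED` mapping so that stores through the pointer reach the file. `map_file_readwrite` and `unmap_file_readwrite` are hypothetical names, and this is not proposed SDL API:

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create (or truncate) a file of `size` bytes and map it
   read-write with MAP_SHARED. Returns NULL on failure. */
static void *map_file_readwrite(const char *filename, size_t size)
{
    int fd = open(filename, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return NULL;

    /* Size the file first: touching whole pages past end of file raises SIGBUS. */
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    return p == MAP_FAILED ? NULL : p;
}

/* Flush dirty pages to the file synchronously, then release the mapping.
   Returns the msync() result (0 on success). */
static int unmap_file_readwrite(void *p, size_t size)
{
    int rc = msync(p, size, MS_SYNC);
    munmap(p, size);
    return rc;
}
```

For tracing, a writer would typically keep the mapping open and only call `msync()` when durability matters, since the OS writes dirty pages back on its own schedule.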

icculus commented 1 day ago

(I still think this is a no from me, unless @slouken feels strongly otherwise.)

slouken commented 1 day ago

I'm a fan. Does someone want to create a PR for this?

kieselsteini commented 1 day ago

I made a first attempt: PR-10960. It's my first contribution here ;)

maia-s commented 1 day ago

Should this return const volatile Uint8*, since the mapped memory can change if the underlying file changes?
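To make the concern concrete: with a `MAP_SHARED` mapping, a write to the file through an ordinary descriptor shows up in the mapped bytes on systems with a unified page cache, such as Linux. A small sketch under that assumption; `observe_external_change` is a hypothetical demo function:

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical demo: create a 1-byte file containing 'A', map it, overwrite
   the byte with pwrite(), and report what the mapping showed before and after.
   Returns 0 on success, -1 on failure. */
static int observe_external_change(const char *path, char *before, char *after)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return -1;

    int rc = -1;
    if (pwrite(fd, "A", 1, 0) == 1) {
        void *p = mmap(NULL, 1, PROT_READ, MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {
            const char *m = p;
            *before = m[0];
            /* Modify the file through the descriptor, not through the mapping. */
            if (pwrite(fd, "B", 1, 0) == 1) {
                *after = m[0];
                rc = 0;
            }
            munmap(p, 1);
        }
    }
    close(fd);
    return rc;
}
```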

nfries88 commented 14 hours ago

> Should this return const volatile Uint8*, since the mapped memory can change if the underlying file changes?

volatile is insufficient to deal with this possibility properly. At best it just makes the introduced inconsistency appear faster.

maia-s commented 7 hours ago

Yeah, it's unsafe either way (if the file shrinks, accessing the removed bytes can crash the process, typically with SIGBUS on POSIX systems), but wouldn't volatile be at least a little safer? I guess it depends on the user's intent. If they expect changes, they want the changes to appear as soon as possible; but if they expect the file to be constant, they don't. But the lack of volatile won't help in that case either.