Closed kevinlawler closed 11 years ago
I will have to double-check Stevens & etc. on this, but I'm pretty sure this can lead to portability problems. It may be free of edge cases on common platforms; whether this is a problem in practice and not just in theory is your call.
We could use mremap when available (Linux only, AFAIK), but the most portable solution is to just copy to a larger chunk of mmap'd pooled memory. We want the same semantics as realloc, but for mmap'd memory.
On OSX (Mountain Lion), overlapping MAP_FIXED mmaps will clobber each other with zeroed out memory pages, as expected: git://gist.github.com/3889656.git
I'm pretty sure this can lead to portability problems
Do you mean the solution suggested on StackOverflow or something else?
I think silently clobbering existing mappings, the expected behavior, is bad for this use case. If some platform(s) give an error instead, that's helpful, but can't be depended upon.
Checked both 32-bit Gentoo and OSX. After removing "MAP_FIXED" parameter, I get correct results even when adding many pages to the end. As mentioned by @silentbicycle above, this may be a problem just in theory, but not in practice.
Removing MAP_FIXED
introduces a new problem which is that the OS may add pages elsewhere besides the end of the object, which is what is expected. That would be bad since objects are assumed to be contiguous in memory.
(The test case to show how this fails is to map two objects contiguously in memory and then attempt to expand the first. Expected: With MAP_FIXED
it clobbers the the second object. Expected: Without MAP_FIXED
the tail of the first object is added to a random location in memory, potentially even before the object.)
Advised: restore MAP_FIXED
, use a portable option from below to check beforehand for already-mapped areas. The result of finding an already-mapped area should be the same as if MAP_FAILED
was the function return in the existing code.
Options:
mincore
see: http://stackoverflow.com/a/14956743/365478 or http://stackoverflow.com/a/8363219/365478msync
see: http://stackoverflow.com/a/8368346/365478posix_mem_offset
see above (currently not portable?)What are the timings for each? The best choice is probably the fastest. StackOverflow commenters suggest mincore
is best since it is lowest-level.
The previous idea to reserve large chunks of memory beforehand seems not as good.
I think the above is the correct solution.
(Speculation: After implementing the above, to be perfectly correct we may need to use MAP_LOCKED
or mlock
on the first page of any file mmapped from disk. This is based on four untested assumptions: 1. anonymous memory never causes mincore/msync to announce that it isn't mapped. 2. mincore/msync, thinking the address is not in use, fails to return an error when looking at an address containing a page, originally mapped to a file on disk, which has not been used in a while after heavy mmap usage elsewhere. 3. operating systems do not wire in the first page of files by default. 4. locked memory does indeed play nicely with mincore/msync. The case, if it exists, would be that we map file X on disk, do a lot of work, potentially at the end of file X, do some more work appending to a separate object, and then that object extends over the beginning of X's mapping because mincore/msync do not alert us, the first page of X having been paged out. It is an odd corner case that we may never see, and no worse than the possible scenarios in the previous solutions.)
All of this is to get around not having a portable mremap.
Is there any benefit in using mremap for Linux (and maybe for Cygwin), and an alternate solution for OSX?
Potentially, but it needs to be balanced against added portability issues.
Scott
On Thu, Mar 28, 2013 at 9:30 AM, Tom Szczesny notifications@github.comwrote:
Is there any benefit in using mremap for Linux (and maybe for Cygwin), and an alternate solution for OSX?
— Reply to this email directly or view it on GitHubhttps://github.com/kevinlawler/kona/issues/162#issuecomment-15588256 .
It's a neutral change.
NB. The full list of systems we claim to support is: OS X/Linux/BSD/Cygwin.
FYI: Looks like Cygwin does not have mremap either.
Suppose we use mincore(). Suppose the original k-structure uses m bytes, and we want to increase the size by an additional n bytes. We want to check that the additional n bytes do not contain an already-mapped area. (In other words, the additional n bytes needs to be completely unmapped.) However, the documenation for mincore seems to indicate that the error ENOMEM will occur if the checked area (the additional n bytes) contains some unmapped memory. Seems to me that we need a flag that indicates either 1) all of the n bytes are unmapped (OK), or 2) some of the n bytes are already mapped (problem).
However, the documenation for mincore seems to indicate that the error ENOMEM will occur if the checked area (the additional n bytes) contains some unmapped memory.
Do you mean "contains some mapped memory"? If so, I think that gives us flag 2, and we are OK. If not, then you'll have to explain more since I don't understand.
I did mean "contains some unmapped memory". The wording in http://man7.org/linux/man-pages/man2/mincore.2.html is somewhat ambiguous: "ENOMEM addr to addr + length contained unmapped memory." Does that mean "contained some unmapped" or "contained all unmapped"? The wording in http://www.gsp.com/cgi-bin/man.cgi?section=2&topic=mincore is much more specific: "The virtual address range specified by the addr and len arguments is not fully mapped." The problem is when the address range is "not fully mapped" does that mean that it is "fully unmapped", which is what we need to know.
OK, I see. You're saying this could give flag 3: some of the n bytes are mincore-unmapped. This is sufficient if we add a single page at a time. For multiple pages the flag is not sufficient to know whether or not we can proceed, unless we check each page.
If flag 3 is indeed the case, we can call mincore
for each page we intend to add, and verify that they all throw errors because they are unmapped. Almost always when we call kexpander
we are adding a page at a time using kap
. The only way to add more than one page is to use kapn
, which was included for completeness's sake. The only method using kapn
is a printing method.
What is the timing like for calls to mincore
? Is it fast enough to not matter or is it slow?
Z K kapn_(K _a,V *v,I n) { if(!a||!n)R 0; K k=_a; I t=k->t,m=k->n,p=m+n; ...
This comment simply documents my understanding as I go through the code: Both kap: extern K kap(Ka,V v){R kapn_(a,&v,1);} and kapn: extern K kapn(K a,V v,I n){R kapn(a,&v,n);} call kapn which expands the k-structure, but not by pages. kap adds 1 element to the k-structure (which may require no additional pages, or one additional page). kapn adds n elements to the k-structure (which may require no, one, or multiple additional pages). (It may be possible that kap requires multiple additional pages if the added single element is large.)
That looks right to me. We can further assume that kap
, if sufficient space does not already exist, requires at most one new page since the single element will always be small. The single element is bounded by the size of the atom types and the size of the object reference. These are all much smaller than the page size. (Mixed lists and strings store pointers.)
(For specifics: I think the amount we can add is in [1, 8] bytes inclusive & tight, and that page sizes are essentially guaranteed to be much larger than that on all systems we currently support or intend to support in the future. Off the top of my head on Linux and OSX the page size is 4096. However, I think the discussion is moot since supporting kapn correctly means supporting arbitrary-count page additions.)
The page size on Cygwin is 65536 = 2^16 (which is probably the page size on Windows). Most of the discussion is moot. The key point (which I forgot): "Mixed lists and strings store pointers".
Just to document the latest wrinkle on this issue: In preparation for using mincore to pre-verify that each page added is not already mapped, I checked to see whether the following lines (placed in kexpander in km.c) print "map_OK" in 32-bit-Linux, OSX and Cygwin:
if(MAP_FAILED!=mmap(a+e,f,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON|MAP_FIXED,-1,0)) {O("map-OK\n");R 1;}//Add pages to end else O("map-failed\n");
I get "map-OK" in OSX and Cygwin. I get "map-failed" in 32-bit-Linux. Removing the "MAP_FIXED" flag results in "map-OK" on 32-bit-Linux.
Since I have not updated my version of 32-bit-Gentoo for over 6 months, I will attempt that and see if the problem disappears.
That's strange. I'm not sure why the next block would already be mapped. Is it bad luck? Or does Linux automatically map the block after for some reason? Maybe a bad interaction with MAP_ANON?
Perhaps good justification on Linux to use mremap
. Does mremap
work correctly with MAP_ANON?
On 32-bit-Linux: 1) When mmap fails with MAP_FIXED, errno is set to ENOMEM. 2) mremap does work correctly when the original mmap was set with MAP_ANON. 3) I agree that we should use mremap for Linux.
Latest wrinkle: mincore is not available in Cygwin.
Hrm, I suppose since Cygwin is based on top of the memory manager for Windows. The Cygwin team seems to have successfully ported mmap
to use MapViewOfFile
. I can think of a variety of reasons why mincore
might not be present. For one, as a system call, it might not make sense on Windows. Is msync
present and will msync
give us the same result? Alternatively, is there a Windows function that does the same thing, and can we call it from Cygwin C? For instance, this function http://msdn.microsoft.com/en-us/library/windows/desktop/aa366902(v=vs.85).aspx looks promising. Other related functions are listed on the same page. The MapViewOfFile
function was a close mmap
clone, so we might get lucky.
msync is present in Cygwin. Looks like we may implement 3 solutions (mincore for OSX, msync for Cygwin, mremap for Linux).
The above commit 3137f177a52e0d33ef6e77633a77634dc8312700 demonstrates that for OSX using mincore with mmap and MAP_FIXED is not a successful replacement for mremap. You can check this yourself by compiling this commit and running /.k_test on OSX. Although the test result is correct, it is achieved by copying the data.
If the extra space is unmapped, we are counting on mincore to fail with a return code of -1, and an errno set to ENOMEM. In fact, mincore does not fail, it has a return code of 0, and therefore errno is not set to ENOMEM.
posix_mem_offset does not appear to be implemented in Linux, Cygwin or OSX. It is implemented in the QNX operating system.
As written, a bug?
https://github.com/kevinlawler/kona/blob/master/km.c
The problem is that mapping MAP_FIXED over an existing address space won't fail, it will just destroy the other mapping if it was close enough. The bug is if another mapping is close enough behind the expanding mapping.
(I am not sure how I missed this the first time around: maybe there is some other consideration that prevents this bug. It seems like the kind of thing I would've been wary of. Maybe I thought MAP_FIXED would fail on such allocations. Does it really always succeed? It would be nice to verify via test case on OS X, Linux.) ((Another explanation is that StackOverflow didn't exist when I did this, or didn't have much on the subject, and so my understanding was based on a flawed interpretation of the POSIX documentation.))
One potential fix:
http://stackoverflow.com/questions/6446101/how-do-i-choose-a-fixed-address-for-mmap#comment7573599_6446215
"An aside: MAP_FIXED can be used safely if you first make a single mapping without MAP_FIXED, then use MAP_FIXED to create various mappings at fixed addresses relative to one another over top of the original throw-away mapping that was at a kernel-assigned location. With a sufficiently large mapping size, this might solve OP's problem."
If this lets us map "anonymous memory" without ever actual allocating real RAM, then we could always reserve something very large for any mapped structure.