Open 5792609d-7136-4bf5-a72c-931da2480f6a opened 13 years ago
bpo-11277 revealed that Mac OS X fsync() does not always and gracefully synchronize file data to physical backing store. To gain real fsync(2) behaviour fcntl(2) must be called with the F_FULLFSYNC flag.
Oh, it remembers a long story around ext3/ext4 and write barrier, with a specific problem in Firefox with SQLite.
http://developer.apple.com/library/mac/#documentation/Darwin/Reference/ManPages/man2/fsync.2.html " For applications that require tighter guarantees about the integrity of their data, Mac OS X provides the F_FULLFSYNC fcntl. The F_FULLFSYNC fcntl asks the drive to flush all buffered data to permanent storage. Applications, such as databases, that require a strict ordering of writes should use F_FULLF- SYNC to ensure that their data is written in the order they expect. Please see fcntl(2) for more detail."
http://shaver.off.net/diary/2008/05/25/fsyncers-and-curveballs/ "fsync on Mac OS X: Since on Mac OS X the fsync command does not make the guarantee that bytes are written, SQLite sends a F_FULLFSYNC request to the kernel to ensures that the bytes are actually written through to the drive platter."
http://lwn.net/Articles/283161/
http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2008-September/001356.html " OTP-7471 On Mac OS X, file:sync/1 now guarantees that all filesystem buffers are written to the disk by using the fcntl() with F_FULLFSYNC option. Previously, file:sync/1 called fsync(), which only guaranteed that the data had been transferred to the disk drive. (Thanks to Jan Lehnardt.)"
http://lists.mindrot.org/pipermail/portawiki-discuss/2005-November/000002.html
Traditionally the functions in os are thin wrappers around OS functions, this patch changes that for os.fsync.
Two nits wrt. the patch itself: 1) The behaviour on OSX should be documented in the stdlib reference, that currently says that os.fsync will call the C function fsync
2) The comment in the __APPLE__ case can be clearer, and should explain the issue (as short as possible).
I'm -1 on merging this patch though, as the fsync function on OSX behaves just like the one on Linux, the fnctl option adds a stronger guarantee in that it forces a flush of the buffers on the storage device as well. The manpage for fsync on a linux host says:
In case the hard disk has write cache enabled, the data may not really be on permanent storage when fsync/fdatasync return.
Adding the F_FULLSYNC option to os.fsync would be fine though (if it isn't already implemented)
The linux
I know that POSIX makes no guarantee regarding durable writes, but IMHO that's definitely wrong, in the sense that when one calls fsync, he expects the data to be committed to disk and be durable. Fixing this deficiency through Python's exposed fsync might thus be a good idea (for example, the Window version probably doesn't call fsync, so it's already not a direct mapping).
when one calls fsync, he expects [...] Fixing this deficiency through Python's exposed fsync [...]
I think so, too. http://pubs.opengroup.org/onlinepubs/009695399/functions/fsync.html even permits "null implementation"s etc. etc. etc. (I like the last paragraph of "Rationale" the most.) os.rst states for fsync():
[...] to ensure that all internal buffers associated with *f*
are written to disk.
If a platform offers the opportunity to actually implement the desired behaviour then i would do so, regardless of what needs to be done internally to achieve it.
(And the single question on apple is simply what to do otherwise with that VMS/VFS bug for at least large sparse files. I can only imagine adding multiple notes in the documentation, here and there.)
See \http://www.kernel.org/doc/man-pages/online/pages/man2/fsync.2.html\ for linux' behavior, in particular: linux doesn't guarantee that data gets writting to the disk when you call fsync, only that the data gets pushed to the storage device.
This is the same behavior as fsync on OSX, and OSX also has a second API that provides stronger guarantees.
With your patch it is no longer possible to call the C function fsync on OSX, even though it is good enough for a lot of use cases. As os.fcntl already supports F_FULLSYNC I see no good reason to change the implementation of os.fsync on OSX.
BTW. The need to use F_FULLSYNC in bpo-11277 might be a bug in OSX itself, have you filed an issue in Apple's tracker?
A different approach for this issue is to document fcntl.fcntl(fd, fcntl.F_FULLFSYNC) in fsync() documentation.
Agreed with Ronald. If Apple thinks fsync() shouldn't flush the hardware cache, there's no reason for us to override that.
in particular: linux doesn't guarantee that data gets writting to the disk when you call fsync, only that the data gets pushed to the storage device.
Barriers are now enable by default in EXT4, and Theodore Tso has been favourable to that for quite some time now: http://lwn.net/Articles/283288/
As for OS-X, this is definitely a bug (I mean, having to call fsync before mmap is a huge bug in itself).
On Wed, Apr 20, 2011 at 12:16:39PM +0000, Ronald Oussoren wrote:
This is the same behavior as fsync on OSX, and OSX also has a second API that provides stronger guarantees. With your patch it is no longer possible to call the C function fsync on OSX, even though it is good enough for a lot of use cases. As os.fcntl already supports F_FULLSYNC I see no good reason to change the implementation of os.fsync on OSX.
I don't see it that way for multiple reasons.
One of it is that Python states that it "is a beginner language", and the sentence i've quoted a part of in my last message is very beginner friendly. Now i don't think you can require that much further knowledge from a beginner at all than what is described in that very sentence.
(Just think of an african kid with a sponsored laptop and possibly 10 minutes internet access per day which tries hard to rise up and puts *galaxies of trust* in what it reads in os.rst, maybe because it is the only documentation he has available!)
Then Python is a high-level language. It's true that there are many shallow wrappers around lower-level functions, but still. It is cross-platform. Just have a look into a, say, file I/O implementation of a C library/program which aims to be portable. If all that #if#endif would be necessary in a high-level language, then why don't write a Makefile and compile that thing? Not that much more work, then - many thanks to the GCC people! You know - i wouldn't talk about subtleties of filesystems here.
And then - at least half a dozen of programmers with altogether maybe many decades of experience, full-time internet access (talking about you :) required two months to get around the Apple bug. A simple "man 2 fsync" would have sufficed for a starter! And how ridiculous - i've spend the free time of four days :-( It's a shame, it's a bug, i wouldn't it let pass through to users.
Apple? No, i'm looking forward to return to a private BSD/Linux X/ahwm/aterm/vim laptop.
On Wed, Apr 20, 2011 at 12:16:39PM +0000, Ronald Oussoren wrote:
This is the same behavior as fsync on OSX, and OSX also has a second API that provides stronger guarantees. With your patch it is no longer possible to call the C function fsync on OSX, even though it is good enough for a lot of use cases. As os.fcntl already supports F_FULLSYNC I see no good reason to change the implementation of os.fsync on OSX.
I don't see it that way for multiple reasons.
One of it is that Python states that it "is a beginner language", and the sentence i've quoted a part of in my last message is very beginner friendly. Now i don't think you can require that much further knowledge from a beginner at all than what is described in that very sentence.
(Just think of an african kid with a sponsored laptop and possibly 10 minutes internet access per day which tries hard to rise up and puts *galaxies of trust* in what it reads in os.rst, maybe because it is the only documentation he has available!)
Then Python is a high-level language. It's true that there are many shallow wrappers around lower-level functions, but still. It is cross-platform. Just have a look into a, say, file I/O implementation of a C library/program which aims to be portable. If all that #if#endif would be necessary in a high-level language, then why don't write a Makefile and compile that thing? Not that much more work, then - many thanks to the GCC people! You know - i wouldn't talk about subtleties of filesystems here.
And then - at least half a dozen of programmers with altogether maybe many decades of experience, full-time internet access (talking about you :) required two months to get around the Apple bug. A simple "man 2 fsync" would have sufficed for a starter! And how ridiculous - i've spend the free time of four days :-( It's a shame, it's a bug, i wouldn't it let pass through to users.
Apple? No, i'm looking forward to return to a private BSD/Linux X/ahwm/aterm/vim laptop.
8-} The real message was:
On Wed, Apr 20, 2011 at 12:16:39PM +0000, Antoine Pitrou wrote:
If Apple thinks [...] there's no reason for us
Apple thinks (KernelProgramming.pdf, BPFileSystem.pdf):
Kernel code must be nearly perfect.
Kernel programming is a black art that should be avoided if at
all possible. Fortunately, kernel programming is usually
unnecessary. You can write most software entirely in user
space.
Sparse files and zero filling. UFS supports sparse files,
which are a way for the file system to store the data in files
without storing unused space allocated for those files. HFS+
does not support sparse files and, in fact, zero-fills all
bytes allocated for a file until end-of-file.
[Steffen thinks HFS+ is driven from "user-space" through IOKit.]
POSIX says:
It would also not be unreasonable to omit testing for fsync(),
allowing it to be treated as a quality-of-implementation
issue.
It seems someone needs a hand. Nice afternoon with an untrashed INBOX.
Ronald Oussoren wrote:
Adding the F_FULLSYNC option to os.fsync would be fine though
To show you that i'm not unteachable i'll attach a patch which does that. This approach can be heavily extended, then, e.g. by using sync_file_range(all_the_flags) on Linux if the "full_fsync" argument is true?! (And then there is sync_file_range2() on PowerPC and nice ARM...)
(The first time that i use PyArg_xy plus plus - please review. test_os is ok and using "os.fsync(xy, True)" makes test_zlib succeed on unpatched hg checkout. os.rst change is really ugly.)
I'm convinced. And i've rewritten the patch. Now fsync(2) is always called first *regardless* of the new optional argument. The documentation is (a little bit) better now. And i've added support for NetBSD, AIX and Linux; though:
for AIX i'm testing (O_SYNC && O_DSYNC) which is almost definitely the wrong test for this, but (a) i don't know when fsync_range() has been introduced (seems to be many years) and (b) i've never really worked on AIX. (I only have the documentation: http://www.filibeto.org/unix/aix/lib/rel/5.3/basetrf1.pdf.)
i've added sync_file_range() on Linux because of SYNC_FILE_RANGE_WAIT_AFTER in the hope that it means something even though the manual page says something else - but Linux and documentation is something by itself. http://lwn.net/Articles/178199/ states
Providing all three flags guarantees that those pages are actually on disk when the call returns.
it seems OpenBSD, FreeBSD, Solaris 11 and HP/UX provide data reliability through fsync(2) alone, see e.g. the notes near EOF of the following Solaris 11 file:
P.S.: dito Linux and NetBSD. I've reused preprocessor tests we've discovered internally over the past years, but i cannot test this here. I could test Linux next Tuesday, though.
11877.2.diff: On Mac OS X, fsync(fd, full_sync=True); fsync(fd) do a full sync twice. You should restore the old file flags at fsync() exit. Or this surprising behaviour should be documented.
Ok, 11877.3.diff uses either-or.
I'm -10 on sync_file_range on Linux:
Charles-Francois Natali wrote:
I'm -10 on sync_file_range on Linux: [...] last time I checked [...]
I just looked at http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=fs/sync.c;h=c38ec163da6ccba00a0146c75606c1b548b31343;hb=HEAD and it seems - as far as i understand what i read - that you're still right; and, furthermore, that fsync() does everything anyway. (But here an idiot is talking about *very* complicated stuff.)
I've also "search"ed for the called filemap_write_and_wait_range() and found the commit message for 2daea67e966dc0c42067ebea015ddac6834cef88 as part of that; very interesting in respect to our issue here.
I will wait before i update the patch though, just in case some experienced NetBSD or AIX user posts some message. For OpenBSD i think i can confirm that fsync(2) alone is enough after taking a (shallow, all shallow) look into kernel/vfs_syscalls.c and ufs/ffs/{ffs_softdep.c,softdep.h}.
and it seems - as far as i understand what i read - that you're still right; and, furthermore, that fsync() does everything anyway. (But here an idiot is talking about *very* complicated stuff.)
I just double-checked, and indeed, fsync does flush the disk cache when barriers are enabled on several FS, while sync_file_range does not. So sync_file_range should definitely not be used on Linux.
I'll attach 11877.4.diff:
I'll attach 11877.4.diff
A couple comments:
static PyObject *
posix_fsync(PyObject *self, PyObject *args, PyObject *kwargs)
{
PyObject *retval = NULL;
auto PyObject *fdobj;
auto int full_fsync = 1;
Why are you using the "auto" storage class specifier? I know that "explicit is better than implicit", but I really can't see a good reason for using it here (and anywhere, see http://c-faq.com/decl/auto.html).
# ifdef __APPLE__
res = fcntl(fd, F_FULLFSYNC);
# endif
# ifdef __NetBSD__
res = fsync_range(fd, FFILESYNC|FDISKSYNC, 0, 0);
# endif
Since __APPLE and __NetBSD are exclusive, you could use something like # if defined(APPLE) res = fcntl(fd, F_FULLFSYNC); # elif defined(NetBSD) res = fsync_range(fd, FFILESYNC|FDISKSYNC, 0, 0); # endif
Do you really need to use goto for such a simple code?
On Sat, 7 May 2011 14:20:51 +0200, Charles-François Natali wrote:
I really can't see a good reason for using it here (and anywhere, see http://c-faq.com/decl/auto.html).
You're right.
Why are you using the "auto" storage class specifier? I know that "explicit is better than implicit"
Yup. I'm doing what is happening for real in (x86) assembler. Thus "auto" means (at a glance!) that this one will need space on the stack. (Conversely i'm not using "register" because who knows if that is really true? ;))
Do you really need to use goto for such a simple code?
Yup. Ok, this is more complicated. The reason is that my funs have exactly one entry point and exactly one place where they're left. This is because we here do manual instrumentalisation as in
ret fun(args) { locals s_NYD_ENTER;
assertions
...
jleave: s_NYD_LEAVE; return;
[maybe only, if large: possibly "predict-false" code blocks]
[possible error jumps here goto jleave;] }
We're prowd of that. N(ot)Y(et)D(ead) is actually pretty cool, i've used debuggers exactly 5 times in the past about ten years! I don't even know exactly *how to use debuggers*. 8---} (NYD can do backtracing, or, with NDEBUG and optional, profiling.) A really good optimizing compiler will produce the very same code! And i love nasm, but it's pretty non-portable. But C is also nice.
But of course i can change this (in C) to simply use return, this is easy here, no resources to be freed.
Thanks for looking at this, by the way. :)
On Sat, 7 May 2011 14:20:51 +0200, Charles-François Natali wrote:
ifdef __APPLE__
res = fcntl(fd, F_FULLFSYNC);
endif
ifdef __NetBSD__
res = fsync_range(fd, FFILESYNC|FDISKSYNC, 0, 0);
endif
Since __APPLE and __NetBSD are exclusive, you could use something like
if defined(APPLE)
res = fcntl(fd, F_FULLFSYNC);
elif defined(NetBSD)
res = fsync_range(fd, FFILESYNC|FDISKSYNC, 0, 0);
endif
Yes, you're right, i'll update the patch accordingly.
-- Steffen, sdaoden(*)(gmail.com)
11877.5.diff incorporates all changes suggested by Charles-François except for the 'auto' keyword, which is extremely important and which would need to be invented if it would not yet exist.
I'm dropping the old stuff. And i think this is the final version of the patch. I've changed the default argument to 'True' as i really think it's better to be on the safe side here. (The french are better off doing some gracy and dangerous sports to discover edges of life!) I'm still of the opinion that this should be completely hidden, but since it's completely transparent wether a Python function gets yet another named argument or not...
So, thanks to Ronald, i detected that also NetBSD introduced a FDISKSYNC flag in 2005 and that you really need fsync_range() there (at least by definition)! How could they do that? But i'm also happy to see that all other systems try hard to achieve security transparently and by default and unless i missed something once again.
Steffen, I don't understand your comment about "auto". Declaring variables as "auto" is not necessary in C code and not used anywhere else in Python's source code.
Ronald Oussoren wrote (2011-05-08 10:33+0200):
Steffen, I don't understand your comment about "auto". Declaring variables as "auto" is not necessary in C code and not used anywhere else in Python's source code.
Well, as long as i can keep my underwear all is fine. (I also looked at Google translate because i first wanted to start the reply with "croak .. pip .. twist .. wrench .. groan .. ugh".)
Steffen, you changed the default to doing a "full sync" in your last patch: while I was favoring that initially, I now agree with Ronald and Antoine, and think that we shouldn't change the default behaviour. The reason being that Apple and NetBSD folks should know what they're doing, and that there might be some subtle side effects (see for example my comment on sync_file_range on Linux). Also, given how many bugs you seem to encouter on OS-X, it's probably better to stick to a syscall known to be safe instead of silently replacing it by a maybe-not-so-tested syscall. Finally, depending on the workload, it could have a significant performance impact. People requiring write durability will probably manage to find this full_sync parameter.
I don't agree with you and i don't believe it is implemented like that. But it seems i am the only one on this issue who sees it like that. Thus i apply 11877.6.diff.
Declaring variables as "auto" is not necessary in C code and not used anywhere else in Python's source code
Changed.
Steffen, you changed the default to doing a "full sync" in your last patch
Changed all through.
The reason being that Apple and NetBSD folks should know what they're doing [.] People requiring write durability will probably manage to find this full_sync parameter.
This sounds logical. I've changed the doc in os.rst so that it includes a note on *which* operating systems actually do something dependend on that argument. I should have done that from the very beginning.
Finally, depending on the workload, it could have a significant performance impact.
:-)
Ouch, ouch, ouch!! I'll have to send 11877.7.diff which extends 11877.6.diff. This is necessary because using fcntl(2) with F_FULLFSYNC may fail with ENOTTY (inapprobiate ioctl for device) in situations where a normal fsync(2) succeeds (e.g. STDOUT_FILENO). By the way - i have no idea of Redmoondian Horror at all (except for http://msdn.microsoft.com/en-us/sync/bb887623.aspx).
Dropping .5 and .6 - and sorry for the noise. Good night, Europe.
Just adding more notes on that by reactivating one of haypo's links from bpo-8604. (And: maybe some Linux documentation should be updated?) From Theodore Ts'o, http://www.linuxfoundation.org/news-media/blogs/browse/2009/03/don’t-fear-fsync:
As the Eat My Data presentation points out very clearly, the only
safe way according that POSIX allows for requesting data written
to a particular file descriptor be safely stored on stable storage
is via the fsync() call. Linux’s close(2) man page makes this
point very clearly:
A successful close does not guarantee that the data has been
successfully saved to disk, as the kernel defers writes. It is not
common for a file system to flush the buffers when the stream is
closed. If you need to be sure that the data is physically stored
use fsync(2).
Why don’t application programmers follow these sage words? These
three reasons are most often given as excuses:
- (Perceived) performance problems with fsync()
- The application only needs atomicity, but not durability
- The fsync() causing the hard drive to spin up unnecessarily in
laptop_mode
(Don't ask me why i'm adding this note though. I should have searched for it once i've opened that issue? Bah!!! Ts'o did not write that article for me. He'd better hacked.)
Calling fsync on a file descriptor referring to a tty doesn't make much sense.
On Linux, this fails with EINVAL:
$ python -c 'import os; os.fsync(1)'
Traceback (most recent call last):
File "<string>", line 1, in <module>
OSError: [Errno 22] Invalid argument
So if the full sync fails on ttys, it shouldn't be a problem: since the default performs a classical fsync, this won't break compatibility with existing code anyway. So I think you should stick with the previous version (well, if the full sync fails on other FDs, then it's another story, but in that case it should just be dropped altogether if it's not reliable...).
By the way, it's "appropriate", not "approbiate". You made the same typo in your patch.
[.]
OSError: [Errno 22] Invalid argument
Sorry, i didn't know that. Mac OS X (2.5 and 2.6 Apple shipped):
21:43 \~/tmp $ python2.5 -c 'import os; os.fsync(1)'; echo $? 0 21:43 \~/tmp $ python2.6 -c 'import os; os.fsync(1)'; echo $? 0 21:43 \~/tmp $ python2.7 -c 'import os; os.fsync(1)'; echo $? 0 21:43 \~/tmp $ python3 -c 'import os; os.fsync(1)'; echo $? 0
So I think you should stick with the previous version (well, if the full sync fails on other FDs, then it's another story, but in that case it should just be dropped altogether if it's not reliable...).
Strong stuff. *This* is the version which should have been implemented from the beginning, but Apple states
F_FULLFSYNC Does the same thing as fsync(2) then asks the drive to flush all buffered data to the permanent storage device (arg is ignored). This is currently implemented on HFS, MS-DOS (FAT), and Universal Disk Format (UDF) file systems.
and i thought
But who knows if that fcntl will fail on some non-noted fsys? I think it's better to be on the safe side. Quoting you, Charles-François
People requiring write durability will probably manage to find this full_sync parameter and if they do they thus really strive for data integrity, so call fsync() as a fallback for the security which Apple provides. Also: we cannot let os.fsync() fail with ENOTTY!?
By the way, it's "appropriate", not "approbiate". You made the same typo in your patch.
8\~I That was not a typo. Thanks. I'll changed that.
and if they do they thus really strive for data integrity, so call fsync() as a fallback for the security which Apple provides.
Why? If I ask a full sync and it fails, I'd rather have an error returned so that I can take the appropriate decision (abort, roll-back, try a standard fsync) rather than have Python silently replace it by an fsync.
Also: we cannot let os.fsync() fail with ENOTTY!?
Why not, since that's what the kernel returns? Once again, since the default behaviour doesn't change, this won't break any existing application.
Finally, depending on the workload, it could have a significant performance impact.
Oh yes (first replaces os.fsync() with an in-python wrapper):
18:12 \~/tmp $ ll mail ls: mail: No such file or directory 18:12 \~/tmp $ ll X-MAIL 312 -rw-r----- 1 steffen staff 315963 15 May 17:49 X-MAIL 18:12 \~/tmp $ time s-postman.py --folder=~/tmp/mail --dispatch=X-MAIL Dispatched 37 tickets to 4 targets.
real 0m4.638s user 0m0.974s sys 0m0.160s 18:13 \~/tmp $ rm -rf mail 18:13 \~/tmp $ time s-postman.py --folder=~/tmp/mail --dispatch=X-MAIL Dispatched 37 tickets to 4 targets.
real 0m1.228s user 0m0.976s sys 0m0.122s
(I'm using the first one.)
(I'm not sure Rietveld sent the message so I post it here, sorry in case of duplicate).
Steffen, I've made a quick review of your patch, in case you're interested. I think that this functionality can be really useful to some people, and it'd be nice if your patch could stabilize somewhat so that committers can review it properly and eventually merge it.
Concerning your benchmark: I don't know exactly what you're measuring, but when performing I/O-related benchmarks, it's always a good idea to run each test several times in a row, or flush the page/buffer cache before each run: the reason is that the the second run benefits from the page/buffer cache, which often speeds things up dramatically. Example:
# time find /lib -type f -exec cat {} \; > /dev/null real 0m20.455s user 0m8.145s sys 0m5.256s
# time find /lib -type f -exec cat {} \; > /dev/null real 0m6.827s user 0m8.477s sys 0m4.804s
# echo 3 > /proc/sys/vm/drop_caches
# time find /lib -type f -exec cat {} \; > /dev/null real 0m19.954s user 0m8.069s sys 0m5.364s
Thank you, thank you, thank you. I'm a bit irritated that a french man treats a "wet-her" as a typo! What if *i* like it?? In fact it is a fantastic physical backing store. Unbeatable.
Well and about dropping the fsync() in case the fcntl() fails with ENOTTY. This is "\<Esc>2dd", which shouldn't hurt a committer. I'm convinced that full_fsync=False is optional and false by default, but i don't trust Apple. I've seen a reference to an "atomic file" somewhere on bugs.python.org and that does fsync() first followed by fcntl() if FULLFSYNC is available. Thus, if someone knows about that, she may do so, but otherwise i would guess he doesn't, and in that case i would not expect ENOTTY from an fsync() - still i want a full flush! This is what NetBSD describes:
NOTES For optimal efficiency, the fsync_range() call requires that the file system containing the file referenced by fd support partial synchronization of file data. For file systems which do not support partial synchronization, the entire file will be synchronized and the call will be the equivalent of calling fsync().
But Apple is *soooo*speeeeciaaaal* again. Happy birthday.
I've dropped wet-her! I hope now you're satisfied!!!!! So the buffer cache is all which remains hot. How deserted!
And you could also add a test (I guess that just calling fsync with full_sync=True on a valid FD would be enough.
I was able to add two tests as an extension to what is yet tested about os.fsync(), but that uses an invalid fd. (At least it enters the conditional and fails as expected.)
I'm not sure static is necessary, I'd rather make it const.
Yes..
This code is correct as it is, see other extension modules in the stdlib for other examples of this pattern
..but i've used copy+paste here.
And you could also add a test (I guess that just calling fsync with full_sync=True on a valid FD would be enough.
The alternative would be that full_sync
Ok, i've renamed full_fsync to full_sync.
I've dropped wet-her! I hope now you're satisfied!!!!! So the buffer cache is all which remains hot. How deserted!
I'm not sure I'm always understanding your messages well (I'm not a native English speaker), but I don't think this kind of joke is appropriate for the bug tracker. Thank you.
I don't like the API because it gives a different behaviour depending on the OS and I don't see how to check that the function does really a full sync.
I would prefer a new option os.fullsync() function which is like your os.fsync(fd, full_sync=False), except that the function doesn't exist if the OS doesn't implement it.
Excusing myself seems to be the only "probates Mittel". @Antoine Pitrou: It was a real shame to read your mail. (It's sometimes so loud that "i don't even hear what i write".)
See also issue bpo-12102.
diff --git a/Doc/library/os.rst b/Doc/library/os.rst --- a/Doc/library/os.rst +++ b/Doc/library/os.rst @@ -810,6 +810,35 @@ Availability: Unix, and Windows.
+.. function:: fullfsync(fd) + + The POSIX standart requires that :c:func:`fsync` must transfer the buffered + data to the storage device, not that the data is actually written by the + device itself. It explicitely leaves it up to operating system implementors + whether users are given stronger guarantees on data integrity or not. Some + systems also offer special functions which overtake the part of making such + stronger guarantees, i.e., Mac OS X and NetBSD. + + This non-standart function is *optionally made available to access such + special functionality when feasible. It will force write of file buffers to + disk and the flush of disk caches of the file given by file descriptor *fd. + To strive for best-possible data integrity, the following can be done:: + + # Force writeout of local buffer modifications + f.flush() + # Then synchronize the changes to physical backing store + if hasattr(os, 'fullfsync'): + os.fullfsync(f.fileno()) + elif hasattr(os, 'fsync'): + os.fsync(f.fileno()) + + ..note:: + Calling this function may take a long time, since it may block + until the disk reports that the transfer has been completed. + + Availability: Unix. + + .. function:: ftruncate(fd, length)
Truncate the file corresponding to file descriptor \*fd*, so that it is at most
diff --git a/Lib/test/test_os.py b/Lib/test/test_os.py --- a/Lib/test/test_os.py +++ b/Lib/test/test_os.py @@ -835,12 +835,12 @@
class TestInvalidFD(unittest.TestCase):
singles = ["fchdir", "dup", "fdopen", "fdatasync", "fstat",
- "fstatvfs", "fsync", "tcgetpgrp", "ttyname"]
+ "fstatvfs", "fsync", "fullfsync", "tcgetpgrp", "ttyname"]
#singles.append("close")
- #We omit close because it doesn'r raise an exception on some platforms
+ # We omit close because it doesn't raise an exception on some platforms
def get_single(f):
def helper(self):
- if hasattr(os, f):
+ if hasattr(os, f):
self.check(getattr(os, f))
return helper
for f in singles:
diff --git a/Modules/posixmodule.c b/Modules/posixmodule.c
--- a/Modules/posixmodule.c
+++ b/Modules/posixmodule.c
@@ -174,6 +174,11 @@
#endif /* ! __IBMC__ */
#ifndef _MSC_VER
+ /* os.fullfsync()? */
+# if (defined HAVE_FSYNC && ((defined __APPLE__ && defined F_FULLFSYNC) || \
+ (defined __NetBSD__ && defined FDISKSYNC)))
+# define PROVIDE_FULLFSYNC
+# endif
#if defined(__sgi)&&_COMPILER_VERSION>=700
/* declare ctermid_r if compiling with MIPSPro 7.x in ANSI C mode
@@ -2129,6 +2134,41 @@
{
return posix_fildes(fdobj, fsync);
}
+
+# ifdef PROVIDE_FULLFSYNC
+PyDoc_STRVAR(fullfsync__doc__,
+"fullfsync(fd)\n\n"
+"force write of file buffers to disk, and the flush of disk caches\n"
+"of the file given by file descriptor fd.");
+
+static PyObject *
+fullfsync(PyObject *self, PyObject *fdobj)
+{
+ /* See issue 11877 discussion */
+ int res, fd = PyObject_AsFileDescriptor(fdobj);
+ if (fd < 0)
+ return NULL;
+ if (!_PyVerify_fd(fd))
+ return posix_error();
+
+ Py_BEGIN_ALLOW_THREADS
+# if defined __APPLE__
+ /* F_FULLFSYNC is not supported for all types of FDs/FSYSs;
+ * be on the safe side and test for inappropriate ioctl errors */
+ res = fcntl(fd, F_FULLFSYNC);
+ if (res < 0 && errno == ENOTTY)
+ res = fsync(fd);
+# elif defined __NetBSD__
+ res = fsync_range(fd, FFILESYNC | FDISKSYNC, 0, 0);
+# endif
+ Py_END_ALLOW_THREADS
+
+ if (res < 0)
+ return posix_error();
+ Py_INCREF(Py_None);
+ return Py_None;
+}
+# endif /* PROVIDE_FULLFSYNC */
#endif /* HAVE_FSYNC */
#ifdef HAVE_SYNC
@@ -9473,6 +9513,9 @@
#endif
#ifdef HAVE_FSYNC
{"fsync", posix_fsync, METH_O, posix_fsync__doc__},
+# ifdef PROVIDE_FULLFSYNC
+ {"fullfsync", fullfsync, METH_O, fullfsync__doc__},
+# endif
#endif
#ifdef HAVE_SYNC
{"sync", posix_sync, METH_NOARGS, posix_sync__doc__},
(This was an attachment to an empty mail message.)
Trying to revive this issue, here's a comment I left on Rietveld:
"""
I don't agree, the documentation states that full_sync will cause a flush to stable storage where supported and a plain fsync where it isn't. This code does just that.
I think I wasn't clear enough, so let me rephrase: the call to fcntl is guarded by #ifdef, so it is only called on platforms where it is supported, otherwise fsync() is called. What I was referring to is the fact that an fsync is tried if the call to fcntl fails with ENOTTY: if I ask my operating system to flush disk buffers and it can't, then I'd rather have an error returned, so that I can take the right decision (abort, roll-back, try a classical fsync). IMHO, silently masking an error by falling back to another syscall is wrong. """
Anyway, I think this patch can be useful (see for example issue bpo-8604 as a typical use case). I personally don't care since I use Linux, but OS-X and *BSD folks might find some interest in this.
Here is something unsorted and loose:
@neologix: One could argue that something had happened before the fsync(2), so that code which blindly did so is too dumb to do any right decision anyway. Even PEP-3151 won't help.
I favour haypos fullsync() approach, because that could probably make it into it. Yet Python doesn't offer any possibility to access NetBSD DISKSYNC stuff sofar.
Currently the programmer must be aware of any platform specific problems. I, for example, am not aware of Windows. How can i give any guarantee to users which (will) use my S-Postman on Windows? I need to put trust in turn into the Framework i am using - Python. And that makes me feel pretty breathless.
Fortunately Python is dynamic, so that one simply can replace os.fsync(). Works once only though (think signal handlers :=).
+ That is indeed the solution i'm using for my S-Postman, because *only* like this i can actually make Python's mailbox.py module reliable on Mac OS X! I can't give any guarantee for NetBSD, though i document it!
+ Aaarrg! I'm a liar!! I lie about - data integrity!!!
--Steffen Ciao, sdaoden(*)(gmail.com) () ascii ribbon campaign - against html e-mail /\ www.asciiribbon.org - against proprietary attachments
One could argue that something had happened before the fsync(2), so that code which blindly did so is too dumb to do any right decision anyway. Even PEP-3151 won't help.
I don't understand. If the syscall supposed to flush the disk's buffer cache fails - be it fcntl() or sync_file_range() - I think the error should be propagated, not silently ignored and replaced with another syscall which doesn't have the same semantic. That's all.
- I favour haypos fullsync() approach, because that could probably make it into it. Yet Python doesn't offer any possibility to access NetBSD DISKSYNC stuff sofar.
Are you willing to update your patch accordingly?
- Currently the programmer must be aware of any platform specific problems. I, for example, am not aware of Windows. How can i give any guarantee to users which (will) use my S-Postman on Windows? I need to put trust in turn into the Framework i am using - Python. And that makes me feel pretty breathless.
I agree.
- Fortunately Python is dynamic, so that one simply can replace os.fsync(). Works once only though (think signal handlers :=).
Yes, but since it's a fairly reasonable and useful feature, I think it could be merged.
- Aaarrg! I'm a liar!! I lie about - data integrity!!!
Well, actually, some hard disks lie about this too :-)
> Even PEP-3151 won't help.
I don't understand. If the syscall supposed to flush the disk's buffer cache fails - be it fcntl() or sync_file_range() - I think the error should be propagated, not silently ignored and replaced with another syscall which doesn't have the same semantic. That's all.
I'm with you theoretically - of course errors should be propagated to users so that they can act accordingly. And this is exactly what the patches do, *unless* it is an error which is not produced by the native fsync(2) call:
-- >8 --
?0%0[steffen@sherwood tmp]$ cat t.c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
int main(void) {
int r = fcntl(2, F_FULLFSYNC);
fprintf(stderr, "1. %d: %d, %s\n", r, errno, strerror(errno));
errno = 0;
r = fsync(2);
fprintf(stderr, "2. %d: %d, %s\n", r, errno, strerror(errno));
return 0;
}
?0%0[steffen@sherwood tmp]$ gcc -o t t.c && ./t
1. -1: 25, Inappropriate ioctl for device
2. 0: 0, Unknown error: 0
?0%0[steffen@sherwood tmp]$ grep -F 25 /usr/include/sys/errno.h
#define ENOTTY 25 /* Inappropriate ioctl for device */
-- >8 --
So in fact the patches do what is necessary to make the changed version act just as the plain systemcall.
> - I favour haypos fullsync() approach Are you willing to update your patch accordingly?
Both patches still apply onto the tip of friday noon: http://bugs.python.org/file22016/11877.9.diff, http://bugs.python.org/file22046/11877-standalone.1.diff.
Again: i personally would favour os.fsync(fd, fullsync=True), because that is the only way to put reliability onto unaware facilities unaware (e.g. my S-Postman replaces os.fsync() with a special function so that reliability is injected in- and onto Python's mailbox.py, which calls plain os.fsync()), but i've been convinced that this is impossible to do. It seems to be impossible to change os.fsync() at all, because it has a standartized function prototype.
So what do you mean? Shall i rewrite 11877-standalone.1.diff to always offer fullsync() whenever there is fsync()? This sounds to be a useful change, because testing hasattr() of the one would imply availability of the other.
> + Aaarrg! I'm a liar!! I lie about - data integrity!!! Well, actually, some hard disks lie about this too :-)
Yeah. But hey: "I feel save in New York City. I feel save in New York City." Nice weekend - and may the juice be with you!
--Steffen Ciao, sdaoden(*)(gmail.com) ASCII ribbon campaign ( ) More nuclear fission plants against HTML e-mail X can serve more coloured and proprietary attachments / \ and sounding animations
Are you willing to update your patch accordingly?
I'm a vain rooster! I've used fullfsync() in 11877-standalone.1.diff! Sorry for that, haypo. :-/
11877.fullsync-1.diff uses fullsync() and will instead always be provided when fsync() is available, to which it will fall back if no special operating system functionality is available.
I really think this is the cleanest solution, because like this a user can state "i want the strongest guarantees available on data integrity", and Python does just that.
--Steffen Ciao, sdaoden(*)(gmail.com) ASCII ribbon campaign ( ) More nuclear fission plants against HTML e-mail X can serve more coloured and proprietary attachments / \ and sounding animations
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = None closed_at = None created_at =
labels = ['library']
title = 'Change os.fsync() to support physical backing store syncs'
updated_at =
user = 'https://bugs.python.org/sdaoden'
```
bugs.python.org fields:
```python
activity =
actor = 'vstinner'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 'sdaoden'
dependencies = []
files = ['22016', '22046', '22759']
hgrepos = []
issue_num = 11877
keywords = ['patch']
message_count = 51.0
messages = ['134043', '134063', '134071', '134084', '134142', '134143', '134144', '134145', '134147', '134148', '134151', '134154', '134179', '134213', '134214', '134215', '134221', '134236', '134252', '134253', '134379', '135453', '135457', '135458', '135462', '135520', '135573', '135655', '135699', '135803', '135824', '135847', '135862', '135865', '135870', '136037', '136048', '136158', '136175', '136177', '136180', '136214', '136219', '136431', '136432', '140608', '140657', '140920', '140982', '141121', '189331']
nosy_count = 7.0
nosy_names = ['ronaldoussoren', 'pitrou', 'vstinner', 'nadeem.vawda', 'neologix', 'santoso.wijaya', 'sdaoden']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue11877'
versions = ['Python 3.3']
```