Change os.fsync() to support physical backing store syncs

5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

BPO	11877
Nosy	@ronaldoussoren, @pitrou, @vstinner
Files	11877.9.diff 11877-standalone.1.diff 11877.fullsync-1.diff

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields: ```python assignee = None closed_at = None created_at = labels = ['library'] title = 'Change os.fsync() to support physical backing store syncs' updated_at = user = 'https://bugs.python.org/sdaoden' ``` bugs.python.org fields: ```python activity = actor = 'vstinner' assignee = 'none' closed = False closed_date = None closer = None components = ['Library (Lib)'] creation = creator = 'sdaoden' dependencies = [] files = ['22016', '22046', '22759'] hgrepos = [] issue_num = 11877 keywords = ['patch'] message_count = 51.0 messages = ['134043', '134063', '134071', '134084', '134142', '134143', '134144', '134145', '134147', '134148', '134151', '134154', '134179', '134213', '134214', '134215', '134221', '134236', '134252', '134253', '134379', '135453', '135457', '135458', '135462', '135520', '135573', '135655', '135699', '135803', '135824', '135847', '135862', '135865', '135870', '136037', '136048', '136158', '136175', '136177', '136180', '136214', '136219', '136431', '136432', '140608', '140657', '140920', '140982', '141121', '189331'] nosy_count = 7.0 nosy_names = ['ronaldoussoren', 'pitrou', 'vstinner', 'nadeem.vawda', 'neologix', 'santoso.wijaya', 'sdaoden'] pr_nums = [] priority = 'normal' resolution = None stage = None status = 'open' superseder = None type = None url = 'https://bugs.python.org/issue11877' versions = ['Python 3.3'] ```

5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

bpo-11277 revealed that Mac OS X fsync() does not always and gracefully synchronize file data to physical backing store. To gain real fsync(2) behaviour fcntl(2) must be called with the F_FULLFSYNC flag.

vstinner commented 13 years ago

Oh, it remembers a long story around ext3/ext4 and write barrier, with a specific problem in Firefox with SQLite.

http://developer.apple.com/library/mac/#documentation/Darwin/Reference/ManPages/man2/fsync.2.html " For applications that require tighter guarantees about the integrity of their data, Mac OS X provides the F_FULLFSYNC fcntl. The F_FULLFSYNC fcntl asks the drive to flush all buffered data to permanent storage. Applications, such as databases, that require a strict ordering of writes should use F_FULLF- SYNC to ensure that their data is written in the order they expect. Please see fcntl(2) for more detail."

http://shaver.off.net/diary/2008/05/25/fsyncers-and-curveballs/ "fsync on Mac OS X: Since on Mac OS X the fsync command does not make the guarantee that bytes are written, SQLite sends a F_FULLFSYNC request to the kernel to ensures that the bytes are actually written through to the drive platter."

http://lwn.net/Articles/283161/

http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2008-September/001356.html " OTP-7471 On Mac OS X, file:sync/1 now guarantees that all filesystem buffers are written to the disk by using the fcntl() with F_FULLFSYNC option. Previously, file:sync/1 called fsync(), which only guaranteed that the data had been transferred to the disk drive. (Thanks to Jan Lehnardt.)"

http://lists.mindrot.org/pipermail/portawiki-discuss/2005-November/000002.html

ronaldoussoren commented 13 years ago

Traditionally the functions in os are thin wrappers around OS functions, this patch changes that for os.fsync.

Two nits wrt. the patch itself: 1) The behaviour on OSX should be documented in the stdlib reference, that currently says that os.fsync will call the C function fsync

2) The comment in the __APPLE__ case can be clearer, and should explain the issue (as short as possible).

I'm -1 on merging this patch though, as the fsync function on OSX behaves just like the one on Linux, the fnctl option adds a stronger guarantee in that it forces a flush of the buffers on the storage device as well. The manpage for fsync on a linux host says:

In case the hard disk has write cache enabled, the data may not really be on permanent storage when fsync/fdatasync return.

Adding the F_FULLSYNC option to os.fsync would be fine though (if it isn't already implemented)

The linux

79528080-9d85-4d18-8a2a-8b1f07640dd7 commented 13 years ago

I know that POSIX makes no guarantee regarding durable writes, but IMHO that's definitely wrong, in the sense that when one calls fsync, he expects the data to be committed to disk and be durable. Fixing this deficiency through Python's exposed fsync might thus be a good idea (for example, the Window version probably doesn't call fsync, so it's already not a direct mapping).

5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

when one calls fsync, he expects [...] Fixing this deficiency through Python's exposed fsync [...]

I think so, too. http://pubs.opengroup.org/onlinepubs/009695399/functions/fsync.html even permits "null implementation"s etc. etc. etc. (I like the last paragraph of "Rationale" the most.) os.rst states for fsync():

[...] to ensure that all internal buffers associated with *f*
are written to disk.

If a platform offers the opportunity to actually implement the desired behaviour then i would do so, regardless of what needs to be done internally to achieve it.

(And the single question on apple is simply what to do otherwise with that VMS/VFS bug for at least large sparse files. I can only imagine adding multiple notes in the documentation, here and there.)

ronaldoussoren commented 13 years ago

See \http://www.kernel.org/doc/man-pages/online/pages/man2/fsync.2.html\ for linux' behavior, in particular: linux doesn't guarantee that data gets writting to the disk when you call fsync, only that the data gets pushed to the storage device.

This is the same behavior as fsync on OSX, and OSX also has a second API that provides stronger guarantees.

With your patch it is no longer possible to call the C function fsync on OSX, even though it is good enough for a lot of use cases. As os.fcntl already supports F_FULLSYNC I see no good reason to change the implementation of os.fsync on OSX.

BTW. The need to use F_FULLSYNC in bpo-11277 might be a bug in OSX itself, have you filed an issue in Apple's tracker?

vstinner commented 13 years ago

A different approach for this issue is to document fcntl.fcntl(fd, fcntl.F_FULLFSYNC) in fsync() documentation.

pitrou commented 13 years ago

Agreed with Ronald. If Apple thinks fsync() shouldn't flush the hardware cache, there's no reason for us to override that.

79528080-9d85-4d18-8a2a-8b1f07640dd7 commented 13 years ago

in particular: linux doesn't guarantee that data gets writting to the disk when you call fsync, only that the data gets pushed to the storage device.

Barriers are now enable by default in EXT4, and Theodore Tso has been favourable to that for quite some time now: http://lwn.net/Articles/283288/

As for OS-X, this is definitely a bug (I mean, having to call fsync before mmap is a huge bug in itself).

5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

On Wed, Apr 20, 2011 at 12:16:39PM +0000, Ronald Oussoren wrote:

This is the same behavior as fsync on OSX, and OSX also has a second API that provides stronger guarantees. With your patch it is no longer possible to call the C function fsync on OSX, even though it is good enough for a lot of use cases. As os.fcntl already supports F_FULLSYNC I see no good reason to change the implementation of os.fsync on OSX.

I don't see it that way for multiple reasons.

One of it is that Python states that it "is a beginner language", and the sentence i've quoted a part of in my last message is very beginner friendly. Now i don't think you can require that much further knowledge from a beginner at all than what is described in that very sentence.

(Just think of an african kid with a sponsored laptop and possibly 10 minutes internet access per day which tries hard to rise up and puts *galaxies of trust* in what it reads in os.rst, maybe because it is the only documentation he has available!)

Then Python is a high-level language. It's true that there are many shallow wrappers around lower-level functions, but still. It is cross-platform. Just have a look into a, say, file I/O implementation of a C library/program which aims to be portable. If all that #if#endif would be necessary in a high-level language, then why don't write a Makefile and compile that thing? Not that much more work, then - many thanks to the GCC people! You know - i wouldn't talk about subtleties of filesystems here.

And then - at least half a dozen of programmers with altogether maybe many decades of experience, full-time internet access (talking about you :) required two months to get around the Apple bug. A simple "man 2 fsync" would have sufficed for a starter! And how ridiculous - i've spend the free time of four days :-( It's a shame, it's a bug, i wouldn't it let pass through to users.

Apple? No, i'm looking forward to return to a private BSD/Linux X/ahwm/aterm/vim laptop.

5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

On Wed, Apr 20, 2011 at 12:16:39PM +0000, Ronald Oussoren wrote:

This is the same behavior as fsync on OSX, and OSX also has a second API that provides stronger guarantees. With your patch it is no longer possible to call the C function fsync on OSX, even though it is good enough for a lot of use cases. As os.fcntl already supports F_FULLSYNC I see no good reason to change the implementation of os.fsync on OSX.

I don't see it that way for multiple reasons.

One of it is that Python states that it "is a beginner language", and the sentence i've quoted a part of in my last message is very beginner friendly. Now i don't think you can require that much further knowledge from a beginner at all than what is described in that very sentence.

(Just think of an african kid with a sponsored laptop and possibly 10 minutes internet access per day which tries hard to rise up and puts *galaxies of trust* in what it reads in os.rst, maybe because it is the only documentation he has available!)

Then Python is a high-level language. It's true that there are many shallow wrappers around lower-level functions, but still. It is cross-platform. Just have a look into a, say, file I/O implementation of a C library/program which aims to be portable. If all that #if#endif would be necessary in a high-level language, then why don't write a Makefile and compile that thing? Not that much more work, then - many thanks to the GCC people! You know - i wouldn't talk about subtleties of filesystems here.

And then - at least half a dozen of programmers with altogether maybe many decades of experience, full-time internet access (talking about you :) required two months to get around the Apple bug. A simple "man 2 fsync" would have sufficed for a starter! And how ridiculous - i've spend the free time of four days :-( It's a shame, it's a bug, i wouldn't it let pass through to users.

Apple? No, i'm looking forward to return to a private BSD/Linux X/ahwm/aterm/vim laptop.

5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

8-} The real message was:

On Wed, Apr 20, 2011 at 12:16:39PM +0000, Antoine Pitrou wrote:

If Apple thinks [...] there's no reason for us

Apple thinks (KernelProgramming.pdf, BPFileSystem.pdf):

Kernel code must be nearly perfect.

Kernel programming is a black art that should be avoided if at
all possible. Fortunately, kernel programming is usually
unnecessary.  You can write most software entirely in user
space.

Sparse files and zero filling. UFS supports sparse files,
which are a way for the file system to store the data in files
without storing unused space allocated for those files. HFS+
does not support sparse files and, in fact, zero-fills all
bytes allocated for a file until end-of-file.
[Steffen thinks HFS+ is driven from "user-space" through IOKit.]

POSIX says:

It would also not be unreasonable to omit testing for fsync(),
allowing it to be treated as a quality-of-implementation
issue.

It seems someone needs a hand. Nice afternoon with an untrashed INBOX.

5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

Ronald Oussoren wrote:

Adding the F_FULLSYNC option to os.fsync would be fine though

To show you that i'm not unteachable i'll attach a patch which does that. This approach can be heavily extended, then, e.g. by using sync_file_range(all_the_flags) on Linux if the "full_fsync" argument is true?! (And then there is sync_file_range2() on PowerPC and nice ARM...)

(The first time that i use PyArg_xy plus plus - please review. test_os is ok and using "os.fsync(xy, True)" makes test_zlib succeed on unpatched hg checkout. os.rst change is really ugly.)

5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

I'm convinced. And i've rewritten the patch. Now fsync(2) is always called first *regardless* of the new optional argument. The documentation is (a little bit) better now. And i've added support for NetBSD, AIX and Linux; though:

for AIX i'm testing (O_SYNC && O_DSYNC) which is almost definitely the wrong test for this, but (a) i don't know when fsync_range() has been introduced (seems to be many years) and (b) i've never really worked on AIX. (I only have the documentation: http://www.filibeto.org/unix/aix/lib/rel/5.3/basetrf1.pdf.)
i've added sync_file_range() on Linux because of SYNC_FILE_RANGE_WAIT_AFTER in the hope that it means something even though the manual page says something else - but Linux and documentation is something by itself. http://lwn.net/Articles/178199/ states

Providing all three flags guarantees that those pages are actually on disk when the call returns.
it seems OpenBSD, FreeBSD, Solaris 11 and HP/UX provide data reliability through fsync(2) alone, see e.g. the notes near EOF of the following Solaris 11 file:

http://www.filibeto.org/sun/lib/solaris11-express-docs/2010.11/E19963_01/html/821-1464/fcntl.h-3head.html#fcntl.h-3head

5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

P.S.: dito Linux and NetBSD. I've reused preprocessor tests we've discovered internally over the past years, but i cannot test this here. I could test Linux next Tuesday, though.

vstinner commented 13 years ago

11877.2.diff: On Mac OS X, fsync(fd, full_sync=True); fsync(fd) do a full sync twice. You should restore the old file flags at fsync() exit. Or this surprising behaviour should be documented.

5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

Ok, 11877.3.diff uses either-or.

79528080-9d85-4d18-8a2a-8b1f07640dd7 commented 13 years ago

I'm -10 on sync_file_range on Linux:

it doesn't update the file metadata, so there's a high chance of corruption after a crash
last time I checked, it didn't flush the disk cache (well, it probably does if barriers are enabled, but that's also the case with fsync)

5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

Charles-Francois Natali wrote:

I'm -10 on sync_file_range on Linux: [...] last time I checked [...]

I just looked at http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=fs/sync.c;h=c38ec163da6ccba00a0146c75606c1b548b31343;hb=HEAD and it seems - as far as i understand what i read - that you're still right; and, furthermore, that fsync() does everything anyway. (But here an idiot is talking about *very* complicated stuff.)

I've also "search"ed for the called filemap_write_and_wait_range() and found the commit message for 2daea67e966dc0c42067ebea015ddac6834cef88 as part of that; very interesting in respect to our issue here.

I will wait before i update the patch though, just in case some experienced NetBSD or AIX user posts some message. For OpenBSD i think i can confirm that fsync(2) alone is enough after taking a (shallow, all shallow) look into kernel/vfs_syscalls.c and ufs/ffs/{ffs_softdep.c,softdep.h}.

79528080-9d85-4d18-8a2a-8b1f07640dd7 commented 13 years ago

and it seems - as far as i understand what i read - that you're still right; and, furthermore, that fsync() does everything anyway. (But here an idiot is talking about *very* complicated stuff.)

I just double-checked, and indeed, fsync does flush the disk cache when barriers are enabled on several FS, while sync_file_range does not. So sync_file_range should definitely not be used on Linux.

5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

I'll attach 11877.4.diff:

Docu change (i hope to be better)
Kept NetBSD after looking into http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/kern/vfs_syscalls.c?rev=1.423&content-type=text/x-cvsweb-markup&only_with_tag=MAIN because fsync_range() with FDISKSYNC set causes a FSYNC_CACHE flag to be passed through to VOP_FSYNC() in addition to FSYNC_WAIT, which is what plain fsync() passes exclusively. (I have not furtherly discovered that. FDISKSYNC was introduced
1. I believe existence of such a flag will sooner or later force somebody to make a difference in wether it is set or not.)
Drop of AIX. According to the systems manual (link in msg134213) there does not seem to be a difference in between fsync() and fsync_range()
Drop of Linux sync_file_range() (see messages above). By the way, does Linux still require

79528080-9d85-4d18-8a2a-8b1f07640dd7 commented 13 years ago

I'll attach 11877.4.diff

A couple comments:

static PyObject *
posix_fsync(PyObject *self, PyObject *args, PyObject *kwargs)
{
    PyObject *retval = NULL;
    auto PyObject *fdobj;
    auto int full_fsync = 1;

Why are you using the "auto" storage class specifier? I know that "explicit is better than implicit", but I really can't see a good reason for using it here (and anywhere, see http://c-faq.com/decl/auto.html).

# ifdef __APPLE__
    res = fcntl(fd, F_FULLFSYNC);
# endif
# ifdef __NetBSD__
    res = fsync_range(fd, FFILESYNC|FDISKSYNC, 0, 0);
# endif

Since __APPLE and __NetBSD are exclusive, you could use something like # if defined(APPLE) res = fcntl(fd, F_FULLFSYNC); # elif defined(NetBSD) res = fsync_range(fd, FFILESYNC|FDISKSYNC, 0, 0); # endif

Do you really need to use goto for such a simple code?

5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

On Sat, 7 May 2011 14:20:51 +0200, Charles-François Natali wrote:

I really can't see a good reason for using it here (and anywhere, see http://c-faq.com/decl/auto.html).

You're right.

Why are you using the "auto" storage class specifier? I know that "explicit is better than implicit"

Yup. I'm doing what is happening for real in (x86) assembler. Thus "auto" means (at a glance!) that this one will need space on the stack. (Conversely i'm not using "register" because who knows if that is really true? ;))

Do you really need to use goto for such a simple code?

Yup. Ok, this is more complicated. The reason is that my funs have exactly one entry point and exactly one place where they're left. This is because we here do manual instrumentalisation as in

ret fun(args) { locals s_NYD_ENTER;

assertions

...

jleave: s_NYD_LEAVE; return;

[maybe only, if large: possibly "predict-false" code blocks]

[possible error jumps here goto jleave;] }

We're prowd of that. N(ot)Y(et)D(ead) is actually pretty cool, i've used debuggers exactly 5 times in the past about ten years! I don't even know exactly *how to use debuggers*. 8---} (NYD can do backtracing, or, with NDEBUG and optional, profiling.) A really good optimizing compiler will produce the very same code! And i love nasm, but it's pretty non-portable. But C is also nice.

But of course i can change this (in C) to simply use return, this is easy here, no resources to be freed.

Thanks for looking at this, by the way. :)

5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

On Sat, 7 May 2011 14:20:51 +0200, Charles-François Natali wrote:

ifdef __APPLE__
res = fcntl(fd, F_FULLFSYNC);
endif

ifdef __NetBSD__
res = fsync_range(fd, FFILESYNC|FDISKSYNC, 0, 0);
endif

Since __APPLE and __NetBSD are exclusive, you could use something like

if defined(APPLE)
res = fcntl(fd, F_FULLFSYNC);
elif defined(NetBSD)
res = fsync_range(fd, FFILESYNC|FDISKSYNC, 0, 0);
endif

Yes, you're right, i'll update the patch accordingly.

-- Steffen, sdaoden(*)(gmail.com)

5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

11877.5.diff incorporates all changes suggested by Charles-François except for the 'auto' keyword, which is extremely important and which would need to be invented if it would not yet exist.

I'm dropping the old stuff. And i think this is the final version of the patch. I've changed the default argument to 'True' as i really think it's better to be on the safe side here. (The french are better off doing some gracy and dangerous sports to discover edges of life!) I'm still of the opinion that this should be completely hidden, but since it's completely transparent wether a Python function gets yet another named argument or not...

So, thanks to Ronald, i detected that also NetBSD introduced a FDISKSYNC flag in 2005 and that you really need fsync_range() there (at least by definition)! How could they do that? But i'm also happy to see that all other systems try hard to achieve security transparently and by default and unless i missed something once again.

ronaldoussoren commented 13 years ago

Steffen, I don't understand your comment about "auto". Declaring variables as "auto" is not necessary in C code and not used anywhere else in Python's source code.

5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

Ronald Oussoren wrote (2011-05-08 10:33+0200):

Steffen, I don't understand your comment about "auto". Declaring variables as "auto" is not necessary in C code and not used anywhere else in Python's source code.

Well, as long as i can keep my underwear all is fine. (I also looked at Google translate because i first wanted to start the reply with "croak .. pip .. twist .. wrench .. groan .. ugh".)

79528080-9d85-4d18-8a2a-8b1f07640dd7 commented 13 years ago

Steffen, you changed the default to doing a "full sync" in your last patch: while I was favoring that initially, I now agree with Ronald and Antoine, and think that we shouldn't change the default behaviour. The reason being that Apple and NetBSD folks should know what they're doing, and that there might be some subtle side effects (see for example my comment on sync_file_range on Linux). Also, given how many bugs you seem to encouter on OS-X, it's probably better to stick to a syscall known to be safe instead of silently replacing it by a maybe-not-so-tested syscall. Finally, depending on the workload, it could have a significant performance impact. People requiring write durability will probably manage to find this full_sync parameter.

5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

I don't agree with you and i don't believe it is implemented like that. But it seems i am the only one on this issue who sees it like that. Thus i apply 11877.6.diff.

Declaring variables as "auto" is not necessary in C code and not used anywhere else in Python's source code

Changed.

Steffen, you changed the default to doing a "full sync" in your last patch

Changed all through.

The reason being that Apple and NetBSD folks should know what they're doing [.] People requiring write durability will probably manage to find this full_sync parameter.

This sounds logical. I've changed the doc in os.rst so that it includes a note on *which* operating systems actually do something dependend on that argument. I should have done that from the very beginning.

Finally, depending on the workload, it could have a significant performance impact.

:-)

5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

Ouch, ouch, ouch!! I'll have to send 11877.7.diff which extends 11877.6.diff. This is necessary because using fcntl(2) with F_FULLFSYNC may fail with ENOTTY (inapprobiate ioctl for device) in situations where a normal fsync(2) succeeds (e.g. STDOUT_FILENO). By the way - i have no idea of Redmoondian Horror at all (except for http://msdn.microsoft.com/en-us/sync/bb887623.aspx).

Dropping .5 and .6 - and sorry for the noise. Good night, Europe.

5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

Just adding more notes on that by reactivating one of haypo's links from bpo-8604. (And: maybe some Linux documentation should be updated?) From Theodore Ts'o, http://www.linuxfoundation.org/news-media/blogs/browse/2009/03/don’t-fear-fsync:

As the Eat My Data presentation points out very clearly, the only
safe way according that POSIX allows for requesting data written
to a particular file descriptor be safely stored on stable storage
is via the fsync() call.  Linux’s close(2) man page makes this
point very clearly:

A successful close does not guarantee that the data has been
successfully saved to disk, as the kernel defers writes. It is not
common for a file system to flush the buffers when the stream is
closed. If you need to be sure that the data is physically stored
use fsync(2).

Why don’t application programmers follow these sage words?  These
three reasons are most often given as excuses:

- (Perceived) performance problems with fsync()
- The application only needs atomicity, but not durability
- The fsync() causing the hard drive to spin up unnecessarily in
  laptop_mode

(Don't ask me why i'm adding this note though. I should have searched for it once i've opened that issue? Bah!!! Ts'o did not write that article for me. He'd better hacked.)

79528080-9d85-4d18-8a2a-8b1f07640dd7 commented 13 years ago

Calling fsync on a file descriptor referring to a tty doesn't make much sense.
On Linux, this fails with EINVAL:
$ python -c 'import os; os.fsync(1)'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
OSError: [Errno 22] Invalid argument

So if the full sync fails on ttys, it shouldn't be a problem: since the default performs a classical fsync, this won't break compatibility with existing code anyway. So I think you should stick with the previous version (well, if the full sync fails on other FDs, then it's another story, but in that case it should just be dropped altogether if it's not reliable...).

By the way, it's "appropriate", not "approbiate". You made the same typo in your patch.

5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

[.]

OSError: [Errno 22] Invalid argument

Sorry, i didn't know that. Mac OS X (2.5 and 2.6 Apple shipped):

21:43 \~/tmp $ python2.5 -c 'import os; os.fsync(1)'; echo $? 0 21:43 \~/tmp $ python2.6 -c 'import os; os.fsync(1)'; echo $? 0 21:43 \~/tmp $ python2.7 -c 'import os; os.fsync(1)'; echo $? 0 21:43 \~/tmp $ python3 -c 'import os; os.fsync(1)'; echo $? 0

5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

So I think you should stick with the previous version (well, if the full sync fails on other FDs, then it's another story, but in that case it should just be dropped altogether if it's not reliable...).

Strong stuff. *This* is the version which should have been implemented from the beginning, but Apple states

F_FULLFSYNC Does the same thing as fsync(2) then asks the drive to flush all buffered data to the permanent storage device (arg is ignored). This is currently implemented on HFS, MS-DOS (FAT), and Universal Disk Format (UDF) file systems.

and i thought

fsync (maybe move buffers to Queue; do reorder Queue as approbiate)
do call fsys impl. to .. whatever That's why i had a version of the patch which did 'fsync();fcntl();' because it would have been an additional syscall but the fsync() part would possibly have been essentially "skipped over" ..unless.. Linux RC scripts had 'sync && sync && sync' but it does not seem to be necessary any more (was it ever - i don't know).

But who knows if that fcntl will fail on some non-noted fsys? I think it's better to be on the safe side. Quoting you, Charles-François

People requiring write durability will probably manage to find this full_sync parameter and if they do they thus really strive for data integrity, so call fsync() as a fallback for the security which Apple provides. Also: we cannot let os.fsync() fail with ENOTTY!?

By the way, it's "appropriate", not "approbiate". You made the same typo in your patch.

8\~I That was not a typo. Thanks. I'll changed that.

79528080-9d85-4d18-8a2a-8b1f07640dd7 commented 13 years ago

and if they do they thus really strive for data integrity, so call fsync() as a fallback for the security which Apple provides.

Why? If I ask a full sync and it fails, I'd rather have an error returned so that I can take the appropriate decision (abort, roll-back, try a standard fsync) rather than have Python silently replace it by an fsync.

Also: we cannot let os.fsync() fail with ENOTTY!?

Why not, since that's what the kernel returns? Once again, since the default behaviour doesn't change, this won't break any existing application.

5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

Finally, depending on the workload, it could have a significant performance impact.

Oh yes (first replaces os.fsync() with an in-python wrapper):

18:12 \~/tmp $ ll mail ls: mail: No such file or directory 18:12 \~/tmp $ ll X-MAIL 312 -rw-r----- 1 steffen staff 315963 15 May 17:49 X-MAIL 18:12 \~/tmp $ time s-postman.py --folder=~/tmp/mail --dispatch=X-MAIL Dispatched 37 tickets to 4 targets.

real 0m4.638s user 0m0.974s sys 0m0.160s 18:13 \~/tmp $ rm -rf mail 18:13 \~/tmp $ time s-postman.py --folder=~/tmp/mail --dispatch=X-MAIL Dispatched 37 tickets to 4 targets.

real 0m1.228s user 0m0.976s sys 0m0.122s

(I'm using the first one.)

79528080-9d85-4d18-8a2a-8b1f07640dd7 commented 13 years ago

(I'm not sure Rietveld sent the message so I post it here, sorry in case of duplicate).

Steffen, I've made a quick review of your patch, in case you're interested. I think that this functionality can be really useful to some people, and it'd be nice if your patch could stabilize somewhat so that committers can review it properly and eventually merge it.

Concerning your benchmark: I don't know exactly what you're measuring, but when performing I/O-related benchmarks, it's always a good idea to run each test several times in a row, or flush the page/buffer cache before each run: the reason is that the the second run benefits from the page/buffer cache, which often speeds things up dramatically. Example:

# time find /lib -type f -exec cat {} \; > /dev/null real 0m20.455s user 0m8.145s sys 0m5.256s

# time find /lib -type f -exec cat {} \; > /dev/null real 0m6.827s user 0m8.477s sys 0m4.804s

# echo 3 > /proc/sys/vm/drop_caches

# time find /lib -type f -exec cat {} \; > /dev/null real 0m19.954s user 0m8.069s sys 0m5.364s

5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

Thank you, thank you, thank you. I'm a bit irritated that a french man treats a "wet-her" as a typo! What if *i* like it?? In fact it is a fantastic physical backing store. Unbeatable.

Well and about dropping the fsync() in case the fcntl() fails with ENOTTY. This is "\<Esc>2dd", which shouldn't hurt a committer. I'm convinced that full_fsync=False is optional and false by default, but i don't trust Apple. I've seen a reference to an "atomic file" somewhere on bugs.python.org and that does fsync() first followed by fcntl() if FULLFSYNC is available. Thus, if someone knows about that, she may do so, but otherwise i would guess he doesn't, and in that case i would not expect ENOTTY from an fsync() - still i want a full flush! This is what NetBSD describes:

NOTES For optimal efficiency, the fsync_range() call requires that the file system containing the file referenced by fd support partial synchronization of file data. For file systems which do not support partial synchronization, the entire file will be synchronized and the call will be the equivalent of calling fsync().

But Apple is *soooo*speeeeciaaaal* again. Happy birthday.

5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

I've dropped wet-her! I hope now you're satisfied!!!!! So the buffer cache is all which remains hot. How deserted!

And you could also add a test (I guess that just calling fsync with full_sync=True on a valid FD would be enough.

I was able to add two tests as an extension to what is yet tested about os.fsync(), but that uses an invalid fd. (At least it enters the conditional and fails as expected.)

I'm not sure static is necessary, I'd rather make it const.

Yes..

This code is correct as it is, see other extension modules in the stdlib for other examples of this pattern

..but i've used copy+paste here.

And you could also add a test (I guess that just calling fsync with full_sync=True on a valid FD would be enough.

The alternative would be that full_sync

Ok, i've renamed full_fsync to full_sync.

pitrou commented 13 years ago

I've dropped wet-her! I hope now you're satisfied!!!!! So the buffer cache is all which remains hot. How deserted!

I'm not sure I'm always understanding your messages well (I'm not a native English speaker), but I don't think this kind of joke is appropriate for the bug tracker. Thank you.

vstinner commented 13 years ago

I don't like the API because it gives a different behaviour depending on the OS and I don't see how to check that the function does really a full sync.

I would prefer a new option os.fullsync() function which is like your os.fsync(fd, full_sync=False), except that the function doesn't exist if the OS doesn't implement it.

5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

Excusing myself seems to be the only "probates Mittel". @Antoine Pitrou: It was a real shame to read your mail. (It's sometimes so loud that "i don't even hear what i write".)

vstinner commented 13 years ago

See also issue bpo-12102.

5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

diff --git a/Doc/library/os.rst b/Doc/library/os.rst --- a/Doc/library/os.rst +++ b/Doc/library/os.rst @@ -810,6 +810,35 @@ Availability: Unix, and Windows.

+.. function:: fullfsync(fd) + + The POSIX standart requires that :c:func:`fsync` must transfer the buffered + data to the storage device, not that the data is actually written by the + device itself. It explicitely leaves it up to operating system implementors + whether users are given stronger guarantees on data integrity or not. Some + systems also offer special functions which overtake the part of making such + stronger guarantees, i.e., Mac OS X and NetBSD. + + This non-standart function is *optionally made available to access such + special functionality when feasible. It will force write of file buffers to + disk and the flush of disk caches of the file given by file descriptor *fd. + To strive for best-possible data integrity, the following can be done:: + + # Force writeout of local buffer modifications + f.flush() + # Then synchronize the changes to physical backing store + if hasattr(os, 'fullfsync'): + os.fullfsync(f.fileno()) + elif hasattr(os, 'fsync'): + os.fsync(f.fileno()) + + ..note:: + Calling this function may take a long time, since it may block + until the disk reports that the transfer has been completed. + + Availability: Unix. + + .. function:: ftruncate(fd, length)

Truncate the file corresponding to file descriptor \*fd*, so that it is at most

diff --git a/Lib/test/test_os.py b/Lib/test/test_os.py --- a/Lib/test/test_os.py +++ b/Lib/test/test_os.py @@ -835,12 +835,12 @@

 class TestInvalidFD(unittest.TestCase):
     singles = ["fchdir", "dup", "fdopen", "fdatasync", "fstat",
-               "fstatvfs", "fsync", "tcgetpgrp", "ttyname"]
+               "fstatvfs", "fsync", "fullfsync", "tcgetpgrp", "ttyname"]
     #singles.append("close")
-    #We omit close because it doesn'r raise an exception on some platforms
+    # We omit close because it doesn't raise an exception on some platforms
     def get_single(f):
         def helper(self):
-            if  hasattr(os, f):
+            if hasattr(os, f):
                 self.check(getattr(os, f))
         return helper
     for f in singles:
diff --git a/Modules/posixmodule.c b/Modules/posixmodule.c
--- a/Modules/posixmodule.c
+++ b/Modules/posixmodule.c
@@ -174,6 +174,11 @@
 #endif /* ! __IBMC__ */

 #ifndef _MSC_VER
+  /* os.fullfsync()? */
+# if (defined HAVE_FSYNC && ((defined __APPLE__ && defined F_FULLFSYNC) || \
+                             (defined __NetBSD__ && defined FDISKSYNC)))
+#  define PROVIDE_FULLFSYNC
+# endif

 #if defined(__sgi)&&_COMPILER_VERSION>=700
 /* declare ctermid_r if compiling with MIPSPro 7.x in ANSI C mode
@@ -2129,6 +2134,41 @@
 {
     return posix_fildes(fdobj, fsync);
 }
+
+# ifdef PROVIDE_FULLFSYNC
+PyDoc_STRVAR(fullfsync__doc__,
+"fullfsync(fd)\n\n"
+"force write of file buffers to disk, and the flush of disk caches\n"
+"of the file given by file descriptor fd.");
+
+static PyObject *
+fullfsync(PyObject *self, PyObject *fdobj)
+{
+    /* See issue 11877 discussion */
+    int res, fd = PyObject_AsFileDescriptor(fdobj);
+    if (fd < 0)
+        return NULL;
+    if (!_PyVerify_fd(fd))
+        return posix_error();
+
+    Py_BEGIN_ALLOW_THREADS
+#  if defined __APPLE__
+    /* F_FULLFSYNC is not supported for all types of FDs/FSYSs;
+     * be on the safe side and test for inappropriate ioctl errors */
+    res = fcntl(fd, F_FULLFSYNC);
+    if (res < 0 && errno == ENOTTY)
+        res = fsync(fd);
+#  elif defined __NetBSD__
+    res = fsync_range(fd, FFILESYNC | FDISKSYNC, 0, 0);
+#  endif
+    Py_END_ALLOW_THREADS
+
+    if (res < 0)
+        return posix_error();
+    Py_INCREF(Py_None);
+    return Py_None;
+}
+# endif /* PROVIDE_FULLFSYNC */
 #endif /* HAVE_FSYNC */

 #ifdef HAVE_SYNC
@@ -9473,6 +9513,9 @@
 #endif
 #ifdef HAVE_FSYNC
     {"fsync",       posix_fsync, METH_O, posix_fsync__doc__},
+# ifdef PROVIDE_FULLFSYNC
+    {"fullfsync",   fullfsync, METH_O, fullfsync__doc__},
+# endif
 #endif
 #ifdef HAVE_SYNC
     {"sync",        posix_sync, METH_NOARGS, posix_sync__doc__},

5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

(This was an attachment to an empty mail message.)

79528080-9d85-4d18-8a2a-8b1f07640dd7 commented 13 years ago

Trying to revive this issue, here's a comment I left on Rietveld:

"""

I don't agree, the documentation states that full_sync will cause a flush to stable storage where supported and a plain fsync where it isn't. This code does just that.

I think I wasn't clear enough, so let me rephrase: the call to fcntl is guarded by #ifdef, so it is only called on platforms where it is supported, otherwise fsync() is called. What I was referring to is the fact that an fsync is tried if the call to fcntl fails with ENOTTY: if I ask my operating system to flush disk buffers and it can't, then I'd rather have an error returned, so that I can take the right decision (abort, roll-back, try a classical fsync). IMHO, silently masking an error by falling back to another syscall is wrong. """

Anyway, I think this patch can be useful (see for example issue bpo-8604 as a typical use case). I personally don't care since I use Linux, but OS-X and *BSD folks might find some interest in this.

5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

Here is something unsorted and loose:

@neologix: One could argue that something had happened before the fsync(2), so that code which blindly did so is too dumb to do any right decision anyway. Even PEP-3151 won't help.
I favour haypos fullsync() approach, because that could probably make it into it. Yet Python doesn't offer any possibility to access NetBSD DISKSYNC stuff sofar.
Currently the programmer must be aware of any platform specific problems. I, for example, am not aware of Windows. How can i give any guarantee to users which (will) use my S-Postman on Windows? I need to put trust in turn into the Framework i am using - Python. And that makes me feel pretty breathless.
Fortunately Python is dynamic, so that one simply can replace os.fsync(). Works once only though (think signal handlers :=).

+ That is indeed the solution i'm using for my S-Postman, because *only* like this i can actually make Python's mailbox.py module reliable on Mac OS X! I can't give any guarantee for NetBSD, though i document it!

+ Aaarrg! I'm a liar!! I lie about - data integrity!!!

--Steffen Ciao, sdaoden(*)(gmail.com) () ascii ribbon campaign - against html e-mail /\ www.asciiribbon.org - against proprietary attachments

79528080-9d85-4d18-8a2a-8b1f07640dd7 commented 13 years ago

One could argue that something had happened before the fsync(2), so that code which blindly did so is too dumb to do any right decision anyway. Even PEP-3151 won't help.

I don't understand. If the syscall supposed to flush the disk's buffer cache fails - be it fcntl() or sync_file_range() - I think the error should be propagated, not silently ignored and replaced with another syscall which doesn't have the same semantic. That's all.

I favour haypos fullsync() approach, because that could probably make it into it. Yet Python doesn't offer any possibility to access NetBSD DISKSYNC stuff sofar.

Are you willing to update your patch accordingly?

Currently the programmer must be aware of any platform specific problems. I, for example, am not aware of Windows. How can i give any guarantee to users which (will) use my S-Postman on Windows? I need to put trust in turn into the Framework i am using - Python. And that makes me feel pretty breathless.

I agree.

Fortunately Python is dynamic, so that one simply can replace os.fsync(). Works once only though (think signal handlers :=).

Yes, but since it's a fairly reasonable and useful feature, I think it could be merged.

Aaarrg! I'm a liar!! I lie about - data integrity!!!

Well, actually, some hard disks lie about this too :-)

5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

> Even PEP-3151 won't help.

I don't understand. If the syscall supposed to flush the disk's buffer cache fails - be it fcntl() or sync_file_range() - I think the error should be propagated, not silently ignored and replaced with another syscall which doesn't have the same semantic. That's all.

I'm with you theoretically - of course errors should be propagated to users so that they can act accordingly. And this is exactly what the patches do, *unless* it is an error which is not produced by the native fsync(2) call:

-- >8 --
?0%0[steffen@sherwood tmp]$ cat t.c 
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
int main(void) {
    int r  = fcntl(2, F_FULLFSYNC);
    fprintf(stderr, "1. %d: %d, %s\n", r, errno, strerror(errno));
    errno = 0;
    r = fsync(2);
    fprintf(stderr, "2. %d: %d, %s\n", r, errno, strerror(errno));
    return 0;
}
?0%0[steffen@sherwood tmp]$ gcc -o t t.c && ./t
1. -1: 25, Inappropriate ioctl for device
2. 0: 0, Unknown error: 0
?0%0[steffen@sherwood tmp]$ grep -F 25 /usr/include/sys/errno.h 
#define ENOTTY      25      /* Inappropriate ioctl for device */
-- >8 --

So in fact the patches do what is necessary to make the changed version act just as the plain systemcall.

> - I favour haypos fullsync() approach Are you willing to update your patch accordingly?

Both patches still apply onto the tip of friday noon: http://bugs.python.org/file22016/11877.9.diff, http://bugs.python.org/file22046/11877-standalone.1.diff.

Again: i personally would favour os.fsync(fd, fullsync=True), because that is the only way to put reliability onto unaware facilities unaware (e.g. my S-Postman replaces os.fsync() with a special function so that reliability is injected in- and onto Python's mailbox.py, which calls plain os.fsync()), but i've been convinced that this is impossible to do. It seems to be impossible to change os.fsync() at all, because it has a standartized function prototype.

So what do you mean? Shall i rewrite 11877-standalone.1.diff to always offer fullsync() whenever there is fsync()? This sounds to be a useful change, because testing hasattr() of the one would imply availability of the other.

> + Aaarrg! I'm a liar!! I lie about - data integrity!!! Well, actually, some hard disks lie about this too :-)

Yeah. But hey: "I feel save in New York City. I feel save in New York City." Nice weekend - and may the juice be with you!

--Steffen Ciao, sdaoden(*)(gmail.com) ASCII ribbon campaign ( ) More nuclear fission plants against HTML e-mail X can serve more coloured and proprietary attachments / \ and sounding animations

5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

Are you willing to update your patch accordingly?

I'm a vain rooster! I've used fullfsync() in 11877-standalone.1.diff! Sorry for that, haypo. :-/

11877.fullsync-1.diff uses fullsync() and will instead always be provided when fsync() is available, to which it will fall back if no special operating system functionality is available.

I really think this is the cleanest solution, because like this a user can state "i want the strongest guarantees available on data integrity", and Python does just that.

--Steffen Ciao, sdaoden(*)(gmail.com) ASCII ribbon campaign ( ) More nuclear fission plants against HTML e-mail X can serve more coloured and proprietary attachments / \ and sounding animations

python / cpython

Change os.fsync() to support physical backing store syncs #56086

ifdef APPLE

endif

ifdef NetBSD

endif

if defined(APPLE)

elif defined(NetBSD)

endif