Closed 5df057ac-c83d-447e-8c45-910556b17608 closed 2 years ago
Hi!
test_touch_common fails when using 8.3-STABLE FreeBSD 8.3-STABLE and Python 3.4.0b1 (default:a0ec33efa743+, Nov 30 2013, 10:36:58).
Here are the tracebacks:
\====================================================================== FAIL: test_touch_common (test.test_pathlib.PathTest) ----------------------------------------------------------------------
Traceback (most recent call last):
File "/tank/libs/cpython/Lib/test/test_pathlib.py", line 1402, in test_touch_common
self.assertGreaterEqual(st.st_mtime_ns, old_mtime_ns)
AssertionError: 1385800632000000000 not greater than or equal to 1385800632871814968
====================================================================== FAIL: test_touch_common (test.test_pathlib.PosixPathTest) ----------------------------------------------------------------------
Traceback (most recent call last):
File "/tank/libs/cpython/Lib/test/test_pathlib.py", line 1402, in test_touch_common
self.assertGreaterEqual(st.st_mtime_ns, old_mtime_ns)
AssertionError: 1385800633000000000 not greater than or equal to 1385800633042814928
Ran 319 tests in 0.368s
FAILED (failures=2, skipped=85) test test_pathlib failed 1 test failed: test_pathlib
This issue seems to be related with bpo-15745.
I don't really know what to do with this. I think you'll have to investigate a bit and find out exactly what happens during the test.
As a data point, if not for ZFS, test_pathlib passes on FreeBSD 6.4: http://buildbot.python.org/all/builders/x86%20FreeBSD%206.4%203.x/builds/4261/steps/test/logs/stdio and FreeBSD 7.2: http://buildbot.python.org/all/builders/x86%20FreeBSD%207.2%203.x/builds/4731/steps/test/logs/stdio
Is this similar/related to bpo-15745?
I took both of my buildbots (koobs-freebsd9, koobs-freebsd10) off ZFS until it could be resolved
Sorry Claudiu I missed the issue reference in your comment
I believe it's similar, both test_os and test_pathlib fails when executed from within a ZFS container. I checked, I did a fresh checkout of Python inside a normal directory and run the tests there, they ran without problems.
Since bpo-15745 hasn't been fixed yet, would be okay to skip these tests when the test suite runs from a ZFS container? Currently, these failures are a nuissance when running the test suite.
I'd like to put the buildbot slave instances back onto ZFS for broader Disk/IO test coverage for Python and other projects as well as to gain some administrative disk utilisation flexibility.
These two issues have unfortunately precluded that, and there's much more value to projects with a reliably green buildbot than one that is red, ending up ignored, and hiding other issues or regressions in the meantime.
+1 on disabling these tests and leaving the issues open so someone can pick them up at a later date and have a ZFS environment in which to reproduce/resolve them in a custom builder
Here's a patch that marks those tests as expected failures if the source checkout is inside a zfs container. It uses df -t zfs
, it was the easiest way I could find to detect that we are running from a zfs container.
This test also fails on Linux when using ZFS.
Sorry to have such an awful configuration, but this approach won't fix the problem for me.
I use a Linux "encrypted home directory", which uses crazy loopback mount logic to create an on-the-fly decrypted representation of my home directory. So my home directory is actually on an "ecryptfs" device:
% df -T
Filesystem Type Mounted on
...
home zfs /home
/home/larry/.Private ecryptfs /home/larry
Maybe we could ignore deltas below a certain race-condition threshold? Perhaps a millisecond? I think there are already tests like that.
(By "this approach" I meant "the approach employed in the first patch posted". Sorry for the ambiguity.)
So, I'm not against the patch, but it would be nice to diagnose where exactly the issue comes from.
@Antoine: We have investigated a bit here and it seems that ZFS just keeps a timestamp in seconds in the memory.
See function uberblock_update in uberblock.c ( http://people.freebsd.org/~gibbs/zfs_doxygenation/html/d3/d65/uberblock_8c_source.html - Row 57) ub->ub_timestamp = gethrestime_sec();
After thinking and looking a bit on it, this patch isn't the best to put into the code what I can see.
If any of these things really break, because of something completely different - we won't notice it.
A more accurate solution is to maybe look on a different way to compare the timestamps that works on systems that just support seconds too?
I agree. The patch was merely a temporary solution to the real problem.
bpo-15745 is probably related to this issue
So me and @larstiq have been sitting with this during the sprint here and we have figured out some things but haven't had the time to put together a patch.
If you put two p.touch() before starting the testing, it will not fail on BSD.
What we have found out so far is that during file creation the resolution of the timestamp is higher then at the touch attempt when a file exists.
The main issue as we see it is that os.stat can return a timestamp with more granularity than os.utime can set.
os.stat can return something more precise than nanosecond resolution?
@Larry
The issue is the following:
In the test after the first p.touch (Pdb) old_mtime 1407128672.4133856 (Pdb) old_mtime_ns 1407128672413385711
After second p.touch (Pdb) st.st_mtime_ns 1407128689000000000 (Pdb) st.st_mtime 1407128689.0
So the issue is that when utime is used it can't set the time with nanosecond, but just with second resolution.
@Francis - A quick fix that would work, but it is a bit dirty but it will work correctly is to just add two p.touch() before doing anything.
Not really the best solution but could work.
So why can't you use the "ns" parameter for os.utime()?
@Larry - the problem arrives after the second touch, which does os.utime(FILE, None) if the file exists.
The os.utime that is in the test-code doesn't really break anything, what breaks the test is the second touch.
A cursory read of the source code suggests everybody is doing the right thing. Can you run this with a debug build of CPython, put a breakpoint in posix_utime, and check that when it's called for the second p.touch that it's correctly calling the underlying function to say "set the time of this file to the current time"?
@Larry - we did that yesterday and on the FreeBSD-machine the regular utimes was used and not utimensat. utimensat is not available on FreeBSD 10 so it fall backs to regular utimes to be used in os.utime. But when a file is created it is the file system that sets the utime, so it will have nanoseconds.
And was the "time" argument passed in to utimes() set to NULL?
I'm no expert at all at GDB, but I will give it a go and if this is wrong I can try a lot more.
Breakpoint 1, posix_utime (self=0xf00918ed8, args=0xf014459f8, kwargs=0x0) at ./Modules/posixmodule.c:4838 4838 PyObject *times = NULL; Current language: auto; currently minimal 4839 PyObject *ns = NULL; 4840 int dir_fd = DEFAULT_DIR_FD; 4841 int follow_symlinks = 1; 4843 "follow_symlinks", NULL}; 4856 memset(&path, 0, sizeof(path)); 4857 path.function_name = "utime"; 4858 memset(&utime, 0, sizeof(utime_t)); 4860 path.allow_fd = 1; 4862 if (!PyArg_ParseTupleAndKeywords(args, kwargs, 4875 if (times && (times != Py_None) && ns) { 4903 else if (ns) { 4919 utime.now = 1; 4927 if (path_and_dir_fd_invalid("utime", &path, dir_fd) || 963 if (!path->narrow && !path->wide && (dir_fd != DEFAULT_DIR_FD)) { 4928 dir_fd_and_fd_invalid("utime", dir_fd, path.fd) || 974 if ((dir_fd != DEFAULT_DIR_FD) && (fd != -1)) { 4929 fd_and_follow_symlinks_invalid("utime", path.fd, follow_symlinks)) 986 if ((fd > 0) && (!follow_symlinks)) { 4933 if ((dir_fd != DEFAULT_DIR_FD) && (!follow_symlinks)) { 4974 Py_BEGIN_ALLOW_THREADS 4977 if ((!follow_symlinks) && (dir_fd == DEFAULT_DIR_FD)) 4989 if (path.fd != -1) 4994 result = utime_default(&utime, path.narrow); 4800 UTIME_TO_TIMEVAL; 4801 return utimes(path, time); (gdb) print time $1 = {\<text variable, no debug info>} 0xf01191d20 \<time> (gdb) print path $2 = {function_name = 0x58bb79 "utime", argument_name = 0x0, nullable = 0, allow_fd = 1, wide = 0x0, narrow = 0xf0099c0d0 "/tmp/test", fd = -1, length = 9, object = 0xf01445f90, cleanup = 0xf0099c0a0} 4996 Py_END_ALLOW_THREADS 4998 if (result \< 0) { 5006 Py_INCREF(Py_None); 787 if (path->cleanup) { 788 Py_CLEAR(path->cleanup); 5016 }
The "no debug info" makes me think you're using a release build. Please recompile making sure that "-O0" and "-g" are both set. By default the makefile has "-O3" and "-g", so normally you just change the "-O3" to "-O0", then "make clean" and "make".
I used configure with --with-pydebug, will look closer on it when I get to the office
To confirm, my OPT-line is OPT= -g -O0 -Wall -Wstrict-prototypes
Yup, you've got a debug build there. I'm baffled by that gdb "no debug info" line then.
At line 4801 could you also print *utime print tv
Larry: Result of that: (gdb) print *utime Structure has no component named operator*. (gdb) print tv No symbol "tv" in current context.
You can also catch me on IRC with the same nickname as here
I talked to puppet on IRC for a while and we figured out the following about his OS:
and it consistently prints "before \<= after False".
*facepalm*
Since utimes supports microsecond resolution, we could in theory work around this problem by explicitly specifying the current time when using utimes(). If we did that, we might want to also see if this behavior affects futimes() and lutimes().
Functions in posixmodule are expected to be atomic, and this would mean two system calls instead of one--so maybe we should only use this workaround on platforms with the bug?
And to further confirm, on my Linux machine with a ZFS partition:
It still fails, but here the race is much closer:
before: 1407191476502407010
after: 1407191476502090829
before <= after False
If I run the test on a different filesystem (e.g. on /tmp which is ext4) the problem doesn't occur.
My guess: it's ZFS's fault in both cases, it's not using the highest-precision timer available.
Attached is a simple C test case for the same behavior. On my Linux machine this prints "This is insane!" on a ZFS partition and "Everything is okay" on an ext4 partition.
The test case won't work unmodified on FreeBSD. Try commenting out the call to "utimensat", and uncommenting the call to "utimes". I don't guarantee that's sufficient.
If nothing else, we can be certain it's not a Python bug.
Result:
Before: 1407219004 988716892 After: 1407219004 0 Before \<= After 0 This is insane!
bpo-15745 has been fixed in the meantime (about 6 years ago).
If nobody objects I will close this as out of date. If there is still a problem with the test is will be reported for a current python version.
I can confirm that the behavior is fixed in ZFS on Linux. My test case C program now prints "Everything is okay." when run on a ZFS partition on Linux, and test_touch_common from the current tree passes every time. ZFS fixing this was the best possible outcome.
I'll go ahead and just close it now--why wait! If somebody confirms that the test still fails on FreeBSD, please open a new issue.
Thanks for checking in, Irit!
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = None closed_at =
created_at =
labels = ['type-bug', 'tests']
title = 'test.test_pathlib.PosixPathTest.test_touch_common fails on FreeBSD with ZFS'
updated_at =
user = 'https://github.com/PCManticore'
```
bugs.python.org fields:
```python
activity =
actor = 'larry'
assignee = 'none'
closed = True
closed_date =
closer = 'larry'
components = ['Tests']
creation =
creator = 'Claudiu.Popa'
dependencies = []
files = ['35553', '36262']
hgrepos = []
issue_num = 19838
keywords = ['patch']
message_count = 38.0
messages = ['204786', '204794', '204795', '204813', '204814', '204815', '220143', '220144', '220167', '224056', '224063', '224064', '224066', '224625', '224627', '224628', '224640', '224642', '224656', '224664', '224679', '224683', '224684', '224685', '224686', '224687', '224698', '224703', '224705', '224706', '224750', '224751', '224770', '224779', '224789', '224810', '408377', '408426']
nosy_count = 10.0
nosy_names = ['jcea', 'pitrou', 'larry', 'ezio.melotti', 'Claudiu.Popa', 'francismb', 'koobs', 'puppet', 'larstiq', 'iritkatriel']
pr_nums = []
priority = 'normal'
resolution = 'out of date'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue19838'
versions = ['Python 3.4']
```