Closed sebastic closed 10 years ago
Cause could be that the binaries zcat
and bzcat
are not available or not in the path.
The build problems in earlier versions have been "fixed" by not building that part on architectures other than linux, because BSD systems don't have the mremap() syscall.
Thanks for the quick feedback!
Cause could be that the binaries zcat and bzcat are not available or not in the path.
Both zcat and bzcat are available in /bin which is in $PATH, so that's not the problem.
A trace of run_tests.sh t/osmfile/test_read_and_write.cpp
using ktrace & kdump, shows that the utilities are found and used correctly (they exit with status 0). The ktrace file is 41 MB and its human readable dump is 78 MB, so a little too big to attach to this issue. I can put them online if you want to inspect them.
I ran the failing test case manually a couple of times, and interestingly it doesn't fail consistently. I'm starting to suspect there may be a race between reading and writing the files which fail when the filesystem hasn't completed writing the data before it's being read for verification.
$ ./run_tests.sh t/osmfile/test_read_and_write.cpp
Checking t/osmfile/test_read_and_write.cpp...[TEST FAILED]
==========================
Running 9 test cases...
unknown location(0): fatal error in "write_to_xml_gz_output_file": std::runtime_error: Subprocess returned error
t/osmfile/test_read_and_write.cpp(47): last checkpoint
unknown location(0): fatal error in "write_to_xml_bz2_output_file": std::runtime_error: Subprocess returned error
t/osmfile/test_read_and_write.cpp(47): last checkpoint
unknown location(0): fatal error in "read_from_xml_gz_file": std::runtime_error: Subprocess returned error
t/osmfile/test_read_and_write.cpp(122): last checkpoint
unknown location(0): fatal error in "read_from_xml_bz2_file": std::runtime_error: Subprocess returned error
t/osmfile/test_read_and_write.cpp(122): last checkpoint
*** 4 failures detected in test suite "Main"
==========================
some tests failed
0 ok, 0 compile error, 1 fail
$ ./run_tests.sh t/osmfile/test_read_and_write.cpp
Checking t/osmfile/test_read_and_write.cpp...[TEST FAILED]
==========================
Running 9 test cases...
unknown location(0): fatal error in "write_to_xml_gz_output_file": std::runtime_error: Subprocess returned error
t/osmfile/test_read_and_write.cpp(47): last checkpoint
unknown location(0): fatal error in "write_to_xml_bz2_output_file": std::runtime_error: Subprocess returned error
t/osmfile/test_read_and_write.cpp(47): last checkpoint
*** 2 failures detected in test suite "Main"
==========================
some tests failed
0 ok, 0 compile error, 1 fail
$ ./run_tests.sh t/osmfile/test_read_and_write.cpp
Checking t/osmfile/test_read_and_write.cpp...[TEST FAILED]
==========================
Running 9 test cases...
unknown location(0): fatal error in "write_to_xml_gz_output_file": std::runtime_error: Subprocess returned error
t/osmfile/test_read_and_write.cpp(47): last checkpoint
unknown location(0): fatal error in "write_to_xml_bz2_output_file": std::runtime_error: Subprocess returned error
t/osmfile/test_read_and_write.cpp(47): last checkpoint
unknown location(0): fatal error in "read_from_xml_gz_file": std::runtime_error: Subprocess returned error
t/osmfile/test_read_and_write.cpp(122): last checkpoint
*** 3 failures detected in test suite "Main"
==========================
some tests failed
0 ok, 0 compile error, 1 fail
I have just pushed a change which might or might not fix the problem. I was not closing the file descriptor when it was 0. Usually the fd should not be 0, because thats stdin, but who knows.
If this does not fix the problem maybe you can output the status
in include/osmium/osmfile.hpp
line 547 and see whether that tells us anything.
The file descriptor change (daaf48bee9d8bffc9290891e116e17c0defb2d6b) doesn't fix the issue.
I've added the status using this change:
diff --git a/include/osmium/osmfile.hpp b/include/osmium/osmfile.hpp
index fb7f7fb..65a8098 100644
--- a/include/osmium/osmfile.hpp
+++ b/include/osmium/osmfile.hpp
@@ -534,7 +534,7 @@ namespace Osmium {
}
void close() {
- if (m_fd > 0) {
+ if (m_fd >= 0) {
::close(m_fd);
m_fd = -1;
}
@@ -545,7 +545,9 @@ namespace Osmium {
if (pid < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
// global variable errno doesn't contain a valid value,
// so we better set errno=0
- throw IOError("Subprocess returned error", m_filename, 0);
+ char error[256];
+ sprintf(error, "Subprocess returned error (child_pid: %i, pid: %i, status: %i, WIFEXITED: %i, WEXITSTATUS: %i)", m_childpid, pid, status, WIFEXITED(status), WEXITSTATUS(status));
+ throw IOError(error, m_filename, 0);
}
m_childpid = 0;
}
This value is always 0 for the failing test cases:
$ ./run_tests.sh -o t/osmfile/test_read_and_write.cpp
Checking t/osmfile/test_read_and_write.cpp...[TEST FAILED]
==========================
Running 9 test cases...
unknown location(0): fatal error in "write_to_xml_gz_output_file": std::runtime_error: Subprocess returned error (child_pid: 8358, pid: -1, status: 0, WIFEXITED: 1, WEXITSTATUS: 0)
t/osmfile/test_read_and_write.cpp(47): last checkpoint
unknown location(0): fatal error in "write_to_xml_bz2_output_file": std::runtime_error: Subprocess returned error (child_pid: 8359, pid: -1, status: 0, WIFEXITED: 1, WEXITSTATUS: 0)
t/osmfile/test_read_and_write.cpp(47): last checkpoint
*** 2 failures detected in test suite "Main"
==========================
some tests failed
0 ok, 0 compile error, 1 fail
waitpid returns -1 "If there are no children not previously awaited, with errno set to ECHILD" or "If an error is detected or a caught signal aborts the call, with errno is set to indicate the error." The latter seems to be the case based on the kdump:
7254 gzip CALL exit(0)
7253 tests RET wait4 -1 errno 4 Interrupted system call
7253 tests PSIG SIGCHLD caught handler=0x8023c17b0 mask=0x0 code=CLD_EXITED
7253 tests CALL sigreturn(0x688480)
7253 tests RET sigreturn JUSTRETURN
Maybe SIGCHLD shouldn't be ignored here?
https://github.com/joto/osmium/blob/master/test/t/osmfile/test_read_and_write.cpp#L29
You are right, it must have something to do with the SIGCHLD handling. Unfortunately if I add the DISABLE_SIGCHLD() call to the tests, they now fail on Linux. Reading some of the man pages, SIGCHLD seems to be a complex issue...
I may have found a workaround:
BOOST_TEST_CATCH_SYSTEM_ERRORS=no ./run_tests.sh -o t/osmfile/test_read_and_write.cpp
The BOOST_TEST_CATCH_SYSTEM_ERRORS variable was mentioned in Boost Bug #3664
The wait system call now returns the child PID as expected:
9010 gzip CALL exit(0)
9007 tests CALL close(0x5)
9007 tests RET close 0
9007 tests CALL wait4(0x2332,0x7fffffffc58c,0,0)
9007 tests RET wait4 9010/0x2332
I have added BOOST_TEST_CATCH_SYSTEM_ERRORS=no
to run_test.sh
. It doesn't seem to have bad effect on Linux, so I added it unconditionally.
3 osmfile tests fail on Debian kFreeBSD (build log):
The previous git snapshot (0.0~20111213-g7f3500a) didn't even build (build log 1, build log 2):
But this problem has been resolved in the current git snapshot (0.0~20140825-3f3d229).
Please advise how to best help troubleshoot this issue.