joto / osmium

C++/Javascript framework for working with OSM files.
http://wiki.openstreetmap.org/wiki/Osmium
GNU General Public License v3.0
123 stars 31 forks source link

Test failures on Debian kFreeBSD #94

Closed sebastic closed 10 years ago

sebastic commented 10 years ago

3 osmfile tests fail on Debian kFreeBSD (build log):

Checking t/osmfile/test_read_and_write.cpp...[TEST FAILED]
==========================
Running 9 test cases...
unknown location(0): fatal error in "write_to_xml_gz_output_file": std::runtime_error: Subprocess returned error
t/osmfile/test_read_and_write.cpp(47): last checkpoint
unknown location(0): fatal error in "write_to_xml_bz2_output_file": std::runtime_error: Subprocess returned error
t/osmfile/test_read_and_write.cpp(47): last checkpoint
unknown location(0): fatal error in "read_from_xml_gz_file": std::runtime_error: Subprocess returned error
t/osmfile/test_read_and_write.cpp(122): last checkpoint

*** 3 failures detected in test suite "Main"
==========================

The previous git snapshot (0.0~20111213-g7f3500a) didn't even build (build log 1, build log 2):

In file included from osmjs.cpp:7:0:
../include/osmium/storage/byid.hpp: In member function 'void Osmium::Storage::Mmap<TValue>::set(uint64_t, TValue)':
../include/osmium/storage/byid.hpp:323:109: error: 'MREMAP_MAYMOVE' was not declared in this scope
                     m_items = (TValue*) mremap(m_items, sizeof(TValue) * m_size, sizeof(TValue) * new_size, MREMAP_MAYMOVE);
                                                                                                             ^

But this problem has been resolved in the current git snapshot (0.0~20140825-3f3d229).

Please advise how to best help troubleshoot this issue.

joto commented 10 years ago

Cause could be that the binaries zcat and bzcat are not available or not in the path.

The build problems in earlier versions have been "fixed" by not building that part on architectures other than linux, because BSD systems don't have the mremap() syscall.

sebastic commented 10 years ago

Thanks for the quick feedback!

Cause could be that the binaries zcat and bzcat are not available or not in the path.

Both zcat and bzcat are available in /bin which is in $PATH, so that's not the problem.

A trace of run_tests.sh t/osmfile/test_read_and_write.cpp using ktrace & kdump, shows that the utilities are found and used correctly (they exit with status 0). The ktrace file is 41 MB and its human readable dump is 78 MB, so a little too big to attach to this issue. I can put them online if you want to inspect them.

I ran the failing test case manually a couple of times, and interestingly it doesn't fail consistently. I'm starting to suspect there may be a race between reading and writing the files which fail when the filesystem hasn't completed writing the data before it's being read for verification.

$ ./run_tests.sh t/osmfile/test_read_and_write.cpp
Checking t/osmfile/test_read_and_write.cpp...[TEST FAILED]
==========================
Running 9 test cases...
unknown location(0): fatal error in "write_to_xml_gz_output_file": std::runtime_error: Subprocess returned error
t/osmfile/test_read_and_write.cpp(47): last checkpoint
unknown location(0): fatal error in "write_to_xml_bz2_output_file": std::runtime_error: Subprocess returned error
t/osmfile/test_read_and_write.cpp(47): last checkpoint
unknown location(0): fatal error in "read_from_xml_gz_file": std::runtime_error: Subprocess returned error
t/osmfile/test_read_and_write.cpp(122): last checkpoint
unknown location(0): fatal error in "read_from_xml_bz2_file": std::runtime_error: Subprocess returned error
t/osmfile/test_read_and_write.cpp(122): last checkpoint

*** 4 failures detected in test suite "Main"
==========================
some tests failed
0 ok, 0 compile error, 1 fail
$ ./run_tests.sh t/osmfile/test_read_and_write.cpp
Checking t/osmfile/test_read_and_write.cpp...[TEST FAILED]
==========================
Running 9 test cases...
unknown location(0): fatal error in "write_to_xml_gz_output_file": std::runtime_error: Subprocess returned error
t/osmfile/test_read_and_write.cpp(47): last checkpoint
unknown location(0): fatal error in "write_to_xml_bz2_output_file": std::runtime_error: Subprocess returned error
t/osmfile/test_read_and_write.cpp(47): last checkpoint

*** 2 failures detected in test suite "Main"
==========================
some tests failed
0 ok, 0 compile error, 1 fail
$ ./run_tests.sh t/osmfile/test_read_and_write.cpp
Checking t/osmfile/test_read_and_write.cpp...[TEST FAILED]
==========================
Running 9 test cases...
unknown location(0): fatal error in "write_to_xml_gz_output_file": std::runtime_error: Subprocess returned error
t/osmfile/test_read_and_write.cpp(47): last checkpoint
unknown location(0): fatal error in "write_to_xml_bz2_output_file": std::runtime_error: Subprocess returned error
t/osmfile/test_read_and_write.cpp(47): last checkpoint
unknown location(0): fatal error in "read_from_xml_gz_file": std::runtime_error: Subprocess returned error
t/osmfile/test_read_and_write.cpp(122): last checkpoint

*** 3 failures detected in test suite "Main"
==========================
some tests failed
0 ok, 0 compile error, 1 fail
joto commented 10 years ago

I have just pushed a change which might or might not fix the problem. I was not closing the file descriptor when it was 0. Usually the fd should not be 0, because thats stdin, but who knows.

If this does not fix the problem maybe you can output the status in include/osmium/osmfile.hpp line 547 and see whether that tells us anything.

sebastic commented 10 years ago

The file descriptor change (daaf48bee9d8bffc9290891e116e17c0defb2d6b) doesn't fix the issue.

I've added the status using this change:

diff --git a/include/osmium/osmfile.hpp b/include/osmium/osmfile.hpp
index fb7f7fb..65a8098 100644
--- a/include/osmium/osmfile.hpp
+++ b/include/osmium/osmfile.hpp
@@ -534,7 +534,7 @@ namespace Osmium {
         }

         void close() {
-            if (m_fd > 0) {
+            if (m_fd >= 0) {
                 ::close(m_fd);
                 m_fd = -1;
             }
@@ -545,7 +545,9 @@ namespace Osmium {
                 if (pid < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
                     // global variable errno doesn't contain a valid value,
                     // so we better set errno=0
-                    throw IOError("Subprocess returned error", m_filename, 0);
+                    char error[256];
+                    sprintf(error, "Subprocess returned error (child_pid: %i, pid: %i, status: %i, WIFEXITED: %i, WEXITSTATUS: %i)", m_childpid, pid, status, WIFEXITED(status), WEXITSTATUS(status));
+                    throw IOError(error, m_filename, 0);
                 }
                 m_childpid = 0;
             }

This value is always 0 for the failing test cases:

$ ./run_tests.sh -o t/osmfile/test_read_and_write.cpp
Checking t/osmfile/test_read_and_write.cpp...[TEST FAILED]
==========================
Running 9 test cases...
unknown location(0): fatal error in "write_to_xml_gz_output_file": std::runtime_error: Subprocess returned error (child_pid: 8358, pid: -1, status: 0, WIFEXITED: 1, WEXITSTATUS: 0)
t/osmfile/test_read_and_write.cpp(47): last checkpoint
unknown location(0): fatal error in "write_to_xml_bz2_output_file": std::runtime_error: Subprocess returned error (child_pid: 8359, pid: -1, status: 0, WIFEXITED: 1, WEXITSTATUS: 0)
t/osmfile/test_read_and_write.cpp(47): last checkpoint

*** 2 failures detected in test suite "Main"
==========================
some tests failed
0 ok, 0 compile error, 1 fail

waitpid returns -1 "If there are no children not previously awaited, with errno set to ECHILD" or "If an error is detected or a caught signal aborts the call, with errno is set to indicate the error." The latter seems to be the case based on the kdump:

  7254 gzip     CALL  exit(0)
  7253 tests    RET   wait4 -1 errno 4 Interrupted system call
  7253 tests    PSIG  SIGCHLD caught handler=0x8023c17b0 mask=0x0 code=CLD_EXITED
  7253 tests    CALL  sigreturn(0x688480)
  7253 tests    RET   sigreturn JUSTRETURN

Maybe SIGCHLD shouldn't be ignored here?

https://github.com/joto/osmium/blob/master/test/t/osmfile/test_read_and_write.cpp#L29

joto commented 10 years ago

You are right, it must have something to do with the SIGCHLD handling. Unfortunately if I add the DISABLE_SIGCHLD() call to the tests, they now fail on Linux. Reading some of the man pages, SIGCHLD seems to be a complex issue...

sebastic commented 10 years ago

I may have found a workaround:

BOOST_TEST_CATCH_SYSTEM_ERRORS=no ./run_tests.sh -o t/osmfile/test_read_and_write.cpp

The BOOST_TEST_CATCH_SYSTEM_ERRORS variable was mentioned in Boost Bug #3664

The wait system call now returns the child PID as expected:

  9010 gzip     CALL  exit(0)
  9007 tests    CALL  close(0x5)
  9007 tests    RET   close 0
  9007 tests    CALL  wait4(0x2332,0x7fffffffc58c,0,0)
  9007 tests    RET   wait4 9010/0x2332
joto commented 10 years ago

I have added BOOST_TEST_CATCH_SYSTEM_ERRORS=no to run_test.sh. It doesn't seem to have bad effect on Linux, so I added it unconditionally.