tbeu / matio

MATLAB MAT File I/O Library
https://matio.sourceforge.io
BSD 2-Clause "Simplified" License

test failures on big-endian platforms with 1.5.17 #123

Closed svillemot closed 5 years ago

svillemot commented 5 years ago

MatIO 1.5.17 fails the testsuite on big-endian architectures. See the following Debian build logs:

Note that 1.5.16 passed the testsuite on those architectures.

tbeu commented 5 years ago

Thanks for reporting. As usual: what is not tested does not work. Unfortunately, I no longer have a BE machine available in CI. Could you please send me the testsuite dir from one of the failing machines?

svillemot commented 5 years ago

On Wednesday, 14 August 2019 at 07:21 -0700, tbeu wrote:

Could you please send me the testsuite dir from one of the failing machines?

Please find it attached. It has been created on an s390x (IBM Z) machine.

--
Sébastien Villemot
Debian Developer
http://sebastien.villemot.name
http://www.debian.org

tbeu commented 5 years ago

Does not work by mail. Please attach here in the issue.

tbeu commented 5 years ago

Side note: 8784368a5fea3515925f33c150e5d1af08d55b67 should fix the -Wformat warnings reported on MIPS.

svillemot commented 5 years ago

I had also sent it to you by private mail, but here is a copy in the issue.

testsuite-issue123.tar.gz

tbeu commented 5 years ago

Thanks. Need to analyze in more detail since I did not figure it out when only looking at the changes between 1.5.17 and 1.5.16.

tbeu commented 5 years ago

I had also sent it to you by private mail

Got it now. Took some 20 min.

tbeu commented 5 years ago

I am really puzzled. Please confirm that the build succeeded with 1.5.16!

svillemot commented 5 years ago

1.5.16 succeeded on s390x, as you can see from: https://buildd.debian.org/status/fetch.php?pkg=libmatio&arch=s390x&ver=1.5.16-1&stamp=1563564814&raw=0

However, 1.5.16 was never tried on MIPS, probably because the Debian build machines were overloaded.

svillemot commented 5 years ago

I also just verified myself on an s390x machine that 1.5.16 succeeds and 1.5.17 fails, in the same compilation environment.

tbeu commented 5 years ago

Are you able to execute some command on the s390x machine with 1.5.16 installed?

test_mat -v 5 write_struct_2d_numeric

I'd need the created test_write_struct_2d_numeric.mat for further analysis.

svillemot commented 5 years ago

There you go. test_write_struct_2d_numeric.mat.gz

tbeu commented 5 years ago

Thanks. The files do not differ between 1.5.16 and 1.5.17. Still, it is a long-standing name writing bug, since MATLAB cannot read that file. It was only revealed as a side effect of updating the reading code in 02e9fcd09c664236e9ae261df7d64937216b0ab9.

Could you please also run the following three commands and send me the three created BE MAT files. Does not matter if 1.5.16 or 1.5.17.

test_mat -v 5 write_empty_cell
test_mat -v 5 write_cell_2d_logical
test_mat -v 5 write_cell_empty_struct

svillemot commented 5 years ago

Here are the 3 additional files, produced with 1.5.16. 3-tests.tar.gz

tbeu commented 5 years ago

Thanks again. As I thought, these MAT files cannot be successfully read. Can you please apply 2732a907bcbf9655199f53a65a22060e803b38c3 (on 1.5.16, 1.5.17 or ideally: current master), run the test-suite on a BE machine and provide the four MAT files created by test_mat as above. If you approve I am going to release a fixed version soon (though obviously nobody uses libmatio on BE machines).
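
For convenience, the four test_mat runs requested in this thread could be scripted roughly as follows. This is only a sketch: the function name and the TEST_MAT variable are hypothetical helpers introduced here (TEST_MAT lets you point at a test_mat binary that is not on PATH), while the test names and the `-v 5` option are the ones quoted above.

```shell
# Sketch: run the four writer tests named in this thread, stopping at the
# first failure. TEST_MAT is a hypothetical override for the binary path.
run_be_write_tests() {
    for t in write_struct_2d_numeric write_empty_cell \
             write_cell_2d_logical write_cell_empty_struct; do
        "${TEST_MAT:-test_mat}" -v 5 "$t" || return 1
    done
}
```

Calling `run_be_write_tests` from the build directory, for example with `TEST_MAT=./test_mat`, should leave the created MAT files in the current directory, ready to be attached.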

svillemot commented 5 years ago

I applied 2732a90 on top of 1.5.17 and this indeed fixes the testsuite. I attach the 4 MAT-files. fixed-mats.tar.gz

(I also tried building from git, but for some reason all the tests fail, and this is not related to your recent changes; I must be missing something)

tbeu commented 5 years ago

Thanks for confirmation. All four BE MAT files can be read by MATLAB now.
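
A quick way to double-check that a file really was written in big-endian byte order, assuming it is a Level 5 MAT-file: the last two bytes of the 128-byte header hold the endian indicator, which appears on disk as "MI" for a big-endian writer and "IM" for a little-endian one. The function name below is made up for illustration and is not part of matio.

```shell
# Sketch: report the byte order recorded in a Level 5 MAT-file header.
# Header bytes 127-128 (offset 126) hold the endian indicator.
mat5_endian() {
    # Skip the first 126 bytes and read the 2-byte indicator.
    ind=$(dd if="$1" bs=1 skip=126 count=2 2>/dev/null)
    case "$ind" in
        MI) echo big-endian ;;
        IM) echo little-endian ;;
        *)  echo unknown; return 1 ;;
    esac
}
```

For instance, `mat5_endian test_write_struct_2d_numeric.mat` would be expected to print `big-endian` for a file created on s390x.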

Would be good to know what the master issue is before releasing 1.5.18. Please let me know.

svillemot commented 5 years ago

First note that I did not test the master branch, but the big-endian one.

Essentially all tests fail with the same message, e.g.:

./mat4_read_le.at:29: exit code was 1, expected 0
1. mat4_read_le.at:27: 1. Read 2d double array (mat4_read_le.at:27): FAILED (mat4_read_le.at:29)

I’m pretty sure I already encountered this in the past, and that I had solved it, but I can't remember how, and I don't really have the time to investigate now.

tbeu commented 5 years ago

Does anybody know about an affordable big-endian single-board controller, something like Arduino or RasPi?

tbeu commented 5 years ago

Essentially all tests fail with the same message,

Hm, have a look in testsuite.dir/1/testsuite.log and it might give a clue what needs to be configured.

svillemot commented 5 years ago

Here is the full content of testsuite.dir/0001/testsuite.log. I see nothing useful:

#                             -*- compilation -*-
1. mat4_read_le.at:27: testing Read 2d double array ...
./mat4_read_le.at:29: cp $srcdir/results/read-var1.out expout
         $builddir/test_mat readvar $srcdir/datasets/matio_test_cases_v4_le.mat var1
--- expout      2019-08-15 09:14:47.031554709 +0000
+++ /home/sebastien/matio/test/testsuite.dir/at-groups/1/stdout 2019-08-15 09:14:47.027554709 +0000
@@ -1,11 +0,0 @@
-      Name: var1
-      Rank: 2
-Dimensions: 4 x 5
-Class Type: Double Precision Array
- Data Type: IEEE 754 double-precision
-{
-1 5 9 13 17 
-2 6 10 14 18 
-3 7 11 15 19 
-4 8 12 16 20 
-}
./mat4_read_le.at:29: exit code was 1, expected 0
1. mat4_read_le.at:27: 1. Read 2d double array (mat4_read_le.at:27): FAILED (mat4_read_le.at:29)

Again, I’m pretty sure that the problem comes from the fact that I’m compiling from git outside of my usual Debian packaging workflow, and so I don't think it's a blocker for 1.5.18.

tbeu commented 5 years ago

It just indicates that nothing was printed to stdout when running test_mat readvar $srcdir/datasets/matio_test_cases_v4_le.mat var1

svillemot commented 5 years ago

I’ve found why the testsuite was failing in my git local copy: it’s because I had not initialized the datasets submodule. Maybe the error message could be made more explicit in that case.

So I confirm that the testsuite passes on a big endian arch (s390x) using the tip of the big-endian branch. The latter can therefore be merged in master and this issue be closed.
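
For anyone else building from a git checkout, a pre-flight check along these lines might catch the missing datasets before the testsuite runs. It is a sketch only: the function name is hypothetical, and the path layout (a datasets directory under the test source directory) is inferred from the log quoted in this thread.

```shell
# Sketch: fail early with a hint when the test datasets are missing.
# $1 is the test source directory (defaults to "test").
check_datasets() {
    srcdir=${1:-test}
    if [ ! -f "$srcdir/datasets/matio_test_cases_v4_le.mat" ]; then
        echo "datasets missing in $srcdir/datasets; try: git submodule update --init" >&2
        return 1
    fi
}
```

Running `check_datasets test` from the top of the checkout returns non-zero and prints the hint when the submodule has not been initialized.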

tbeu commented 5 years ago

I’ve found why the testsuite was failing in my git local copy: it’s because I had not initialized the datasets submodule. Maybe the error message could be made more explicit in that case.

Let me see what I can do.

So I confirm that the testsuite passes on a big endian arch (s390x) using the tip of the big-endian branch. The latter can therefore be merged in master and this issue be closed.

Done. Thanks.

tbeu commented 4 years ago

IBM Z is now available through Travis CI, see b3f656204088fc1018b2a451a536e2aabe1f3e1c