consistently failing test 9 on 32 bit platform

yarikoptic commented 13 years ago

I will send the entire log by email, but here is the relevant excerpt:

RunTests: test 9: stdout OK
12,20c12,20
< -329633465.333333 -329633465.333333        3      3.0    unknown   0.0000       38
< -595057498.000000 -860481530.666667        6      6.0    unknown   0.0000       14
< -613376496.363636 -635359294.400000       11     11.0    unknown   0.0000       32
< -1159652955.090909 -1705929413.818182       22     22.0    unknown   0.0000        2
< -1017516755.340909 -875380555.590909       44     44.0    unknown   0.0000      166
< -1043034866.712644 -1069146422.534884       87     87.0    unknown   0.0000       29
< -879457718.436782 -715880570.160920      174    174.0    unknown   0.0000       17
< -1082324919.413793 -1285192120.390805      348    348.0    unknown   0.0000        2
< -1302478374.419540 -1522631829.425287      696    696.0    unknown   0.0000      143
---
> 10.296875  10.296875         3      3.0    unknown   0.0000       38
> 10.437155  10.577434         6      6.0    unknown   0.0000       14
> 10.347227  10.239314        11     11.0    unknown   0.0000       32
> 10.498632  10.650038        22     22.0    unknown   0.0000        2
> 10.495566  10.492500        44     44.0    unknown   0.0000      166
> 10.469184  10.442189        87     87.0    unknown   0.0000       29
> 10.068007  9.666830        174    174.0    unknown   0.0000       17
> 9.477440   8.886873        348    348.0    unknown   0.0000        2
> 9.020482   8.563524        696    696.0    unknown   0.0000      143
26c26
< average loss = -1.275e+09
---
> average loss = 8.804
RunTests: test 9: FAILED: stderr(stderr.tmp) != ref(train-sets/ref/wiki1K.stderr):

This seems to happen only when building on 32-bit; the 64-bit build looks fine:

RunTests: test 9: stdout OK
18c18
< 10.068007  9.666831        174    174.0    unknown   0.0000       17
---
> 10.068007  9.666830        174    174.0    unknown   0.0000       17
RunTests: test 9: minor (<0.0001) precision differences ignored
RunTests: test 9: stderr OK

JohnLangford commented 13 years ago

Matt, it looks like there is some significant 32bit vs. 64bit weirdness in the LDA code. Do you know what it is?

-John

On 06/03/2011 05:06 PM, yarikoptic wrote:

I will email entire log in the email, but here is the relevant excerpt:

RunTests: test 9: stdout OK
12,20c12,20
<  -329633465.333333 -329633465.333333        3      3.0    unknown   0.0000       38
<  -595057498.000000 -860481530.666667        6      6.0    unknown   0.0000       14
<  -613376496.363636 -635359294.400000       11     11.0    unknown   0.0000       32
<  -1159652955.090909 -1705929413.818182       22     22.0    unknown   0.0000        2
<  -1017516755.340909 -875380555.590909       44     44.0    unknown   0.0000      166
<  -1043034866.712644 -1069146422.534884       87     87.0    unknown   0.0000       29
<  -879457718.436782 -715880570.160920      174    174.0    unknown   0.0000       17
<  -1082324919.413793 -1285192120.390805      348    348.0    unknown   0.0000        2
<  -1302478374.419540 -1522631829.425287      696    696.0    unknown   0.0000      143
---
> 10.296875  10.296875         3      3.0    unknown   0.0000       38
> 10.437155  10.577434         6      6.0    unknown   0.0000       14
> 10.347227  10.239314        11     11.0    unknown   0.0000       32
> 10.498632  10.650038        22     22.0    unknown   0.0000        2
> 10.495566  10.492500        44     44.0    unknown   0.0000      166
> 10.469184  10.442189        87     87.0    unknown   0.0000       29
> 10.068007  9.666830        174    174.0    unknown   0.0000       17
> 9.477440   8.886873        348    348.0    unknown   0.0000        2
> 9.020482   8.563524        696    696.0    unknown   0.0000      143
26c26
<  average loss = -1.275e+09
---
> average loss = 8.804
RunTests: test 9: FAILED: stderr(stderr.tmp) != ref(train-sets/ref/wiki1K.stderr):

seems to happen only when building on 32bit, ok 64bit looks fine:

RunTests: test 9: stdout OK
18c18
<  10.068007  9.666831        174    174.0    unknown   0.0000       17
---
> 10.068007  9.666830        174    174.0    unknown   0.0000       17
RunTests: test 9: minor (<0.0001) precision differences ignored
RunTests: test 9: stderr OK

JohnLangford commented 13 years ago

Updated.

Minor differences in floating point numbers should be expected, because we use -ffast-math when compiling.

-John
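
One reason such small differences can appear: in GCC, -ffast-math allows the compiler to reassociate floating-point arithmetic, and floating-point addition is not associative. The sketch below is illustrative only; the two helper functions are hypothetical and are not part of vowpal_wabbit. They spell out the two groupings by hand so the difference is visible without any special compiler flags.

```cpp
// Illustrative helpers (hypothetical, not from vowpal_wabbit).
// Each one fixes a particular grouping of the same three-term sum.

// Adds the first two terms, then the third.
float sum_left_to_right(float a, float b, float c) {
    return (a + b) + c;
}

// Adds the last two terms, then the first.
float sum_right_to_left(float a, float b, float c) {
    return a + (b + c);
}
```

Under strict IEEE single-precision evaluation, calling these with (1e8f, -1e8f, 1.0f) gives 1 for the first grouping and 0 for the second, because -1e8f + 1.0f rounds back to -1e8f. Differences in the test logs are far smaller than this contrived case, which is why the test script ignores discrepancies below 0.0001.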

On Wed, Jun 29, 2011 at 7:45 PM, Matt Hoffman mdhoffma@cs.princeton.edu wrote:

Figured it out. (Sorry for the delay.)

In line 367 of lda_core.cc, we need to change

    float kl = -global.lda_mylgamma(global.lda_alpha);

to

    float kl = -(global.lda_mylgamma(global.lda_alpha));

The issue seems to be that on 32-bit machines negating an unsigned int causes behavior that resembles underflow. Why this doesn't give us trouble on 64-bit machines is a bit of a puzzle, but one I'm basically happy to leave a mystery.

The test output still isn't identical to wiki1K.stderr, but it looks reasonable. (This happens on 64-bit machines too.) I'll look into what's going on there.

Matt
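
The general effect Matt describes, where negating an unsigned integer produces something that resembles underflow, can be reproduced in isolation. The sketch below is not the code from lda_core.cc: the helper names and the use of std::uint32_t are assumptions made for illustration. It shows that negating an unsigned value wraps modulo 2^32, so converting the result to float yields a very large positive number, whereas converting to float first and negating afterwards yields the expected small negative value. It does not attempt to explain why 64-bit builds were unaffected.

```cpp
#include <cstdint>

// Hypothetical helper: negate while the value is still unsigned,
// then convert to float. Unsigned arithmetic wraps modulo 2^32,
// so the "negated" value is 2^32 - x rather than -x.
float negate_unsigned_then_convert(std::uint32_t x) {
    std::uint32_t wrapped = static_cast<std::uint32_t>(0u - x);
    return static_cast<float>(wrapped);
}

// Hypothetical helper: convert to a signed floating-point type first,
// then negate. This gives the mathematically expected result.
float convert_then_negate(std::uint32_t x) {
    return -static_cast<float>(x);
}
```

For an input of 3, the first helper returns a value near 4.29e9 and the second returns -3, which is the kind of magnitude jump seen in the failing log.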


gparker / vowpal_wabbit

consistently failing test 9 on 32 bit platform #4