pytorch / glow

Compiler for Neural Network hardware accelerators
Apache License 2.0

[Unittest/CI] RecSys seems to be flaky #3026

Closed: rdzhabarov closed this 5 years ago

rdzhabarov commented 5 years ago

40/40 Test #32: RecommendationSystemTest ............***Exception: SegFault 29.76 sec

...

RecSys/RecommendationSystemTest.RecSys_SLS_Only/0
shape: ( 16 64 )
max: 1.958  min: -1.782
[[0.393, 0.298, 0.482, -0.038, -0.735, 0.584, -0.321, 0.485, 0.646, -0.790, -0.335, -0.875, -1.289, -0.501, -0.713, -1.070, 0.107, 0.020, 0.253, 0.270, -0.250, 0.217, -0.058, -0.624, -0.659, 0.081, 0.579, 1.167, -0.983, 0.705, 0.806, -0.297, -1.027, -0.093, -1.055, 0.334, 0.875, 0.253, -0.185, 0.410, 0.132, 0.216, -0.245, -0.396, 0.295, -1.601, -0.231, 1.309, 0.123, 1.169, -1.245, -0.082, 0.169, 0.185, -0.455, 0.751, -0.374, -1.285, -0.340, 0.489, 0.086, 1.373, 0.440, 0.790], 
[-0.395, 0.599, 0.104, -0.848, 0.429, 0.633, 0.076, 0.668, -0.651, 0.487, -0.311, 0.229, 1.047, -0.312, 0.796, 0.484, 0.880, -0.860, -0.561, -0.124, 0.710, -0.331, -0.855, -0.508, 0.989, -0.276, -0.329, -0.632, 0.022, -0.017, 0.235, 0.144, 0.447, 1.029, -0.597, -0.244, ...]
[       OK ] RecSys/RecommendationSystemTest.RecSys_SLS_Only/0 (77 ms)

rdzhabarov commented 5 years ago

@gcatron, was this fixed by your change to the RecSys test?

gcatron commented 5 years ago

I don't think so. My fix corrected an integer overflow that only showed up under ASAN.

bertmaher commented 5 years ago

Yeah, it's still pretty flaky. The issues seem to be in the new-ish tests that I ported from the Habana branch.

rdzhabarov commented 5 years ago

OK, I'll see if I can reproduce it by running it 100 times.
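
For anyone who wants to try the same thing, here is a rough sketch of a repeat-runner in Python. It is not part of the Glow repo; `run_repeatedly` and `count_failures` are hypothetical helper names, and the binary path in the comment is an assumption about where a build might place the test. It runs a command N times and tallies exit codes, so an intermittent crash shows up as a count rather than a one-off.

```python
import subprocess
from collections import Counter


def run_repeatedly(cmd, runs=100, timeout=None):
    """Run `cmd` (a list of arguments) `runs` times and tally exit codes.

    On POSIX, subprocess reports a process killed by a signal as a
    negative return code (for example -11 for SIGSEGV), so crashes can
    be told apart from ordinary non-zero test failures.
    """
    tally = Counter()
    for _ in range(runs):
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,  # discard output; only the exit code matters here
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
        tally[proc.returncode] += 1
    return tally


def count_failures(tally):
    """Number of runs that exited with anything other than 0."""
    return sum(count for code, count in tally.items() if code != 0)


# Example invocation (the path is an assumption, adjust to your build tree):
#   tally = run_repeatedly(["./tests/RecommendationSystemTest"], runs=100)
#   print(tally, count_failures(tally))
```

Launching a fresh process per run is deliberate: a segfault kills the process, so an in-process repeat such as googletest's `--gtest_repeat` would stop at the first crash instead of giving a failure rate over all 100 runs.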