pytorch / glow

Compiler for Neural Network hardware accelerators

RecommendationSystemTest should produce non-zero reference results #3012

Open opti-mix opened 5 years ago

opti-mix commented 5 years ago

RecommendationSystemTest produces zero tensors as reference results in many cases (see below). This is not very useful for comparing the results of a reference run with results produced e.g. by different backends or by partitioned execution. The test should be improved to produce more meaningful, non-zero results.

[==========] Running 39 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 39 tests from RecSys/RecommendationSystemTest
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_FP32/0
Number of embeddings concatenated: 11
Reference results:
shape: ( 16 1 )
[ Zero tensor ]
[       OK ] RecSys/RecommendationSystemTest.RecSys_FP32/0 (2097 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_FP32/1
[       OK ] RecSys/RecommendationSystemTest.RecSys_FP32/1 (117 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_FP32/2
[       OK ] RecSys/RecommendationSystemTest.RecSys_FP32/2 (0 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_RWQuantized_SLWS_FC/0
Number of embeddings concatenated: 11
Reference results:
shape: ( 16 1 )
[ Zero tensor ]
[       OK ] RecSys/RecommendationSystemTest.RecSys_RWQuantized_SLWS_FC/0 (4431 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_RWQuantized_SLWS_FC/1
[       OK ] RecSys/RecommendationSystemTest.RecSys_RWQuantized_SLWS_FC/1 (0 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_RWQuantized_SLWS_FC/2
[       OK ] RecSys/RecommendationSystemTest.RecSys_RWQuantized_SLWS_FC/2 (0 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_RWQuantized_SLWS/0
Number of embeddings concatenated: 11
Reference results:
shape: ( 16 1 )
max: 0.002  min: 0.000
[[0.001],
[0.000],
[0.002],
[0.000],
[0.000],
[0.001],
[0.001],
[0.001],
[0.000],
[0.001],
[0.001],
[0.000],
[0.000],
[0.000],
[0.000],
[0.000],
]
[       OK ] RecSys/RecommendationSystemTest.RecSys_RWQuantized_SLWS/0 (2574 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_RWQuantized_SLWS/1
[       OK ] RecSys/RecommendationSystemTest.RecSys_RWQuantized_SLWS/1 (0 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_RWQuantized_SLWS/2
[       OK ] RecSys/RecommendationSystemTest.RecSys_RWQuantized_SLWS/2 (0 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_RWQuantized_SLWS_FC_FP16/0
[       OK ] RecSys/RecommendationSystemTest.RecSys_RWQuantized_SLWS_FC_FP16/0 (1 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_RWQuantized_SLWS_FC_FP16/1
[       OK ] RecSys/RecommendationSystemTest.RecSys_RWQuantized_SLWS_FC_FP16/1 (0 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_RWQuantized_SLWS_FC_FP16/2
Number of embeddings concatenated: 11
Reference results:
shape: ( 16 1 )
[ Zero tensor ]
[       OK ] RecSys/RecommendationSystemTest.RecSys_RWQuantized_SLWS_FC_FP16/2 (19888 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_RWQuantized_SLWS_FP16/0
[       OK ] RecSys/RecommendationSystemTest.RecSys_RWQuantized_SLWS_FP16/0 (0 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_RWQuantized_SLWS_FP16/1
[       OK ] RecSys/RecommendationSystemTest.RecSys_RWQuantized_SLWS_FP16/1 (0 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_RWQuantized_SLWS_FP16/2
Number of embeddings concatenated: 11
Reference results:
shape: ( 16 1 )
max: 0.002  min: 0.000
[[0.001],
[0.000],
[0.002],
[0.000],
[0.000],
[0.001],
[0.001],
[0.001],
[0.000],
[0.001],
[0.001],
[0.000],
[0.000],
[0.000],
[0.000],
[0.000],
]
[       OK ] RecSys/RecommendationSystemTest.RecSys_RWQuantized_SLWS_FP16/2 (26847 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_FP32_Partitioned/0
Number of embeddings concatenated: 11
Reference results:
shape: ( 16 1 )
[ Zero tensor ]
6 devices of size 12582912
Partitions = 5
[       OK ] RecSys/RecommendationSystemTest.RecSys_FP32_Partitioned/0 (5277 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_FP32_Partitioned/1
[       OK ] RecSys/RecommendationSystemTest.RecSys_FP32_Partitioned/1 (0 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_FP32_Partitioned/2
[       OK ] RecSys/RecommendationSystemTest.RecSys_FP32_Partitioned/2 (0 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_Partitioned_RWQuantized_SLWS/0
Number of embeddings concatenated: 11
Reference results:
shape: ( 16 1 )
max: 0.002  min: 0.000
[[0.001],
[0.000],
[0.002],
[0.000],
[0.000],
[0.001],
[0.001],
[0.001],
[0.000],
[0.001],
[0.001],
[0.000],
[0.000],
[0.000],
[0.000],
[0.000],
]
6 devices of size 12582912
Partitions = 4
[       OK ] RecSys/RecommendationSystemTest.RecSys_Partitioned_RWQuantized_SLWS/0 (4355 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_Partitioned_RWQuantized_SLWS/1
[       OK ] RecSys/RecommendationSystemTest.RecSys_Partitioned_RWQuantized_SLWS/1 (0 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_Partitioned_RWQuantized_SLWS/2
[       OK ] RecSys/RecommendationSystemTest.RecSys_Partitioned_RWQuantized_SLWS/2 (0 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_Partitioned_RWQuantized_SLWS_FC/0
Number of embeddings concatenated: 11
Reference results:
shape: ( 16 1 )
[ Zero tensor ]
6 devices of size 6291456
Partitions = 3
[       OK ] RecSys/RecommendationSystemTest.RecSys_Partitioned_RWQuantized_SLWS_FC/0 (5456 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_Partitioned_RWQuantized_SLWS_FC/1
[       OK ] RecSys/RecommendationSystemTest.RecSys_Partitioned_RWQuantized_SLWS_FC/1 (0 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_Partitioned_RWQuantized_SLWS_FC/2
[       OK ] RecSys/RecommendationSystemTest.RecSys_Partitioned_RWQuantized_SLWS_FC/2 (0 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_Partitioned_RWQuantized_SLWS_FP16/0
[       OK ] RecSys/RecommendationSystemTest.RecSys_Partitioned_RWQuantized_SLWS_FP16/0 (0 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_Partitioned_RWQuantized_SLWS_FP16/1
[       OK ] RecSys/RecommendationSystemTest.RecSys_Partitioned_RWQuantized_SLWS_FP16/1 (0 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_Partitioned_RWQuantized_SLWS_FP16/2
Number of embeddings concatenated: 11
Reference results:
shape: ( 16 1 )
max: 0.002  min: 0.000
[[0.001],
[0.000],
[0.002],
[0.000],
[0.000],
[0.001],
[0.001],
[0.001],
[0.000],
[0.001],
[0.001],
[0.000],
[0.000],
[0.000],
[0.000],
[0.000],
]
6 devices of size 6291456
Partitions = 4
[       OK ] RecSys/RecommendationSystemTest.RecSys_Partitioned_RWQuantized_SLWS_FP16/2 (71435 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_Partitioned_RWQuantized_SLWS_FC_FP16/0
[       OK ] RecSys/RecommendationSystemTest.RecSys_Partitioned_RWQuantized_SLWS_FC_FP16/0 (0 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_Partitioned_RWQuantized_SLWS_FC_FP16/1
[       OK ] RecSys/RecommendationSystemTest.RecSys_Partitioned_RWQuantized_SLWS_FC_FP16/1 (0 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_Partitioned_RWQuantized_SLWS_FC_FP16/2
Number of embeddings concatenated: 11
Reference results:
shape: ( 16 1 )
[ Zero tensor ]
6 devices of size 6291456
Partitions = 3
[       OK ] RecSys/RecommendationSystemTest.RecSys_Partitioned_RWQuantized_SLWS_FC_FP16/2 (40087 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_SLS_Only/0
shape: ( 16 64 )
max: 1.785  min: -1.848
[[0.285, 0.547, 0.705, 0.076, -0.642, 0.573, -0.763, -0.002, 0.749, 0.242, -1.494, -0.282, 0.374, -0.179, 0.302, 0.519, -0.425, 0.348, -0.775, -0.622, -0.101, 0.744, -0.341, -0.408, 0.095, -1.234, 0.631, -0.223, 0.333, 0.347, 0.115, 0.464, 0.420, 0.057, 0.098, 0.218, -0.929, -0.262, -0.380, -0.963, -0.464, -0.045, -1.011, 0.327, -0.376, 0.250, -0.004, 0.249, 0.530, 0.160, -0.155, 0.737, 0.500, 0.959, -1.620, -0.658, -0.696, 0.885, -0.200, 1.104, -0.371, -0.296, 0.103, 0.431],
[-0.301, -0.480, -0.454, -0.523, 0.438, -0.516, -0.105, 0.511, -0.335, 0.563, -0.973, -0.018, 0.429, 0.580, -0.018, -1.848, 0.145, -0.793, -0.553, 0.087, 0.158, 0.775, -0.239, -0.098, -0.178, -0.128, -0.370, 0.571, -1.720, -1.039, -0.571, 0.580, -0.005, 0.386, -0.772, -0.275, ...]
[       OK ] RecSys/RecommendationSystemTest.RecSys_SLS_Only/0 (157 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_SLS_Only/1
[       OK ] RecSys/RecommendationSystemTest.RecSys_SLS_Only/1 (0 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_SLS_Only/2
[       OK ] RecSys/RecommendationSystemTest.RecSys_SLS_Only/2 (0 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_FP32_Gather_Weights/0
Number of embeddings concatenated: 11
Reference results:
shape: ( 16 1 )
[ Zero tensor ]
[       OK ] RecSys/RecommendationSystemTest.RecSys_FP32_Gather_Weights/0 (2159 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_FP32_Gather_Weights/1
[       OK ] RecSys/RecommendationSystemTest.RecSys_FP32_Gather_Weights/1 (0 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_FP32_Gather_Weights/2
[       OK ] RecSys/RecommendationSystemTest.RecSys_FP32_Gather_Weights/2 (0 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_FP32_Medium_Gather_Weights/0
Number of embeddings concatenated: 16
Reference results:
shape: ( 16 1 )
max: 107.499  min: 68.324
[[97.578],
[78.288],
[91.872],
[78.743],
[94.955],
[75.585],
[99.873],
[77.716],
[92.638],
[74.758],
[85.918],
[68.324],
[84.593],
[107.499],
[85.433],
[104.986],
]
[       OK ] RecSys/RecommendationSystemTest.RecSys_FP32_Medium_Gather_Weights/0 (62651 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_FP32_Medium_Gather_Weights/1
[       OK ] RecSys/RecommendationSystemTest.RecSys_FP32_Medium_Gather_Weights/1 (0 ms)
[ RUN      ] RecSys/RecommendationSystemTest.RecSys_FP32_Medium_Gather_Weights/2
[       OK ] RecSys/RecommendationSystemTest.RecSys_FP32_Medium_Gather_Weights/2 (0 ms)
[----------] 39 tests from RecSys/RecommendationSystemTest (247532 ms total)

[----------] Global test environment tear-down
[==========] 39 tests from 1 test case ran. (247532 ms total)
[  PASSED  ] 39 tests.
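
The concern above is that comparing a backend's output against an all-zero reference proves very little. A minimal sketch of a guard for that, assuming tensors are plain `std::vector<float>` buffers; the names `isEffectivelyZero`, `compareToReference` and `CompareStatus` are hypothetical and are not part of Glow or of the existing test:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a Glow API): true if every element is within
// eps of zero, i.e. the tensor carries no usable signal.
bool isEffectivelyZero(const std::vector<float> &values, float eps = 1e-6f) {
  for (float v : values) {
    if (!(std::fabs(v) <= eps)) {
      return false; // also taken for NaN, which is not "zero"
    }
  }
  return true;
}

// Outcome of a comparison that distinguishes "matched" from
// "matched only because the reference was all zeros".
enum class CompareStatus { Match, Mismatch, ReferenceIsZero };

// Compares a backend result against the reference run, but refuses to
// report a match when the reference itself is effectively zero.
CompareStatus compareToReference(const std::vector<float> &reference,
                                 const std::vector<float> &result,
                                 float tolerance) {
  if (reference.size() != result.size()) {
    return CompareStatus::Mismatch;
  }
  if (isEffectivelyZero(reference)) {
    return CompareStatus::ReferenceIsZero;
  }
  for (std::size_t i = 0; i < reference.size(); ++i) {
    if (!(std::fabs(reference[i] - result[i]) <= tolerance)) {
      return CompareStatus::Mismatch;
    }
  }
  return CompareStatus::Match;
}
```

With a check like this, the runs that print `[ Zero tensor ]` would be reported as `ReferenceIsZero` rather than silently passing, while a run such as `RecSys_FP32_Medium_Gather_Weights/0` would go through the normal element-wise comparison. The `eps` and `tolerance` values are placeholders to be chosen per precision.
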
opti-mix commented 5 years ago

cc @nickgg

opti-mix commented 5 years ago

@artemrakhov is currently looking into this and will report his findings here.

artemrakhov-glow commented 5 years ago

I debugged this and found that the top-level MLP is at fault: the results after the concat are non-zero. I created PR #3020 to fix the float version of the MLP, but I haven't figured out why we are getting zeros from the rowwise-quantized MLP. A similar fix doesn't help there, so I need to debug more.
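
The kind of stage-by-stage check described here (non-zero after the concat, zero after the top-level MLP) can be sketched generically. This is only an illustration under stated assumptions: the `Stage` struct, the `Tensor` alias and `firstZeroStage` are hypothetical, and the model is treated as a simple chain of functions rather than a Glow graph.

```cpp
#include <functional>
#include <string>
#include <vector>

// Assumption: a tensor is a flat float buffer.
using Tensor = std::vector<float>;

// Hypothetical pipeline stage: a name plus a function from input to output.
struct Stage {
  std::string name;
  std::function<Tensor(const Tensor &)> run;
};

// Runs the stages in order and returns the name of the first stage whose
// output is entirely zero, or an empty string if no stage zeroes the data.
// An empty output is treated as all-zero.
std::string firstZeroStage(const std::vector<Stage> &stages, Tensor input) {
  for (const Stage &stage : stages) {
    input = stage.run(input);
    bool allZero = true;
    for (float v : input) {
      if (v != 0.0f) {
        allZero = false;
        break;
      }
    }
    if (allZero) {
      return stage.name;
    }
  }
  return "";
}
```

Feeding it a chain such as a pass-through "concat" stage followed by a stage that returns zeros would report the second stage's name, mirroring how the fault was narrowed down to the MLP here.
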

nickgg commented 5 years ago

@artemrakhov can we close this now?

artemrakhov-glow commented 5 years ago

I only fixed the FP version of the test. If the MLP is quantized, I still see 0s in the output. Need to debug more.