Closed · loomlike closed this issue 5 years ago
We are using a seed in the algos: https://github.com/Microsoft/Recommenders/blob/master/reco_utils/recommender/rbm/rbm.py#L74. I'm not sure if any are missing.
When trying to address reproducibility, @anargyri @msalvaris and I had some fun trying to achieve it. In DL it is very difficult, since the optimization is stochastic and there are asynchronous processes on the GPU that make complete reproducibility hard to achieve.
Hmmm, it seems like there are many threads out there discussing non-deterministic results from TF. In fastai, you can set several seeds to make the results reproducible.
Have you tried disabling multi-threading with all the seeds set, including tf.random.set_random_seed? Still no luck?
Yeah, we have that function in all the DL algos: https://github.com/Microsoft/Recommenders/search?q=tf.random.set_random_seed&unscoped_q=tf.random.set_random_seed
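For reference, a minimal sketch of a seed-setting helper along the lines discussed above. The helper name `set_seeds` and the import guard are assumptions for illustration, not the repo's actual code:

```python
import os
import random

import numpy as np


def set_seeds(seed=42):
    """Seed every RNG source the training code can touch (sketch)."""
    os.environ["PYTHONHASHSEED"] = str(seed)  # hash randomization
    random.seed(seed)                         # Python stdlib RNG
    np.random.seed(seed)                      # NumPy RNG
    try:
        import tensorflow as tf
        # TF 1.x graph-level seed; in TF 2.x the equivalent is tf.random.set_seed
        tf.compat.v1.set_random_seed(seed)
    except (ImportError, AttributeError):
        pass  # TF not installed, or an API variant without compat.v1


# Disabling multi-threading (TF 1.x session config) removes one source of
# nondeterminism, at a cost in speed; some GPU kernels remain nondeterministic:
# config = tf.ConfigProto(intra_op_parallelism_threads=1,
#                         inter_op_parallelism_threads=1)
# sess = tf.Session(config=config)
```

Even with all seeds set, GPU ops such as parallel reductions can still produce run-to-run differences, which is why the thread above reports only partial success.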
See this: just yesterday we got an error in the nightly builds for xdeepfm: https://msdata.visualstudio.com/DefaultCollection/AlgorithmsAndDataScience/_build/results?buildId=2873512
Yeah... we should handle this randomness, but I couldn't find time to work on this issue. If it's urgent, can anybody take this item and fix it? If not, we can temporarily loosen the tolerance values for the NN algos and I will take care of it in the next few weeks.
Adding more folks to discuss this: @anargyri @yueguoguo @gramhagen. I agree with Jun Ki, I think it is a very annoying issue. Many times our GPU nightly builds fail because of this.
One example of a nightly build that failed:
```
tests/smoke/test_deeprec_model.py ..                                     [ 25%]
tests/smoke/test_notebooks_gpu.py ...F..                                 [100%]

=================================== FAILURES ===================================
____________________________ test_notebook_xdeepfm _____________________________

notebooks = {'als_deep_dive': '/data/home/recocat/cicd/17/s/notebooks/02_model/als_deep_dive.ipynb', 'als_pyspark': '/data/home/re...aseline_deep_dive.ipynb', 'data_split': '/data/home/recocat/cicd/17/s/notebooks/01_prepare_data/data_split.ipynb', ...}

    @pytest.mark.smoke
    @pytest.mark.gpu
    @pytest.mark.deeprec
    def test_notebook_xdeepfm(notebooks):
        notebook_path = notebooks["xdeepfm_quickstart"]
        pm.execute_notebook(
            notebook_path,
            OUTPUT_NOTEBOOK,
            kernel_name=KERNEL_NAME,
            parameters=dict(
                EPOCHS_FOR_SYNTHETIC_RUN=20,
                EPOCHS_FOR_CRITEO_RUN=1,
                BATCH_SIZE_SYNTHETIC=128,
                BATCH_SIZE_CRITEO=2048,
            ),
        )
        results = pm.read_notebook(OUTPUT_NOTEBOOK).dataframe.set_index("name")["value"]
        assert results["res_syn"]["auc"] == pytest.approx(0.982, rel=TOL, abs=ABS_TOL)
>       assert results["res_syn"]["logloss"] == pytest.approx(0.2306, rel=TOL, abs=ABS_TOL)
E       assert 0.103 == 0.2306 ± 1.2e-01
E        +  where 0.2306 ± 1.2e-01 = <function approx at 0x7f8fa788b840>(0.2306, rel=0.5, abs=0.05)
E        +    where <function approx at 0x7f8fa788b840> = pytest.approx
```
I think in this case it is safe to widen the tolerances: in the smoke tests we run only a small number of iterations, so it's normal for the metrics to vary a lot. In the integration tests we can probably be stricter, since there we run more iterations and the model should converge to certain metrics.
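To make the tolerance discussion concrete, here is how pytest.approx combines the relative and absolute tolerances on the failing assertion above. The widened rel=0.6 value is purely illustrative, not a recommended setting:

```python
import pytest

# pytest.approx(expected, rel, abs) passes when the actual value is within
# max(rel * |expected|, abs) of expected.
# With the values from the failing build (rel=0.5, abs=0.05):
#   tolerance = max(0.5 * 0.2306, 0.05) ≈ 0.1153, but |0.103 - 0.2306| = 0.1276
assert 0.103 != pytest.approx(0.2306, rel=0.5, abs=0.05)  # outside tolerance

# Widening the relative tolerance (illustrative value) would let the smoke run pass:
assert 0.103 == pytest.approx(0.2306, rel=0.6, abs=0.05)
```

Once seeds make the runs deterministic, the tests could instead assert exact values and drop the tolerances entirely.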
Description
Use seeds for the NN-based models, both in the notebooks and in the tests.
Expected behavior with the suggested feature
- Produce the same results from the notebooks
- Assert exact values in the tests