microsoft / tf-gnn-samples

TensorFlow implementations of Graph Neural Networks
MIT License

Results not reproducible with fixed random seed. #12

Open · amordahl opened this issue 4 years ago

amordahl commented 4 years ago

While training the VarMisuse task with the GGNN model, I have noticed that results are not reproducible even with the random seed fixed.

What I have done:

1. Fixed the random seed in `VarMisuse_GGNN.json` to a positive value (e.g., 1252).
2. Trained two models using `python train.py GGNN varmisuse --data-path <my_data_path> --result-dir <my_result_dir>`.

What I expect: with the same random seed, two training runs on the same dataset should produce identical results, as reflected in the log files generated in the result dir.

What I get instead: Different results, reflected in different training and validation accuracies and different numbers of epochs trained.

I was wondering whether this is an implementation error, or whether there is a reason I am overlooking why fixing the random seed does not make the results reproducible.
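For reference, my mental model is that a fixed seed should pin down all three RNG sources in a TF 1.x setup; here is a minimal sketch of what I assume the config seed controls (the `set_all_seeds` helper is hypothetical, not the repo's actual wiring):

```python
import random

import numpy as np
import tensorflow as tf

def set_all_seeds(seed: int) -> None:
    # Hypothetical helper: seed every RNG source that can affect training.
    random.seed(seed)         # Python-level randomness (e.g., shuffling)
    np.random.seed(seed)      # NumPy-based sampling
    tf.set_random_seed(seed)  # TF 1.x graph-level op seeds

set_all_seeds(1252)  # the value I set in VarMisuse_GGNN.json
```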

Thank you!

mmjb commented 4 years ago

Thanks for the report. I never realised this, but the VarMisuse task is special because it has an additional source of randomness, namely the parallel loading of data in https://github.com/microsoft/tf-gnn-samples/blob/master/tasks/varmisuse_task.py#L190. I never checked this explicitly, but this would explain a difference in results, as the training samples will be in a different order after data loading.
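To make that concrete, here is a generic illustration of the effect (an illustrative stand-in, not our actual loader): when parallel results are consumed in completion order, the final sample order depends on OS scheduling and can differ between two runs even with every seed fixed.

```python
import multiprocessing

def parse_shard(path):
    # Stand-in for per-file sample parsing; the real loader does much more.
    return path

if __name__ == "__main__":
    shards = ["shard_%02d.jsonl.gz" % i for i in range(8)]
    with multiprocessing.Pool(4) as pool:
        # imap_unordered yields results as workers finish, so the order
        # of the collected samples is scheduling-dependent and can vary
        # between two otherwise identical runs.
        samples = list(pool.imap_unordered(parse_shard, shards))
    print(samples)  # order may change from run to run
```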

I'd suggest checking whether this is indeed the source of the problem by flipping `no_parallel` in https://github.com/microsoft/tf-gnn-samples/blob/master/tasks/varmisuse_task.py#L168 to True.
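Concretely, the diagnostic change is a one-line flip (shown schematically; the surrounding code at that line may differ):

```python
# tasks/varmisuse_task.py, around L168 (diagnostic only, not a fix):
no_parallel = True  # force sequential loading so sample order is fixed
```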

If that resolves the issue, a proper solution would require changing `_load_data` to keep the order stable. The code supports the streaming case (because I stole it from another of my projects), but here the results just get collected into a list on the consumer side, so it could be simplified to use `multiprocessing.Pool.map`, which keeps the order stable, as sketched below.
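A minimal sketch of that simplification, with `load_file` and the flattening step as hypothetical stand-ins for what `_load_data` actually does:

```python
import multiprocessing

def load_file(path):
    # Hypothetical per-file loader; in the real code this parses one
    # data shard into a list of graph samples.
    return []

def _load_data(file_paths):
    with multiprocessing.Pool() as pool:
        # Pool.map returns results in the same order as file_paths,
        # regardless of which worker finishes first, so the sample
        # order is deterministic for a fixed input order.
        per_file = pool.map(load_file, file_paths)
    # Flatten into the single list that the consumer side builds anyway.
    return [sample for samples in per_file for sample in samples]
```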

amordahl commented 4 years ago

Hi Marc! Thanks so much for the quick response. Unfortunately, even after flipping `no_parallel` to True, I still get different results. I've attached the run logs of two models I trained earlier today after making the change you indicated.

VarMisuse_GGNN_2020-08-13-10-17-26_32199.log
VarMisuse_GGNN_2020-08-13-10-52-02_32452.log