keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

Model training entirely different results on AWS instance vs local instance #12014

Closed daroodar closed 3 years ago

daroodar commented 5 years ago

I am encountering an issue with the training of a Keras neural network (TensorFlow backend).

Training the same Keras model with the same input data on two different machines (one AWS instance and one local laptop) results in two entirely different models. The models are used for regression, and after training on the failing machine (the AWS instance), the model predicted completely flat outputs, even though the training data contains no such flat targets.
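To make the "flat outputs" symptom checkable on each machine, here is a minimal sketch of a flatness test. The helper name `is_flat` and the tolerance `rel_tol` are hypothetical and not part of the linked reproduction code; the function only assumes the predictions can be iterated as plain numbers.

```python
def is_flat(predictions, rel_tol=1e-6):
    """Return True if all predictions are (nearly) the same value.

    `predictions` is any iterable of numbers, e.g. a flattened array of
    model outputs. `rel_tol` is an illustrative threshold, not a tuned one.
    """
    values = [float(p) for p in predictions]
    if not values:
        raise ValueError("no predictions given")
    # Spread between the largest and smallest prediction.
    spread = max(values) - min(values)
    # Scale the tolerance by the largest magnitude; fall back to 1.0
    # when every prediction is exactly zero.
    scale = max(abs(v) for v in values) or 1.0
    return spread <= rel_tol * scale
```

With a trained Keras model this could be called as `is_flat(model.predict(x).ravel())`, assuming `x` is the regression input array. Running it on both machines would give a simple yes/no signal to include alongside the logs.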

I was able to reproduce the issue on AWS t2.medium and c5.large. It ran correctly on c5.xlarge.

Python version: Python 2.7.15rc1

The specs of two machines are:

Machine A specs (model works correctly):
Model: Lenovo ThinkPad E570
Product: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz
OS: Ubuntu 18.04.1 LTS
RAM: 16 GB

Machine B specs (model works incorrectly): AWS t2.medium and c5.large (using the Amazon Linux 2 AMI)

Please note that the same code was also tried on an AWS c5.xlarge instance and ran correctly there.

Code that reproduces the problem: https://github.com/daroodar/minimum_reproducible_example_keras_issue. This code works correctly on the local machine and produces the problem on the AWS instances.

Library versions are the same on both machines and are attached as libraries_versions.txt.
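For comparing the two environments beyond a package list, a small report script can be run on each machine and the outputs diffed. This is a sketch under assumptions: the function name `environment_report` and the default package list are illustrative and not taken from the attached files. CPU count is included because the instance types in this report differ in size.

```python
import importlib
import multiprocessing
import platform
import sys


def environment_report(packages=("keras", "tensorflow", "numpy")):
    """Collect interpreter, OS, CPU count and package versions in a dict."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
        "cpu_count": multiprocessing.cpu_count(),
    }
    for name in packages:
        try:
            module = importlib.import_module(name)
            # Most packages expose __version__; fall back if they do not.
            report[name] = getattr(module, "__version__", "unknown")
        except Exception:
            # Missing or broken installs are recorded rather than raised.
            report[name] = "not installed"
    return report
```

Printing the sorted items of `environment_report()` on Machine A and Machine B and diffing the two outputs would show at a glance whether anything other than the hardware differs.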

The log from Machine A (where the model works correctly) is attached as correct_model.log.
The log from Machine B (where the model works incorrectly) is attached as incorrect_model.log.

The input data is the same on both machines and is included in the GitHub repository.

correct_model.log incorrect_model.log libraries_versions.txt

msymp commented 5 years ago

Hello @pavithrasv, has the Keras implementation of a TensorFlow backend on AWS been tested? This user finds it doesn't work as expected. Thanks.