Closed rushnaulaziz closed 2 years ago
This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.
Closing as stale. Please reopen if you'd like to work on this further.
System information
Describe the problem
I have created a classification model using Keras 2.4.3, tensorflow-cpu 2.5.0 and Python 3.9.5. The model works fine in my Windows 10 development environment.
However, when I deploy my code in a Docker container, the code gets stuck. Specifically, it gets stuck when I add an LSTM (Long Short-Term Memory) layer to the Sequential model.
The LSTM is the first layer I add, so the code gets stuck right at the start. To be clear, this works fine when I do not use a container and deploy directly on my Windows 10 laptop.
This behaviour is random: it has happened on the 4th run, on the 20th run, and at various points in between. Training runs in a separate process created with the standard Python multiprocessing module. Even when the model is stuck, I do not see anything out of the ordinary when I run docker ps.
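One difference between the two environments that may be relevant: on Python 3.9 the multiprocessing module defaults to the spawn start method on Windows and to fork on Linux, which is what a typical Docker image runs. The sketch below is not the code from this issue. It only illustrates, under that assumption, how a training function could be launched with an explicit start method and a timeout, so that a hang is detected instead of blocking forever. `train_model` and `run_training` are hypothetical names.

```python
import multiprocessing as mp
import queue


def train_model(result_queue):
    # Hypothetical stand-in for the real training function. In a real
    # application, TensorFlow/Keras would be imported and the model built
    # inside this function so that all of that state lives in the child.
    result_queue.put("done")


def run_training(target, timeout_s, start_method="spawn"):
    """Run `target` in a child process; return its result, or None on timeout."""
    # An explicit context avoids depending on the platform default
    # (spawn on Windows, fork on Linux for Python 3.9).
    ctx = mp.get_context(start_method)
    result_queue = ctx.Queue()
    proc = ctx.Process(target=target, args=(result_queue,))
    proc.start()
    try:
        result = result_queue.get(timeout=timeout_s)
    except queue.Empty:
        # Nothing arrived within the timeout: treat the child as stuck.
        proc.terminate()
        proc.join()
        return None
    proc.join()
    return result
```

With the spawn start method, the call to `run_training(train_model, timeout_s=600)` needs to sit under an `if __name__ == "__main__":` guard in an importable module, because the child process re-imports the main module. Whether the start method is actually the cause of the hang described here is not established; this is only one thing to rule out.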
Source code / logs
Model Structure
Model fit:
FastAPI WebSocket code snippet for training the model:
Dockerfile
Docker Container logs:
Result of docker stats [container_name]:
Results of docker top [container_name]:
Logs on development environment
Steps to reproduce:
Train the model 20-40 times to reproduce the error. To save time, use a small dataset.
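Repeating the run by hand is tedious, so the reproduction step can be scripted. The following is a minimal sketch, assuming the training can be started as a standalone command. The helper name `find_first_hang` and the script name `train.py` in the usage note are hypothetical and are not part of the original code.

```python
import subprocess
import sys


def find_first_hang(command, max_runs=40, timeout_s=600):
    """Run `command` up to `max_runs` times.

    Returns the 1-based index of the first run that does not finish within
    `timeout_s` seconds, or None if every run finishes in time. A run that
    exits with a non-zero status raises subprocess.CalledProcessError.
    """
    for run in range(1, max_runs + 1):
        try:
            # subprocess.run kills the child and raises TimeoutExpired
            # when the timeout elapses.
            subprocess.run(command, check=True, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            return run
    return None
```

For example, `find_first_hang([sys.executable, "train.py"], max_runs=40, timeout_s=600)` run inside the container would report which run got stuck, if any. The timeout should be set comfortably above the normal training time on the small dataset.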
Environment information