Open AhdHazim opened 6 days ago
Hi, @AhdHazim
I apologize for the delayed response. If I'm not mistaken, it looks like you're not using the TensorFlow.js library but core TensorFlow, so I would suggest you follow this official example, which shows how to train a DQN (Deep Q-Network) agent on the CartPole environment using the TF-Agents library.
You can consider the following points for DQN learning:
Loss Function: The code uses the mean squared error (MSE) loss. Huber loss is often preferred for DQN because it is less sensitive to outliers such as occasional large TD errors.
Training Batch Size: The batch size might be too small or too large for your specific scenario. Experiment with different values.
Hyperparameter Tuning: The chosen hyperparameters, such as the learning rate, epsilon decay, and memory size, might not be suitable for your environment. Try experimenting with different values to see if learning improves.
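To make the Huber loss point above concrete: in TensorFlow it is available as `tf.keras.losses.Huber()`, and numerically it penalizes small errors quadratically (like MSE) but large errors only linearly, which is why it is less sensitive to outlier TD errors. A minimal NumPy sketch of that behavior (the function name `huber` and the sample errors are illustrative, not from the original code):

```python
import numpy as np

def huber(error, delta=1.0):
    # Quadratic for |error| <= delta, linear beyond it.
    abs_e = np.abs(error)
    quadratic = 0.5 * error ** 2
    linear = delta * (abs_e - 0.5 * delta)
    return np.where(abs_e <= delta, quadratic, linear)

errors = np.array([0.5, 1.0, 10.0])
mse_style = 0.5 * errors ** 2   # [0.125, 0.5, 50.0] -- outlier dominates
huber_style = huber(errors)     # [0.125, 0.5, 9.5]  -- outlier capped to linear growth
```

The two losses agree for small errors, but the outlier error of 10.0 contributes 50.0 under the squared penalty versus only 9.5 under Huber, so a single bad transition perturbs the gradient far less.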
By checking these points and potentially modifying the code, you can improve the learning behavior of your DQN agent. Remember that training DQN agents is an iterative process and may require adjustments based on your specific environment and task.
If this issue is not specific to TensorFlow.js and the points above do not resolve it, then I would request you to please post this issue in the core TensorFlow repo here.
Thank you for your cooperation and patience.
Hi everyone,
I am using the following DQN agent and it is not learning. Could you please let me know if I missed something?
Here is the DQN code:
```python
import numpy as np
import tensorflow as tf
import json
import os
import math

class DQN:
    def __init__(self, n_actions, n_features, lr=0.001, reward_decay=0.9,
                 e_greedy=0.9, epsilon_min=0.01, replace_target_iter=300,
                 memory_size=10000, batch_size=32, e_greedy_decay=1e-5):
        self.n_actions = n_actions
        self.n_features = n_features
        self.lr = lr
        # Fixed: the original assigned a local `epsilon_max=0.9` instead of an
        # attribute, so the `self.epsilon_max` reference below raised AttributeError.
        self.epsilon_max = e_greedy
        self.gamma = reward_decay
        self.epsilon_decay = e_greedy_decay
        self.epsilon_min = epsilon_min
        self.replace_target_iter = replace_target_iter
        self.memory_size = memory_size
        self.batch_size = batch_size
        self.learn_step_counter = 0
        # Each row stores one transition: state, action, reward, next state.
        self.memory = np.zeros((self.memory_size, n_features * 2 + 2))
        self.loss_history = []
        self.reward_history = []
        # Epsilon starts at 0 and is annealed toward epsilon_max when a decay
        # schedule is given; otherwise it is fixed at epsilon_max.
        self.epsilon = 0 if self.epsilon_decay is not None else self.epsilon_max
```
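Note that with this convention `self.epsilon` is the probability of acting greedily and grows from 0 toward `epsilon_max`, so the agent is fully random at the start. A minimal sketch of an action-selection helper consistent with that convention (the `choose_action` function is hypothetical, not part of the posted class):

```python
import numpy as np

rng = np.random.default_rng(0)

def choose_action(q_values, epsilon):
    # With probability `epsilon`, exploit the greedy action;
    # otherwise explore uniformly at random.
    if rng.random() < epsilon:
        return int(np.argmax(q_values))
    return int(rng.integers(len(q_values)))

q = np.array([0.1, 0.9, 0.2])
greedy = choose_action(q, 1.0)   # always the argmax when epsilon = 1.0
```

If epsilon instead meant the probability of *exploring* (the other common convention), the branch would be inverted; mixing up the two conventions is a frequent reason a DQN appears not to learn.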
Here is the average reward result: ![final DQN results](https://github.com/tensorflow/tfjs/assets/149252191/d135e1bb-07c1-4920-9ec7-cbb38a3b0234)