DQN report. [QUESTION]

Introduction

My model acts like a compulsive masochist. Great beginning, innit? I'll include my parameters further down, but don't rely on them too strictly, because I keep changing them for the following reason:

Describe the bug

I have a very simple ping-pong env (a custom one, not Gym) and I set up an agent without any issues except, possibly, one. The potential problem is in the reward system, but even so the agent shouldn't behave the way it does. My reward system is based on a simple check:

```python
# +1 for every step the ball stays in play, 0 on the terminal step
if not done:
    reward = 1
else:
    reward = 0
```

So the agent should try to collect as many reward points as possible, and it does, but only during the first 10k steps. None of the parameters affect this: changing the hyperparameters changes its performance, but nothing more. After 10k steps it starts dodging the ball; sometimes it scores about 5-10 points, but then it dodges the ball for the next 100 episodes.

Code example

I'll put everything important (IMO) in a single logical sequence, but I can invite you to the repo if needed. rew_mean looks like this. As you can see, it collapses after 10k steps. By the way, once the learning_starts threshold is reached it drops even lower, and I don't know how that's even possible. Here's one more graph.

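```python
from stable_baselines3 import DQN
from stable_baselines3.common.vec_env import DummyVecEnv, VecFrameStack, VecTransposeImage

from ping_pong_env import PingPongEnv  # hypothetical module name for the custom env

LOG_DIR = "./logs/"  # assumed TensorBoard log directory
framebuffer = 5
learning_rate = 0.0001
total_timesteps = 10000000  # effectively infinite; a callback runs every 5k steps

env = PingPongEnv()
env = DummyVecEnv([lambda: env])
env = VecTransposeImage(env)
env = VecFrameStack(env, n_stack=framebuffer)

model = DQN('CnnPolicy', env, verbose=1, tau=0.001, tensorboard_log=LOG_DIR,
            learning_rate=learning_rate, buffer_size=10000, learning_starts=100000,
            train_freq=1000, target_update_interval=20000, exploration_initial_eps=1,
            exploration_final_eps=0.00001, exploration_fraction=0.001)

model.learn(total_timesteps=total_timesteps)  # the 5k-step callback is omitted here
```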
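In SB3's DQN, epsilon decays linearly from `exploration_initial_eps` to `exploration_final_eps` over the first `exploration_fraction * total_timesteps` steps, and gradient updates only begin after `learning_starts` steps. The sketch below re-implements that linear schedule in plain Python (it follows the logic of `get_linear_fn` in `stable_baselines3.common.utils`, but it is not the library code) and plugs in the values from the configuration above. If I read the schedule correctly, with `exploration_fraction=0.001` and `total_timesteps=10000000` epsilon reaches its final value of 0.00001 at step 10,000, while the first gradient update only happens after `learning_starts=100000` steps.

```python
def linear_epsilon(step: int, total_timesteps: int, initial_eps: float,
                   final_eps: float, exploration_fraction: float) -> float:
    """Linear epsilon schedule modeled on SB3's DQN exploration schedule."""
    # Fraction of training completed so far (0.0 at the start, 1.0 at the end).
    progress = step / total_timesteps
    if progress > exploration_fraction:
        return final_eps
    return initial_eps + progress * (final_eps - initial_eps) / exploration_fraction


# Values taken from the DQN configuration in this issue.
TOTAL_TIMESTEPS = 10_000_000
LEARNING_STARTS = 100_000
cfg = dict(total_timesteps=TOTAL_TIMESTEPS, initial_eps=1.0,
           final_eps=0.00001, exploration_fraction=0.001)

if __name__ == "__main__":
    decay_end = round(cfg["exploration_fraction"] * TOTAL_TIMESTEPS)
    print("epsilon reaches its final value at step", decay_end)
    print("first gradient update after step", LEARNING_STARTS)
    for step in (0, 5_000, 10_000, 100_000):
        print(step, linear_epsilon(step, **cfg))
```

Note that `model.learn` is assumed to be called once with the full `total_timesteps`; if training is split across several `learn` calls with different totals, the schedule is computed against each call's own total.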

System Info

Describe the characteristics of your environment:

Since I'm using a whole lot of libraries for a single purpose, I can't describe each and every one; overall, I use conda when available and pip when conda can't find a required package.
I have a single GPU, a GTX 1060 6 GB, but its utilization is only about 10-15% and memory usage is around 3-4 GB. RAM, CPU, and disks are not saturated either (just in case).
Python 3.10.10, conda 23.1.0 (latest at the time of writing).
I'm not using TensorFlow, so it isn't even installed. PyTorch is 2.0.0 (latest stable at the time of writing).
Stable_baselines3 v1.7.0, pytorch-cuda v11.8, gym v0.21.0.

Additional context

The ping-pong game was written with Arcade by my brother, but I'm not sure that's useful info, since I don't dig into his code; I send input directly instead. I use win32gui to grab frames, and it delivers about 150-200 images per second, so capture speed is definitely not the problem.
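A capture rate like the 150-200 images per second above can be checked independently of the training loop by timing the grab call directly. A minimal sketch; `grab_frame` is a hypothetical placeholder for whatever function wraps the win32gui screenshot, and the result measures raw capture throughput only, not the full environment step rate.

```python
import time
from typing import Callable


def measure_fps(grab_frame: Callable[[], object], n_frames: int = 200) -> float:
    """Call grab_frame() n_frames times and return the average frames per second."""
    start = time.perf_counter()
    for _ in range(n_frames):
        grab_frame()  # hypothetical capture function, e.g. a win32gui-based grabber
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on extremely fast dummy callables.
    return n_frames / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Stand-in grabber that sleeps at least 5 ms per call, so at most ~200 fps.
    fps = measure_fps(lambda: time.sleep(0.005), n_frames=50)
    print(f"{fps:.1f} frames per second")
```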

hill-a / stable-baselines, issue #1180