Closed · RaviKumarAndroid closed this issue 4 years ago
To reproduce this issue, just run FQF with torch 1.5.0
Hi, @RaviKumarAndroid
Thank you for sharing!! I quickly fixed the issue, so could you try it??
The cause of the problem is the entropy loss: it depends on the fraction network, and networks other than the fraction network should never be trained on the entropy loss. (The IQN part treats the fractions as outside its control.)
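For reference, this is roughly what the fix amounts to. It's a minimal, self-contained sketch (the layer sizes, `base_net`, `fraction_net`, and the simplified loss are mine, not the repo's actual code): the embedding is detached before the fraction/entropy term, so only the fraction network's parameters receive gradients from it.

```python
import torch
import torch.nn as nn

# Sketch: the entropy bonus comes from the fraction proposal network, so only
# that network's optimizer should see it. Detaching the embedding keeps the
# base/quantile networks out of this term.
base_net = nn.Linear(4, 8)        # stand-in for the DQNBase feature extractor
fraction_net = nn.Linear(8, 3)    # stand-in for the fraction proposal network

frac_opt = torch.optim.Adam(fraction_net.parameters(), lr=1e-3)

state = torch.randn(2, 4)
embedding = base_net(state)

# Fraction/entropy loss: use a detached embedding so no gradient reaches base_net.
logits = fraction_net(embedding.detach())
probs = torch.softmax(logits, dim=-1)
entropy = -(probs * probs.log()).sum(dim=-1).mean()

entropy_coef = 0.001
fraction_loss = -entropy_coef * entropy   # placeholder for the full fraction loss

frac_opt.zero_grad()
fraction_loss.backward()
frac_opt.step()                           # only fraction_net is updated here
```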
Actually, because I set entropy_coef=0.0, this bug wouldn't have affected the performance of my experiments.
Anyway, thank you so much for your advice :)
It works now. What do you think the results would be if we added sequential memory and LSTM cells to the DQNBase network? Similar to DRQN (Deep Recurrent Q-Network), but with the algorithm changed to FQF.
I believe it has the potential to further improve FQF's results on Atari for partially observable MDPs. You can think of it as a combination of FQF, R2D2, and Rainbow. It's already been tried with IQN (https://opherlieber.github.io/rl/2019/09/22/recurrent_iqn), and I am working on trying it with FQF (Recurrent FQF).
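Something like this is what I have in mind for the recurrent part. Just a sketch (the class name and layer sizes are mine, not the repo's DQNBase): conv features over a sequence of frames, fed through an LSTM.

```python
import torch
import torch.nn as nn

class RecurrentDQNBase(nn.Module):
    """Sketch of a DRQN-style feature extractor: conv features -> LSTM."""

    def __init__(self, num_channels, feature_dim=512):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(num_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        # For 84x84 inputs the conv stack produces 64 * 7 * 7 features per frame.
        self.lstm = nn.LSTM(64 * 7 * 7, feature_dim, batch_first=True)

    def forward(self, states, hidden=None):
        # states: (batch, seq_len, channels, 84, 84)
        batch, seq_len = states.shape[:2]
        x = self.conv(states.flatten(0, 1))        # (batch * seq_len, 64 * 7 * 7)
        x = x.view(batch, seq_len, -1)
        features, hidden = self.lstm(x, hidden)    # (batch, seq_len, feature_dim)
        return features, hidden
```

The quantile and fraction heads would then consume `features` per timestep instead of the stacked-frame embedding.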
Thanks for the fix :)
Cool! I think it will work well with FQF, although I'm not sure why LSTM works so much better. (Many Atari games with frame stacking no longer seem to be POMDPs.)
I hope it works well :)
I think the LSTM has something to do with being able to better learn and generalize transition probabilities than stacked frames, although I could be wrong. Also, some additional linear features are fed to the LSTM (though it's not clear how much, if at all, these help):
- One-hot encoded last action
- Last reward (clipped)
- Timestep (scaled to the [0, 1] range)
It kind of teaches it how the previous action affects the current state (the transition probability). A rough sketch of these extra inputs is below.
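Roughly, the extra inputs would be built like this and concatenated with the conv features before the LSTM (just an illustration; the function name, `max_steps`, and shapes are my own choices):

```python
import torch
import torch.nn.functional as F

def auxiliary_features(last_action, last_reward, timestep, num_actions, max_steps):
    """Sketch: build the extra LSTM inputs listed above.

    last_action: LongTensor of shape (batch,)
    last_reward: FloatTensor of shape (batch,)
    timestep:    FloatTensor of shape (batch,)
    """
    one_hot_action = F.one_hot(last_action, num_classes=num_actions).float()
    clipped_reward = last_reward.clamp(-1.0, 1.0).unsqueeze(-1)
    scaled_timestep = (timestep / max_steps).unsqueeze(-1)  # roughly in [0, 1]
    return torch.cat([one_hot_action, clipped_reward, scaled_timestep], dim=-1)

# Example: 6 actions, a batch of 2 environments.
feats = auxiliary_features(
    last_action=torch.tensor([2, 5]),
    last_reward=torch.tensor([1.7, -0.3]),
    timestep=torch.tensor([120.0, 4000.0]),
    num_actions=6,
    max_steps=27000,
)
print(feats.shape)  # torch.Size([2, 8])
```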
Thanks for sharing!
I see. I think rewards, especially, should be a good indicator of how well the episode is going, which is informative for the agent. I want to see how recurrent FQF works, so could you share your results with me when you're finished?
Please feel free to ask me if you have questions or need any help.
I am using PyTorch 1.5.0 and I am getting the error "one of the variables needed for gradient computation has been modified by an inplace operation".
When I enabled torch autograd anomaly detection to find which tensor was being modified in place, the error pointed at this line: https://github.com/ku2482/fqf-iqn-qrdqn.pytorch/blob/542a6e57cdbc8c467495215c5348800942037bfa/fqf_iqn_qrdqn/network.py#L71
Note: it works when I downgrade to PyTorch 1.4.0. I am unable to find where the issue is, so I can't make it work on PyTorch 1.5.0.
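For reference, this is how anomaly detection is enabled, plus a tiny standalone example of the kind of in-place error it catches (this is standard PyTorch, not the repo's code):

```python
import torch

# With anomaly detection on, backward() reports which forward op produced the
# tensor that was later modified in place.
torch.autograd.set_detect_anomaly(True)

x = torch.randn(3, requires_grad=True)
y = x.exp()          # the backward of exp() needs its own output y
y += 1               # in-place modification of a tensor needed for the backward pass
y.sum().backward()   # raises "modified by an inplace operation"; the anomaly
                     # traceback points at the exp() call above
```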