ivegner / PyDSRL

Faithful Python implementation of the paper "Towards Deep Symbolic Reinforcement Learning" by Garnelo et al.
MIT License
13 stars · 5 forks

Understanding Problems / Code mistakes #6

Open NikeHop opened 3 years ago

NikeHop commented 3 years ago

Hey,

thanks for providing a base implementation for the method presented in "Towards Deep Symbolic Reinforcement Learning".

With respect to your implementation, I have the following comments/questions:

state_builder.py:

(1) line 177: We should check the absolute difference between x.position and entity.position; otherwise, x might not be within the radius but still end up in the list when the difference is negative.
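To illustrate the point, here is a minimal sketch of the radius test with abs() (the helper name and entity layout are hypothetical, not the repo's actual code): without abs(), a candidate far to the left or above the entity produces a large negative difference that always passes a plain `diff < radius` check.

```python
def entities_within_radius(entity, candidates, radius):
    """Keep only candidates whose per-axis distance to entity is
    below radius. Uses abs() so negative differences (candidate to
    the left of / above the entity) are handled correctly."""
    return [
        x for x in candidates
        if abs(x["position"][0] - entity["position"][0]) < radius
        and abs(x["position"][1] - entity["position"][1]) < radius
    ]

entity = {"position": (0, 0)}
near = {"position": (2, 1)}      # within radius 5
far = {"position": (-10, 0)}     # negative diff, outside radius 5
```

Without the abs(), `far` would slip through, since -10 - 0 < 5.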

(2) lines 143-145: In the deletion of the not-tracked entities. This is only a problem if the object tracking fails and the number of objects seen changes. Since the entities to be deleted come from the last state, their indices in the self.tracked_entities list may have changed in the meantime. Let's say you track 15 entities; in your current do_not_exist you have index 10, which you delete, and in the current timestep you add index 15 to the newly_nonexistent list. If no new entities are added in the next timestep, this loop will throw an error.
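A small self-contained sketch of the index-shift hazard (variable names mirror the description above but are illustrative, not the repo's actual code):

```python
tracked_entities = list(range(16))   # 16 tracked entities
newly_nonexistent = [10, 15]         # indices recorded in an earlier state

# Deleting in ascending order shifts later indices: after removing
# index 10 the list only has 15 elements, so index 15 would raise
# IndexError. Deleting in descending order avoids the shift within
# one batch:
for i in sorted(newly_nonexistent, reverse=True):
    del tracked_entities[i]

# The more robust fix is to remove by identity rather than by index,
# so indices recorded in an earlier state can never go stale:
dead = {3, 7}
survivors = [e for e in tracked_entities if e not in dead]
```

Removing by identity also keeps working when the tracker's entity count changes between timesteps, which is exactly the failure mode described above.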

agent.py

(1) The update function of the Tabular Agent: if I am not completely mistaken, there is a mistake in the paper's update equation for the tabular Q-function. At least, I see no reason why the action value of (s_t, a_t) should also be discounted. Additionally, in your calculation of the current and next-step action values you always sum over all the interactions present at the current timestep, while the paper describes updating the Q-table only for the specific type of interaction. Is there a reason behind this?
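For reference, this is the standard tabular Q-learning update, in which only the bootstrap term max_a' Q(s', a') is discounted, not Q(s_t, a_t) itself. The per-interaction-type tables and all names here are a hypothetical sketch under my reading of the paper, not the repo's code:

```python
from collections import defaultdict

N_ACTIONS = 4

def q_update(q_table, state, action, reward, next_state,
             alpha=0.1, gamma=0.9):
    """One tabular Q-learning step. Note that Q(s_t, a_t) enters the
    TD error undiscounted; only the next-state bootstrap term is
    multiplied by gamma."""
    best_next = max(q_table[(next_state, a)] for a in range(N_ACTIONS))
    q_table[(state, action)] += alpha * (
        reward + gamma * best_next - q_table[(state, action)]
    )

# One Q-table per interaction type, each updated independently as the
# paper describes (the interaction-type key is an illustrative string):
q_tables = defaultdict(lambda: defaultdict(float))
q_update(q_tables["agent-plus"], state=0, action=1, reward=1.0, next_state=2)
```

Starting from an all-zero table, the update above yields Q(0, 1) = 0.1 * (1.0 + 0.9 * 0 - 0) = 0.1.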

autoencoder.py

(1) Type consistency: in the current get_entities function of the autoencoder, the representative activations are determined anew at every timestep, which can lead to inconsistencies across timesteps. For example, when there is an entity at the top-left corner with a specific type, it gets assigned type 0. If the agent then collects this entity after a certain number of timesteps, the agent itself gets assigned type 0, which I think is undesired behaviour because it makes the matching of tracked_entities and new entities harder in the build-representation function.
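One possible fix is to make the type assignment persistent across timesteps instead of recomputing it. A hypothetical sketch (not the repo's implementation), assuming a simple nearest-prototype assignment with a distance threshold:

```python
import math

class TypeRegistry:
    """Persistent mapping from representative activations to type ids.
    Types assigned in earlier timesteps are kept, so an entity's type
    cannot be reshuffled when other entities appear or disappear."""

    def __init__(self, threshold=1.0):
        self.prototypes = []        # one representative vector per type
        self.threshold = threshold

    def assign(self, activation):
        # Reuse an existing type if the activation is close enough to
        # a previously seen prototype.
        for type_id, proto in enumerate(self.prototypes):
            if math.dist(activation, proto) < self.threshold:
                return type_id
        # Otherwise register a brand-new type.
        self.prototypes.append(list(activation))
        return len(self.prototypes) - 1

registry = TypeRegistry(threshold=1.0)
t0 = registry.assign([0.0, 0.0])   # first type seen -> 0
t1 = registry.assign([5.0, 5.0])   # distinct activation -> new type 1
```

Because the prototypes survive across calls, an activation seen again later maps back to its original type id even if other entities vanished in between.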

If I change the points mentioned above, the implementation runs without problems. I hope these comments help fix the implementation; if I misunderstood some part, let me know.

Since the paper itself provides very little information on how the method was actually implemented, I was wondering whether you have already contacted the authors and obtained additional implementation details that are not part of the paper.

All the best,

N

JunbZhang commented 3 years ago

Does your code run smoothly? I would be very grateful if you could share it. My email: junbzhang@stu.xidian.edu.cn

ivegner commented 3 years ago

Hey, thanks so much for looking into the problems. This codebase is several years old and was written when I was much less experienced, so some of the issues you mention definitely make sense. Please PR your changes and I’ll be happy to integrate them into the repo!

NikeHop commented 3 years ago

Yes, I will do the PR. It was by no means criticism; the code was very neat and easy to read. I also contacted the authors because I am still uncertain how certain things described in the paper were implemented, so if I find out that some things were done differently, I may do another PR. I will also look for hyperparameter settings that let us replicate the results from the paper, so that we can give them as default settings and add the corresponding plots to the README.

ivegner commented 3 years ago

Thank you, I look forward to seeing your changes!

NikeHop commented 3 years ago

OK, I made the major changes; the code should run fine. This week I am going to try to replicate their results and, if successful, send you the default settings and plots so you can add them to the README.