Open andreasbinder opened 1 year ago
In your paper, you mentioned using an improved version of the REINFORCE algorithm [32] to directly train the ToT (Tree of Thought) controller and the prompt agent. However, in the code of your GitHub project, I did not find the corresponding reinforcement learning method. It seems that your strategy still relies on natural language prompts.
Hi, thank you for the paper and the interesting concept!
We want to build on your idea using RL methods. However, I could not find the policy implementations in your code. I assume they are supposed to be here.
It would be great if you could share your code so that I can also experiment on my own :) Keep up the good work!