While working on #48, I came to the conclusion that modern RL algorithms might be overkill for my type of problem, so I went back to the tabular solving approach kicked off in #46. I came up with a new solving algorithm that is similar to value iteration but:
- samples exploration paths from a dynamic environment,
- builds the tabular state space on the fly, and
- interleaves dynamic-programming state-value updates with the sampling.
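Assuming a discounted formulation with discount factor $\gamma$ (an assumption; the actual model may use a different objective), each state-value update in the last step can be read as a Bellman optimality backup applied to the current state $s$ only. Successor states $s'$ that are not yet in the table take an initial guess $V_0$:

$$
V(s) \leftarrow \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[R(s, a, s') + \gamma\, V(s')\bigr]
$$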
According to Sutton and Barto's book on RL, this falls into the broad category of "asynchronous dynamic programming". After some googling, I think what I've implemented is Real-Time Dynamic Programming (RTDP).
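For illustration, here is a minimal RTDP sketch in Python, not the implementation from this issue. It assumes a discounted, reward-maximizing model described by two hypothetical callbacks, `actions(s)` and `outcomes(s, a)`, where the latter returns `(probability, next_state, reward, done)` tuples. The value table is a dict filled only for states that sampled trajectories actually reach, so the state space never has to be enumerated up front. Unseen states are valued at an optimistic guess `v_init`, which keeps the greedy trajectories exploring.

```python
import random
from typing import Callable, Dict, Hashable, List, Tuple

State = Hashable
Action = Hashable
# One possible result of an action: (probability, next_state, reward, done)
Outcome = Tuple[float, State, float, bool]


def rtdp(
    start: State,
    actions: Callable[[State], List[Action]],
    outcomes: Callable[[State, Action], List[Outcome]],
    gamma: float,
    v_init: float,
    n_trials: int = 1000,
    max_depth: int = 1000,
    seed: int = 0,
) -> Dict[State, float]:
    """Hypothetical RTDP sketch with a lazily built value table."""
    rng = random.Random(seed)
    V: Dict[State, float] = {}

    def value(s: State) -> float:
        # States never backed up yet fall back to the optimistic guess.
        return V.get(s, v_init)

    def q(s: State, a: Action) -> float:
        # Expected one-step return; terminal outcomes add no future value.
        return sum(
            p * (r + (0.0 if done else gamma * value(s2)))
            for p, s2, r, done in outcomes(s, a)
        )

    for _ in range(n_trials):
        s = start
        for _ in range(max_depth):
            # Greedy action under current estimates, then a backup at s only.
            best_a = max(actions(s), key=lambda a: q(s, a))
            V[s] = q(s, best_a)
            # Sample the successor from the model's outcome distribution.
            outs = outcomes(s, best_a)
            _, s2, _, done = rng.choices(outs, weights=[o[0] for o in outs])[0]
            if done:
                break
            s = s2
    return V


# Hypothetical toy model: "safe" ends with reward 1; "risky" ends with
# reward 3 half the time and otherwise returns to "A" with reward 0.
def retry_actions(s: State) -> List[Action]:
    return ["safe", "risky"]


def retry_outcomes(s: State, a: Action) -> List[Outcome]:
    if a == "safe":
        return [(1.0, "end", 1.0, True)]
    return [(0.5, "end", 3.0, True), (0.5, "A", 0.0, False)]


V = rtdp("A", retry_actions, retry_outcomes, gamma=0.9, v_init=30.0, n_trials=200)
print(V["A"])  # ≈ 2.7273 (= 30/11)
```

In the toy model the optimal value solves $V = \max(1,\ 1.5 + 0.45\,V)$, i.e. $V = 30/11$, and `v_init = 30` is the upper bound $R_{\max}/(1-\gamma)$. Terminal states are never stored, so the table ends up holding just `"A"`. If the model labelled states with an unbounded counter instead, the same code would only ever store the depths that trajectories actually sampled.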
The results seem promising: I can now handle an instance of the generic DAG model for Nakamoto/Bitcoin whose state space is not truncated and hence infinite.