Motivation

In the naive MCMC sampling process, we drew thousands of samples but did not acquire any knowledge along the way about how to make better samples. Sampling efficiency would improve if we could learn how to make better samples during the search process.
Alternative solution: better sampling mechanism
The naive MCMC sampler randomly picks one tract to flip. This is simple but not efficient. One alternative sampling method could be to find the CA with the highest prediction error and then randomly pick one tract within it to flip. This is currently being explored in parallel, and a sketch of the idea is given below.
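A minimal sketch of this error-guided proposal, assuming hypothetical containers ca_assignment (tract id to CA id) and prediction_error (CA id to current error), which are not from the original code:

```python
import random

def propose_flip(tracts, ca_assignment, prediction_error):
    """Error-guided proposal (sketch): pick a tract inside the worst-fit CA.

    Hypothetical inputs:
      tracts           -- list of tract ids
      ca_assignment    -- dict: tract id -> CA id
      prediction_error -- dict: CA id -> current prediction error
    """
    # The CA with the highest prediction error is the most promising to change.
    worst_ca = max(prediction_error, key=prediction_error.get)
    # Flip a random tract currently assigned to that CA.
    candidates = [t for t in tracts if ca_assignment[t] == worst_ca]
    return random.choice(candidates)
```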
Proposed solution in this thread: reinforcement learning scheme
Instead of designing a complicated sampling method from human heuristics as shown above, we can also learn the sampling method automatically. Reinforcement learning can be used toward this goal.
Motivation: adaptive sampling
One straightforward way to enhance the MCMC sampler is adaptive sampling, where we adjust the probability of each tract being selected for the label-flip action. However, this adaptive process is non-trivial. The selection probability is determined jointly by the current CA assignment and the tract to flip, and this state space is finite but huge. Therefore it is infeasible to maintain a table of probabilities to update.
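To make "finite but huge" concrete, here is a back-of-the-envelope count of the table entries a fully tabular adaptive sampler would need, using assumed Chicago-scale numbers (roughly 800 census tracts and 77 community areas; these figures are illustrative, not from the original text):

```python
# Back-of-the-envelope size of a tabular adaptive sampler (numbers assumed).
N_TRACTS, N_CAS = 800, 77
num_configurations = N_CAS ** N_TRACTS               # every possible CA assignment
num_table_entries = num_configurations * N_TRACTS    # one probability per (configuration, tract) pair
print(f"~10^{len(str(num_table_entries)) - 1} entries")  # on the order of 10^1512 -- far too many to store
```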
Q-learning
A natural solution for tracking the value of actions over a large number of states is Q-learning. Namely, we use reinforcement learning to model the adaptive process.
Reinforcement learning formulation
The objective is to find a policy that leads to an optimal partition of all tracts into CAs.
A state is one configuration (partition).
An action selects one tract and flips its CA assignment.
The reward is the change in training error between two configurations (before and after an action).
The policy is the function that tells us which tract to flip, derived from Q(S, A); see the sketch below.
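Given this formulation, a minimal sketch of one Q-learning step is shown below. It is for exposition only: the dict stores values only for visited (state, tract) pairs, and in practice the huge state space would call for function approximation rather than a table. All helper names (flip, training_error, the state encoding) are assumptions, not taken from the project code:

```python
import random

def q_learning_step(q, state, tracts, flip, training_error,
                    alpha=0.1, gamma=0.9, epsilon=0.1):
    """One Q-learning step under the formulation above (sketch only).

    Hypothetical pieces:
      q                      -- dict: (state, tract) -> estimated Q(S, A)
      state                  -- hashable encoding of the current partition
      tracts                 -- list of tract ids
      flip(state, tract)     -- partition after flipping that tract's CA
      training_error(state)  -- training error of the model on a partition
    """
    # Epsilon-greedy policy: usually flip the tract with the highest Q value.
    if random.random() < epsilon:
        tract = random.choice(tracts)
    else:
        tract = max(tracts, key=lambda t: q.get((state, t), 0.0))

    next_state = flip(state, tract)
    # Reward: reduction in training error (error before minus error after).
    reward = training_error(state) - training_error(next_state)

    # Standard one-step Q-learning update.
    best_next = max(q.get((next_state, t), 0.0) for t in tracts)
    old = q.get((state, tract), 0.0)
    q[(state, tract)] = old + alpha * (reward + gamma * best_next - old)
    return next_state
```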