rmattson1008 / dqn_asteroid_probe

Probing various layers of DQN to assess "difficulty" of decisions.
MIT License

Hooking and saving process. #2

Open rmattson1008 opened 1 year ago

rmattson1008 commented 1 year ago

You will need to be very organized here, or you will spend all week backtracking. The hooks are pretty straightforward (a dict with a key for each hidden layer). Saving is where I tend to get messy.
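A minimal sketch of the hook setup described above: a dict with one key per hooked hidden layer. The model architecture and layer choices here are placeholders, not the repo's actual DQN.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the DQN; input/hidden/action sizes are assumptions.
model = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 4),
)

activations = {}  # one key per hooked hidden layer


def make_hook(name):
    # Detach so the saved embeddings don't keep the autograd graph alive.
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook


# Register a forward hook on each layer we want to probe.
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        module.register_forward_hook(make_hook(name))

x = torch.randn(32, 8)
_ = model(x)
# activations now maps layer name -> (batch, features) tensor
```

After one forward pass, `activations` holds a batch of embeddings for every hooked layer, keyed by the layer's name from `named_modules()`.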

Make sure that you keep the probe set separate from the DQN set. For the DQN, we have a train/val split. For the separate probe set, we have a train and test split (or possibly some cross-validation?). The training of the probes matters less, as long as the splits are maintained and everything is kept very consistent.
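One way to keep the splits honest is to fix a seed and carve disjoint index pools up front. The pool sizes and the 80/20-style ratios below are illustrative assumptions, not numbers from the repo.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the splits stay identical across runs

n_states = 1000                 # hypothetical pool of collected states
idx = rng.permutation(n_states)

# Disjoint pools: the DQN never sees probe states, and vice versa.
dqn_idx, probe_idx = idx[:800], idx[800:]

# The DQN pool gets its own train/val split...
dqn_train, dqn_val = dqn_idx[:600], dqn_idx[600:]
# ...and the probe pool gets an independent train/test split.
probe_train, probe_test = probe_idx[:150], probe_idx[150:]

assert not set(dqn_idx) & set(probe_idx)  # sanity check: no leakage
```

Saving these index arrays to disk once, and reloading them everywhere, is an easy way to guarantee the same split is used by every downstream script.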

rmattson1008 commented 1 year ago

Seems like the easiest way to do this without messing with any DQN procedures is to write embeddings straight to a file in the forward loop. Would need a state variable to toggle whether or not I wish to write. (Only turn it on for the target DQN, when passing the probe set through the model.)
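The toggle idea might look like the sketch below: a `write_embeddings` flag that stays off during normal DQN training and is flipped on only for the probe pass. Names and sizes are hypothetical; an in-memory list stands in for the actual file write.

```python
import torch
import torch.nn as nn


class ProbedDQN(nn.Module):
    """Sketch of a DQN with a toggle that writes hidden embeddings
    only when explicitly enabled (e.g. for the target net + probe set)."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(8, 64)
        self.fc2 = nn.Linear(64, 4)
        self.write_embeddings = False  # off during normal training
        self.saved = []                # stand-in for appending to a file

    def forward(self, x):
        h = torch.relu(self.fc1(x))
        if self.write_embeddings:
            # In practice this would write to disk (e.g. torch.save / np.save).
            self.saved.append(h.detach().cpu())
        return self.fc2(h)


net = ProbedDQN()
_ = net(torch.randn(4, 8))    # training-style pass: nothing written
net.write_embeddings = True
_ = net(torch.randn(16, 8))   # probe pass: embeddings captured
```

Because the flag defaults to off, the DQN training loop needs no changes at all; only the probe script flips it.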

rmattson1008 commented 1 year ago

A note on the probe set - will need to pass the probe set through each trained model to get the "final" answer, so that we are not working with ground truth. (Unless I can get the optimal action from gym? I don't think that's the point of RL.)
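In other words, the probe labels come from the trained network's own greedy choice (argmax over Q-values) rather than any environment-provided optimal action. A small sketch, with a placeholder network standing in for the trained checkpoint:

```python
import torch
import torch.nn as nn

# Hypothetical trained DQN; in practice this would be the loaded checkpoint.
net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))

probe_states = torch.randn(100, 8)  # placeholder probe set

# The "final answer" label for each probe state is the model's greedy action,
# i.e. the argmax over its Q-values - not a ground-truth optimal action.
with torch.no_grad():
    q_values = net(probe_states)          # (n_states, n_actions)
    probe_labels = q_values.argmax(dim=1)  # (n_states,) action indices
```

These labels are then what the probes are trained and evaluated against, so they must be regenerated for each trained model being probed.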

rmattson1008 commented 1 year ago

Hooks are live, but batching is unfamiliar. Check.
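On the batching point: a forward hook fires once per batch, so passing the probe set through a DataLoader leaves the embeddings scattered across per-batch tensors that need stitching back together. A sketch, assuming an unshuffled loader so row order matches the probe set:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder model; sizes are assumptions.
model = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))

batches = []  # the hook appends one activation tensor per batch
model[0].register_forward_hook(lambda m, i, o: batches.append(o.detach()))

# shuffle defaults to False, so concatenation preserves probe-set order.
loader = DataLoader(TensorDataset(torch.randn(100, 8)), batch_size=32)
with torch.no_grad():
    for (x,) in loader:
        model(x)

# Stitch the per-batch activations into one (n_states, features) matrix.
embeddings = torch.cat(batches, dim=0)
```

With `shuffle=True` this silently breaks the state-to-embedding correspondence, so keeping the probe loader unshuffled (or saving indices alongside) seems like the safer default.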