ljvmiranda921 / gym-lattice

An HP 2D Lattice Environment with a Gym-like API for the Protein Folding Problem
MIT License

Each sequence as an agent? #21

Closed FeynmanDNA closed 4 years ago

FeynmanDNA commented 4 years ago

Hi, awesome work on the gym environment! I am wondering: is the training specific to each HP sequence?

Is the current design such that we have to start the training over for each new sequence, e.g., train a new Q-table every time?

ljvmiranda921 commented 4 years ago

Hi @FeynmanDNA thanks for checking out gym-lattice!

The default setup is that each generated sequence is an episode with sparse rewards (i.e., you only get the reward at the end of the episode). One or more agents can solve it.
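
For concreteness, here's a minimal random-agent loop in the spirit of the repo's usage (a sketch, not a definitive example; the Lattice2DEnv import and step/reset convention follow the repo but may have drifted across versions):

from gym import spaces
from gym_lattice.envs import Lattice2DEnv

seq = "HHPHH"                      # example input sequence
env = Lattice2DEnv(seq)
action_space = spaces.Discrete(4)  # the four lattice moves

for episode in range(5):
    obs = env.reset()
    done = False
    while not done:
        action = action_space.sample()              # random agent
        obs, reward, done, info = env.step(action)
    # Sparse reward: nonzero only here, once the episode ends
    print("Episode {} finished with reward {}".format(episode, reward))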

It's been about two years since I last worked on this project, so I'm not sure how up to date it is in its domain. Unfortunately, I wasn't able to continue my research on this (though I'm still keeping tabs on what's happening in this topic). The current SoTA, in my opinion, is DeepMind's AlphaFold. They do some level of feature engineering (distance and angle predictions) before running it through a classifier.

If you'd like more information, I've written some posts about this project (some may be a bit dated).

FeynmanDNA commented 4 years ago

Hi @ljvmiranda921, I have read through your related writings on RL on your blog before. They are very educational and inspiring! I have always wondered whether there was any follow-up research from your Waseda lab-mates in this direction :)

Thank you so much for your sharing and pointers to other resources. I will check them out!

Regarding the "agent" and the "generated sequence", my main concern is that the observation_space is tied to the sequence length:

self.grid_length = 2 * len(seq) + 1
...
self.observation_space = spaces.Box(low=-2, high=1,
    shape=(self.grid_length, self.grid_length),
    dtype=int)
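
As a quick illustration (just plugging different lengths into the formula above; not code from the repo):

for s in ("HHHPHH", "HPPPHH", "HHHPPPPHHPHPH"):
    grid_length = 2 * len(s) + 1
    print(s, (grid_length, grid_length))
# HHHPHH        -> (13, 13)
# HPPPHH        -> (13, 13)  same length, same observation shape
# HHHPPPPHHPHPH -> (27, 27)  longer sequence, different input shape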

So when training is done on one sequence, say HHHPHH, the same RL Q-table or DQN weights might not generalize to another sequence like HPPPHH?

My first impression of the gym-lattice design is that the sequence is somewhat like the FrozenLake map: the agent learns to traverse a particular FrozenLake map, but needs to be retrained for a different one.

I am very new to RL, so please correct me if my understanding of the gym-lattice design is wrong.

Once again, awesome open-source project with solid tests and documentation. Thank you for your work!

ljvmiranda921 commented 4 years ago

Hi @FeynmanDNA , thanks for this!

So when training is done on one sequence, say HHHPHH, the same RL Q-table or DQN weights might not generalize to another sequence like HPPPHH?

I don't have enough information to confirm this, but back then I expected that the same DQN weights should generalize across sequences of the same length. I'm trying to remember why the design is like that; my vague recollection is that I was planning to compare it against the HP Lattice Protein Folding baselines (linked here is a recent paper related to this).

My first impression of the gym-lattice design is that the sequence is somewhat like the FrozenLake map: the agent learns to traverse a particular FrozenLake map, but needs to be retrained for a different one.

You may be on to something. My hypothesis is that it should generalize if the sequences are the same length: you could use the same agent to solve HPHH and HHPP (length=4), but not HHHPPPPHHPHPH (length=13). Would be nice to confirm this experimentally!
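
A rough sketch of how that experiment could look (train_dqn and agent.act here are hypothetical placeholders; only the reset/step loop follows the env's convention):

from gym_lattice.envs import Lattice2DEnv

def evaluate(agent, seq, n_episodes=100):
    # Run a trained agent on `seq` and return the mean end-of-episode reward
    env = Lattice2DEnv(seq)
    total = 0.0
    for _ in range(n_episodes):
        obs = env.reset()
        done = False
        while not done:
            action = agent.act(obs)                # hypothetical agent interface
            obs, reward, done, info = env.step(action)
        total += reward                            # sparse: reward arrives at the end
    return total / n_episodes

# agent = train_dqn(Lattice2DEnv("HPHH"))   # hypothetical training helper
# print(evaluate(agent, "HHPP"))            # same length: input shapes match
# print(evaluate(agent, "HHHPPPPHHPHPH"))   # different length: shape mismatch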

FeynmanDNA commented 4 years ago

Thank you very much for your generous sharing!

I will try out some experiments with the current observation space design, and let you know if anything interesting comes up!

FeynmanDNA commented 1 year ago

@ljvmiranda921 Hi Miranda, thank you very much for open-sourcing this awesome repo! We built on your gym-lattice environment and implemented a prototype DQN with an LSTM core that achieves better results than previous RL/DRL studies.

The poster and extended abstract were accepted at the Machine Learning and the Physical Sciences workshop at NeurIPS 2022.

Poster and Extended Abstract

The full manuscript is under review, and the preprint is on arXiv: https://arxiv.org/abs/2211.14939


Once again, thank you very much for the gym-lattice environment and your blog posts on HP-model :)