Performance decreased after BO loops

martinakaduc commented 3 months ago

I ran a sequence design experiment using HES with 10-step lookahead on 10 sequences. I initialized 2000 sequences to train the reward model. With 2000 sequences, we achieve R-squared ~ 0.5 using gemma7b embeddings.

The scores for the first BO loops are as follows. Note: all scores below are computed by the Oracle.

[
[0.4729124903678894, 1.8262120485305786, 2.0705394744873047, 1.8429815769195557, 1.3721145391464233, 2.4572482109069824, 2.0793373584747314, 2.4021151065826416, 2.0061886310577393, 2.111313819885254], 

[-32.24431228637695, -31.407024383544922, -19.565031051635742, -34.94041442871094, -29.671096801757812, -30.855953216552734, -28.32752227783203, -32.225669860839844, -32.84722900390625, -28.363910675048828], 

[-32.24431228637695, -31.407024383544922, -19.565031051635742, -34.94041442871094, -29.671096801757812, -30.855953216552734, -28.32752227783203, -32.225669860839844, -32.84722900390625, -28.363910675048828], 

[-32.24431228637695, -31.407024383544922, -19.565031051635742, -34.94041442871094, -29.671096801757812, -30.855953216552734, -28.32752227783203, -32.225669860839844, -32.84722900390625, -28.363910675048828], 

[-32.24431228637695, -31.407024383544922, -19.565031051635742, -34.94041442871094, -29.671096801757812, -30.855953216552734, -28.32752227783203, -32.225669860839844, -32.84722900390625, -28.363910675048828], 

[-32.24431228637695, -31.407024383544922, -19.565031051635742, -34.94041442871094, -29.671096801757812, -30.855953216552734, -28.32752227783203, -32.225669860839844, -32.84722900390625, -28.363910675048828], 

[-32.24431228637695, -31.407024383544922, -19.565031051635742, -34.94041442871094, -29.671096801757812, -30.855953216552734, -28.32752227783203, -32.225669860839844, -32.84722900390625, -28.363910675048828], 

[-32.24431228637695, -31.407024383544922, -19.565031051635742, -34.94041442871094, -29.671096801757812, -30.855953216552734, -28.32752227783203, -32.225669860839844, -32.84722900390625, -28.363910675048828], 

[-32.24431228637695, -31.407024383544922, -19.565031051635742, -34.94041442871094, -29.671096801757812, -30.855953216552734, -28.32752227783203, -32.225669860839844, -32.84722900390625, -28.363910675048828], 

[-32.24431228637695, -31.407024383544922, -19.565031051635742, -34.94041442871094, -29.671096801757812, -30.855953216552734, -28.32752227783203, -32.225669860839844, -32.84722900390625, -28.363910675048828], 

[-32.24431228637695, -31.407024383544922, -19.565031051635742, -34.94041442871094, -29.671096801757812, -30.855953216552734, -28.32752227783203, -32.225669860839844, -32.84722900390625, -28.363910675048828], 

[-32.24431228637695, -31.407024383544922, -19.565031051635742, -34.94041442871094, -29.671096801757812, -30.855953216552734, -28.32752227783203, -32.225669860839844, -32.84722900390625, -28.363910675048828], 

[-32.24431228637695, -31.407024383544922, -19.565031051635742, -34.94041442871094, -29.671096801757812, -30.855953216552734, -28.32752227783203, -32.225669860839844, -32.84722900390625, -28.363910675048828]
]

The initial sequences (prior to the BO loop) look normal. But after the first BO loop, the sequences fail and get large negative scores, and the scores are identical in every subsequent loop.
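A minimal sketch for checking this from the logged scores, in case it helps with debugging. It only uses the first two rows of the log above; the helper name `repeated_rounds` is made up for illustration and is not part of the repo.

```python
import statistics

# Oracle scores copied from the log above.
initial = [0.4729124903678894, 1.8262120485305786, 2.0705394744873047,
           1.8429815769195557, 1.3721145391464233, 2.4572482109069824,
           2.0793373584747314, 2.4021151065826416, 2.0061886310577393,
           2.111313819885254]
loop_1 = [-32.24431228637695, -31.407024383544922, -19.565031051635742,
          -34.94041442871094, -29.671096801757812, -30.855953216552734,
          -28.32752227783203, -32.225669860839844, -32.84722900390625,
          -28.363910675048828]


def repeated_rounds(rounds):
    """Return indices of rounds whose scores exactly match the previous round.

    Hypothetical helper: an exact repeat suggests the sequences did not
    change between BO loops.
    """
    return [i for i in range(1, len(rounds)) if rounds[i] == rounds[i - 1]]


# In the log, every later loop repeats loop_1 exactly; two copies are enough here.
rounds = [initial, loop_1, loop_1]

for i, scores in enumerate(rounds):
    print(i, "mean:", statistics.mean(scores), "max:", max(scores))
print("rounds identical to the previous one:", repeated_rounds(rounds))
```

If the repeated rounds start right after the first PPO update, that would point at the policy producing the same (or equally invalid) outputs each time rather than at the Oracle.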

I think there are two possible reasons:

1. The reward model is not good enough (its R-squared is only ~0.5).
2. PPO training failed because of the randomly created prompts: https://github.com/sangttruong/nonmyopia/blob/d3ec3b1aaa6ceee8c93a6c4d7159d0b636b100f3/sequence_design/actor.py#L131-L174

What do you think the issue could be? @sangttruong

martinakaduc commented 3 months ago

Issues #54, #55, and #56 could be the causes of this issue.

martinakaduc commented 3 months ago

I ran into an unexpected situation. The model refuses to generate a new protein sequence because it thinks the sequence cannot be improved any further. Please see the sample below.

So, in this case, we should only backpropagate up to the last response that contains a protein sequence. Is that correct?

'<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nEdit 1 amino acid in the below protein sequence to create a new protein with higher fluorescence. The amino acid must be in set {L, A, G, V, S, E, R, T, I, D, P, K, Q, N, F, Y, M, H, W, C, X, B, U, Z, O}. Please return only the modified protein sequence.\nGiven the following protein sequences and their corresponding fluorescence value.\nProtein: SKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLPYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVSAAGITHGMDELYK - Fluorescence: 2.0793373584747314\nProtein sequence: SKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLPYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVSAAGITHGMDELYK\nModified protein sequence: <|im_end|>\n<|im_start|>assistant\n', 

'SKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLPYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEGDSSSVEGFVTGLAAYEKTGGPVGAVGATLSPVTTLVTTLPYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVSAAGITHGMDELYK\n\nThe modified protein sequence is:\n- L -> C (Change leucine to cysteine)\n- R -> D (Change arginine to lysine)<|im_end|>', 

'<|im_start|>user\nBased on above generated protein, please continue to modify it for higher fluorescence.<|im_end|>\n',

"<|im_start|>assistant\nTo increase the fluorescence of the given protein sequence, we can modify one of the amino acids by changing it to another amino acid that is known to have higher fluorescence. Here's an example modification:\n\n- Change the second last amino acid (L) to an amino acid with high fluorescence (E), resulting in the modified protein sequence: SKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTEGFSDRKLPVPWPTLVTTLPYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVSAAGITHGMDELYK\n\nPlease note that this modification was made based on available information about amino acids and their potential for increased fluorescence. Further research may be required to confirm whether these modifications result in significant changes in fluorescence.<|im_end|>", 

'<|im_start|>user\nBased on above generated protein, please continue to modify it for higher fluorescence.<|im_end|>\n', 

'<|im_start|>assistant\nBased on the given protein sequence, we can see that there is no specific modification required to increase its fluorescence. Therefore, we cannot provide any further modifications to the protein sequence.<|im_end|>'
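A rough sketch of the truncation proposed above, assuming the trajectory is a list of strings that alternates prompt, response, prompt, response as in the sample. The function names and the `min_len=50` threshold are assumptions for illustration, not code from the repo; the amino-acid alphabet is the set given in the prompt.

```python
import re

# Alphabet taken from the set listed in the prompt above.
AMINO_ACIDS = "LAGVSERTIDPKQNFYMHWCXBUZO"


def has_sequence(text, min_len=50):
    """Heuristic: the text contains a run of at least min_len amino-acid letters.

    min_len is an assumed threshold to avoid matching ordinary capitalized words.
    """
    return re.search(rf"[{AMINO_ACIDS}]{{{min_len},}}", text) is not None


def truncate_to_last_sequence(turns, min_len=50):
    """Keep turns up to and including the last response that contains a sequence.

    Assumes responses sit at odd indices (1, 3, 5, ...), as in the sample.
    Returns an empty list if no response contains a sequence.
    """
    last = -1
    for i in range(1, len(turns), 2):
        if has_sequence(turns[i], min_len):
            last = i
    return turns[: last + 1]


# Toy trajectory with the same shape as the sample (real sequences are longer).
turns = [
    "prompt with protein " + "A" * 60,
    "K" * 60 + "<|im_end|>",
    "<|im_start|>user\nplease continue to modify it<|im_end|>\n",
    "<|im_start|>assistant\nwe cannot provide any further modifications<|im_end|>",
]
kept = truncate_to_last_sequence(turns)
print(len(kept))
```

Only odd indices are checked because the prompts themselves also contain protein sequences, so a plain search over all turns would never truncate anything.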
