ievapudz / TemStaPro

TemStaPro - a program for protein thermostability prediction using sequence representations from a protein language model.
MIT License

How to represent Whole mAb in input FASTA #6

Closed by tony-res 1 year ago

tony-res commented 1 year ago

Thanks for the tool. I've been playing with it quite a bit.

I am currently checking its accuracy against the Thera-SAbDab database of therapeutic antibodies. For an antibody such as tovetumab, I use the following as the input FASTA file for TemStaPro:

>Tovetumab Heavy Chain
QVQLVESGGGLVKPGGSLRLSCAASGFTFSDYYMNWIRQAPGKGLEWVSYISSSGSIIYYADSVKGRFTISRDNAKNSLYLQMNSLRAEDTAVYYCAREGRIAARGMDVWGQGTTVTVSS

>Tovetumab Light Chain
DIQMTQSPSSLSASVGDRVSITCRPSQSFSRYINWYQQKPGKAPKLLIHAASSLVGGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQTYSNPPITFGQGTRLEMK
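
For reference, here is a minimal Python sketch of how the two-record input above could be built and sanity-checked before running a prediction. The helper names `to_fasta` and `read_fasta` are hypothetical and are not part of TemStaPro; the sketch only assumes that each FASTA record (here, each chain) is treated as a separate sequence.

```python
# Variable-region sequences copied verbatim from the two records above.
VH = (
    "QVQLVESGGGLVKPGGSLRLSCAASGFTFSDYYMNWIRQAPGKGLEWVSYISSSGSIIYYADSVKGRF"
    "TISRDNAKNSLYLQMNSLRAEDTAVYYCAREGRIAARGMDVWGQGTTVTVSS"
)
VL = (
    "DIQMTQSPSSLSASVGDRVSITCRPSQSFSRYINWYQQKPGKAPKLLIHAASSLVGGVPSRFSGSGSG"
    "TDFTLTISSLQPEDFATYYCQQTYSNPPITFGQGTRLEMK"
)

# The 20 standard amino-acid one-letter codes.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def to_fasta(records, width=60):
    """Format (header, sequence) pairs as FASTA text, wrapping sequence lines."""
    lines = []
    for header, seq in records:
        lines.append(f">{header}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def read_fasta(text):
    """Parse FASTA text back into a list of (header, sequence) pairs."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines between records are ignored
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif records:
            records[-1][1] += line  # rejoin wrapped sequence lines
    return [(header, seq) for header, seq in records]


records = [
    ("Tovetumab Heavy Chain", VH),
    ("Tovetumab Light Chain", VL),
]
fasta_text = to_fasta(records)

# Sanity checks: two separate records, only standard residues, lossless round trip.
assert fasta_text.count(">") == 2
assert all(set(seq) <= AMINO_ACIDS for _, seq in records)
assert read_fasta(fasta_text) == records

print(fasta_text)
```

The printed text can be saved to a file and used as the input. Because the heavy and light chains are separate records, any per-sequence prediction refers to one chain at a time, not to the assembled antibody.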

It predicts a temperature below 40 °C, but the measured Tm is 63.5 °C.

Am I doing anything wrong? Maybe it needs more than the VH and VL regions?

Thanks again for publishing TemStaPro!

ievapudz commented 1 year ago

Hello,

Thank you for the question.

Since the tool was not tested specifically with antibody sequences, it is hard to tell quickly why the predictions are not accurate. In general, it could be that there are few antibody sequences in the training data, so the method does not generalize well to these cases.

tony-res commented 1 year ago

Thank you. That makes sense.