Hi @Akazhiel ,
Thanks for your interest!
One selection step that you can do before encoding is to run all pMHCs through netMHCpan and only keep the pMHCs with a satisfactory rank (e.g. 2%). Then input the pMHCs with TCRs to pMTnet. We are also working on computationally speeding up the encoding process.
Best, Tianshi
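As a rough illustration of that pre-filtering step, here is a minimal sketch. It assumes the binding predictions have already been parsed into records with a percentile `rank` field. The field names, the toy peptides and the `filter_by_rank` helper are all hypothetical, not part of netMHCpan or pMTnet.

```python
# Minimal sketch: keep only pMHCs whose percentile rank passes a cutoff.
# Field names and example values are hypothetical placeholders.

def filter_by_rank(records, max_rank=2.0):
    """Return the records whose percentile rank is at or below max_rank."""
    return [rec for rec in records if float(rec["rank"]) <= max_rank]

pmhcs = [
    {"antigen": "AAAAAAAAA", "hla": "A*02:01", "rank": "0.4"},
    {"antigen": "CCCCCCCCC", "hla": "A*02:01", "rank": "1.9"},
    {"antigen": "DDDDDDDDD", "hla": "B*07:02", "rank": "15.0"},
]

kept = filter_by_rank(pmhcs)
print(len(kept), "of", len(pmhcs), "pMHCs kept")
```

Only the pMHCs in `kept` would then be paired with TCRs, which shrinks the input file before any encoding happens.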
Hello @tianshilu ,
Yes, we do run the pMHCs through a different algorithm than netMHCpan and filter them by affinity percentile. My question was more about how (if possible) to reduce the number of candidate TCRs, since you'd want to screen each TCR against all the pMHCs.
Cheers,
Jonatan
Hi @Akazhiel ,
Sorry, we don't have a pre-selection step for TCRs. We are working on speeding up the encoding and prediction. Thanks very much for your feedback!
Tianshi
Hi @tianshilu ,
That's totally understandable; subsetting the TCRs might indeed be really hard to achieve. I've been tinkering with the code and sped up the encoding steps that run before the autoencoder. My knowledge of machine learning is pretty limited, so I wouldn't know how to speed up the autoencoder itself or the predictions.
If it's okay with you, I'll open a pull request so that you can review the code. I've done some testing, and TCRmap, antigenMap and HLAMap together take less than one minute for a dataset of 2M rows. For large datasets the bottleneck is now, as I mentioned, the prediction step, since it needs to loop through each value.
Cheers,
Jonatan
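For readers wondering how per-row encoding of paired data can be made cheaper, one general approach (not necessarily what the pull request does) is to encode each unique sequence once and reuse the result, because every TCR and every antigen repeats many times in a paired file. The sketch below assumes this; `toy_encode` is a hypothetical stand-in and not the encoding pMTnet uses.

```python
# Sketch: encode each unique sequence once, then look the result up per row.
# toy_encode is a hypothetical stand-in for a real (expensive) encoder.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
call_count = 0  # counts how often the encoder actually runs

def toy_encode(seq):
    """Map each residue to its index in AMINO_ACIDS."""
    global call_count
    call_count += 1
    return [AMINO_ACIDS.index(aa) for aa in seq]

def encode_unique(sequences, encode_fn):
    """Encode a column of sequences, calling encode_fn once per unique value."""
    cache = {}
    for seq in sequences:
        if seq not in cache:
            cache[seq] = encode_fn(seq)
    return [cache[seq] for seq in sequences]

# A paired file repeats the same CDR3 for every pMHC it is screened against.
cdr3_column = ["CASSF", "CASSF", "CASSF", "CAWSF", "CAWSF", "CAWSF"]
encoded = encode_unique(cdr3_column, toy_encode)
print(len(encoded), "rows encoded with", call_count, "encoder calls")
```

With this pattern the number of encoder calls scales with the number of distinct sequences and not with the number of rows.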
Hi @Akazhiel ,
Thanks for your effort on this. Please feel free to open a pull request!
Thanks!
Tianshi
The encoding step has been updated and now runs faster.
Greetings!
This is a great tool for predicting TCR-pMHC binding, but is there any way to speed up the encoding step? As I understand it, the aim of the tool is to predict how well a TCR repertoire binds to the predicted pMHCs, yet the encoding is far slower than I'd expect. Pairing each TCR with the whole list of pMHCs to test for binding generates files with millions of lines. I'm currently running it on a file with 2M lines; after almost 3 days of running time, the encoding is not even close to done. Is the input perhaps not expected to contain all possible combinations, but only some of them? If so, how would you select them?
Best regards,
Jonatan
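To make the size of the problem concrete, the following sketch builds the all-against-all pairing described above. The column names `CDR3`, `Antigen` and `HLA`, the toy sequences and the `pair_rows` helper are assumptions for illustration only.

```python
from itertools import product

# Sketch: pair every TCR with every pMHC (assumed column names, toy data).

def pair_rows(tcrs, pmhcs):
    """Lazily yield one input row per (TCR, pMHC) combination."""
    for cdr3, (antigen, hla) in product(tcrs, pmhcs):
        yield {"CDR3": cdr3, "Antigen": antigen, "HLA": hla}

tcrs = ["CASSAAAF", "CASSCCCF", "CASSDDDF"]
pmhcs = [("AAAAAAAAA", "A*02:01"), ("CCCCCCCCC", "B*07:02")]

rows = list(pair_rows(tcrs, pmhcs))
print(len(rows))  # 3 TCRs x 2 pMHCs = 6 rows

# The row count is len(tcrs) * len(pmhcs), so for example
# 2,000 TCRs against 1,000 pMHCs already gives 2,000,000 rows.
```

Because the row count is the product of the two list sizes, trimming either list shrinks the file proportionally.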