maovshao / PLMSearch

PLMSearch enables accurate and fast homologous protein search with only sequences as input
https://dmiip.sjtu.edu.cn/PLMSearch
MIT License

PLMSearch

This is the implementation of "PLMSearch: Protein language model powers accurate and fast sequence search for remote homology". Because it uses a protein language model, PLMSearch reaches a sensitivity close to state-of-the-art (SOTA) structure search methods, while remaining versatile and fast because it needs only sequences as input.

Quick links

Webserver

PLMSearch web server: dmiip.sjtu.edu.cn/PLMSearch 🚀

PLMAlign web server: dmiip.sjtu.edu.cn/PLMAlign ✈️

PLMAlign source code: github.com/maovshao/PLMAlign 🚁

Requirements

Follow the steps in requirements.sh to set up the environment.

Data preparation

We have released our experimental data, which can be downloaded either as plmsearch_data from our server or from Zenodo.

# Includes the experimental data, the PLMSearch model, the ESM-1b model, etc.
# Use the following command or download it from https://zenodo.org/records/11480660
wget https://dmiip.sjtu.edu.cn/PLMSearch/static/download/plmsearch_data.tar.gz
tar zxvf plmsearch_data.tar.gz
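If you rerun setup often, a minimal sketch of an idempotent version of the two commands above may help. It assumes bash, wget and tar are available; `fetch_and_extract` is a hypothetical helper, not something shipped with PLMSearch. It skips the download when plmsearch_data.tar.gz is already present and lists the archive before unpacking, so a truncated download fails early.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the PLMSearch repository: fetch an archive
# only when it is not already present, check that it is a readable gzipped tar,
# then unpack it into the current directory.
fetch_and_extract() {
  local url="$1"
  local archive
  archive="$(basename "$url")"
  if [ ! -f "$archive" ]; then
    # Remove a partial file if the download fails, so a rerun retries it.
    wget -O "$archive" "$url" || { rm -f "$archive"; return 1; }
  fi
  # List the archive first; a truncated download fails here, before extraction.
  tar tzf "$archive" > /dev/null || return 1
  tar zxf "$archive"
}
```

Usage: `fetch_and_extract https://dmiip.sjtu.edu.cn/PLMSearch/static/download/plmsearch_data.tar.gz`. If an earlier download left a corrupt archive behind, delete it by hand before calling the helper again.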

Reproduce all our experiments with only one file

Notice: Detailed results are saved in scientist_figures/.

Run PLMSearch locally

Notice: The inputs and outputs of the example are saved in example/.
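Because example/ ships with reference outputs, one way to check a local run is to keep a pristine copy (for instance `cp -r example example_reference` before running) and compare afterwards. The sketch below assumes bash and a standard `diff`; `compare_with_reference` and the `example_reference` directory name are hypothetical. It also assumes the outputs are deterministic, which the source does not state, so small numeric differences may not indicate a problem.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a saved copy of the example outputs ($1)
# with the directory after a local run ($2).
compare_with_reference() {
  local ref="$1" new="$2"
  if [ ! -d "$ref" ] || [ ! -d "$new" ]; then
    echo "missing directory: $ref or $new" >&2
    return 2
  fi
  # diff -rq prints one line per differing or missing file and exits 1 on any difference.
  diff -rq "$ref" "$new"
}
```

Usage: `compare_with_reference example_reference example` prints nothing and returns 0 when every file matches.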

Citation

Liu, W., Wang, Z., You, R. et al. PLMSearch: Protein language model powers accurate and fast sequence search for remote homology. Nat Commun 15, 2775 (2024). https://doi.org/10.1038/s41467-024-46808-5