labstructbioinf / pLM-BLAST

Detection of remote homology by comparison of protein language model representations
https://toolkit.tuebingen.mpg.de/tools/plmblast
MIT License

The output results are incomplete #58

Open daisykuma22 opened 2 weeks ago

daisykuma22 commented 2 weeks ago

Hello! Thank you for the excellent work on pLM-BLAST! I'm currently testing pLM-BLAST using the same protein FASTA file as both the query and the database. This file contains approximately 20,355 sequences. Commands used:

python embeddings.py start a.fasta pLM-blastDB -embedder pt --gpu -bs 0 --asdir

python embeddings.py start a.fasta a.pt --gpu

python scripts/plmblast.py pLM-blastDB a pLM-blast_hits.csv

After running these commands, I got the output file pLM-blast_hits.csv. However, it contains only five columns, where the queryid values are numbers, and there are no columns for target ID, score, identity, or similarity.

It seems that I lost quite a lot of information. See the following screenshot of my output: (screenshot 1731464919224)

Could you help me figure it out? Thanks a lot.

Argusmocny commented 1 week ago

Hi @daisykuma22, thanks for reporting this. The screenshot you provided probably shows the contents of the pLM-blastDB.csv file, and pLM-blast_hits.csv should look different, but we also recorded a similar issue in our tests. I will take a closer look at this :)

daisykuma22 commented 1 week ago

Hi @Argusmocny Has this issue been resolved now?

Argusmocny commented 5 days ago

@daisykuma22 try to run

python scripts/plmblast.py pLM-blastDB pLM-blastDB results.csv

because it looks like your query (a.*) is the same as pLM-blastDB, so the same database can be passed as both arguments. There is no need to calculate both of them.

If this doesn't work, try moving the database to a separate directory:

mv pLM-blastDB /path/to/plmdb

and then run

python scripts/plmblast.py /path/to/plmdb /path/to/plmdb results.csv
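
To check whether the file being opened is really the hits table or just the database index (as suspected for the screenshot above), it may help to compare the column headers of the two CSV files. The following is only a minimal sketch using the Python standard library; the helper names `read_header` and `same_layout` are hypothetical and not part of pLM-BLAST, and the comma delimiter is an assumption (pass another one if the files use a different separator).

```python
import csv
from pathlib import Path


def read_header(path, delimiter=","):
    """Return (column_names, data_row_count) for a delimited text file.

    The delimiter is an assumption; adjust it to match the actual file.
    """
    with Path(path).open(newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader, [])          # first row is treated as the header
        n_rows = sum(1 for _ in reader)    # remaining rows are data rows
    return header, n_rows


def same_layout(path_a, path_b, delimiter=","):
    """True when both files have identical column names in the same order."""
    header_a, _ = read_header(path_a, delimiter)
    header_b, _ = read_header(path_b, delimiter)
    return header_a == header_b


# Example usage (file names taken from the commands in this thread):
# print(read_header("pLM-blastDB.csv"))
# print(read_header("results.csv"))
# print(same_layout("pLM-blastDB.csv", "results.csv"))
```

If `same_layout` returns True for the database index and the results file, the results file most likely does not contain the search hits, which would match the problem described in this issue.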