labstructbioinf / pLM-BLAST

Detection of remote homology by comparison of protein language model representations
https://toolkit.tuebingen.mpg.de/tools/plmblast
MIT License

Return results of pre-screening only #47

Closed: staszekdh closed this issue 1 month ago

staszekdh commented 1 month ago

Would it be possible to implement an option/flag in pLM-BLAST to return the pre-screening scores without performing the subsequent time-consuming alignments?

Argusmocny commented 1 month ago

I have added an --only-scan flag to scripts/plmblast.py. It forces the script to save the pre-screening results to the path given as the output argument and then exit. The format of the file is as follows:

{
queryid1 : {
        { file: targetfile1, score: scoreval1, condition: True }
        { file: targetfile2, score: scoreval2, condition: False }
}, queryid2 : {
        { file: targetfile1, score: scoreval1, condition: True }
...
}

Here, score is the pre-screening value and condition indicates whether the quantile threshold criterion is met. Please let me know whether this meets your expectations.
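For anyone post-processing this file, here is a minimal sketch of how the hits that pass the threshold could be pulled out per query. It rests on assumptions not confirmed in this thread: that the file is valid JSON, that each query ID maps to a list of records with the keys file, score and condition, and that a higher score means a better match. The helper name passing_hits and the example values are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def passing_hits(prescreen: dict) -> dict:
    """Keep, per query, only the records whose `condition` is true.

    Assumes each query ID maps to a list of
    {"file": ..., "score": ..., "condition": ...} records.
    """
    result = {}
    for query_id, records in prescreen.items():
        kept = [rec for rec in records if rec["condition"]]
        # Assumption: a higher pre-screening score is a better match.
        result[query_id] = sorted(kept, key=lambda rec: rec["score"], reverse=True)
    return result


# Hypothetical data mirroring the format described above.
example = {
    "queryid1": [
        {"file": "targetfile1", "score": 0.91, "condition": True},
        {"file": "targetfile2", "score": 0.12, "condition": False},
    ],
    "queryid2": [
        {"file": "targetfile1", "score": 0.55, "condition": True},
    ],
}

# Round-trip through a temporary file to mimic reading the saved output.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "prescreen.json"
    path.write_text(json.dumps(example))
    loaded = json.loads(path.read_text())

hits = passing_hits(loaded)
for query_id, records in hits.items():
    print(query_id, [rec["file"] for rec in records])
```

If the real file uses a different layout (for example, nested objects instead of lists), only the loop inside passing_hits should need adjusting.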

NikolasMumm commented 1 month ago

I have just tested the updated version. However, if I run the script like this:

python code/tools/pLM-BLAST/scripts/plmblast.py data/surfaceome/results/GCF_000321355.1_PcynB_1.0/Domains/domains_db data/surfaceome/results/GCF_000321355.1_PcynB_1.0/Domains/domains_db data/surfaceome/results/GCF_000321355.1_PcynB_1.0/Domains/test.json -win 15 -span 25 -gap_ext 0.5 -bfactor 2 -sigma_factor 2.0 -cpc 70 -workers 16 --only-scan

or with no extra arguments, like this:

python code/tools/pLM-BLAST/scripts/plmblast.py data/surfaceome/results/GCF_000321355.1_PcynB_1.0/Domains/domains_db data/surfaceome/results/GCF_000321355.1_PcynB_1.0/Domains/domains_db data/surfaceome/results/GCF_000321355.1_PcynB_1.0/Domains/test.json --only-scan

I only get results like:

{
    "0": {
        "0": "data/surfaceome/results/GCF_000321355.1_PcynB_1.0/Domains/domains_db/0.emb",
        "1": "data/surfaceome/results/GCF_000321355.1_PcynB_1.0/Domains/domains_db/1.emb",
        "2": "data/surfaceome/results/GCF_000321355.1_PcynB_1.0/Domains/domains_db/2.emb",
...
}

NikolasMumm commented 1 month ago

Now everything is working fine, thank you very much.