steineggerlab / foldseek

Foldseek enables fast and sensitive comparisons of large structure sets.
https://foldseek.com
GNU General Public License v3.0

Generate both alignment TSV file and PDBs #255

Closed. smilenaderi closed this issue 3 months ago.

smilenaderi commented 3 months ago

Hi, I want to generate both the PDB files and the results TSV file (alignment table). But when I use format mode 5, the alignment file is empty. How can I generate both of them? Thanks.

smilenaderi commented 3 months ago

@martin-steinegger

gieses commented 3 months ago

I am also looking into this; it was discussed here.

# BLAST-tab output (with header row) and superimposed PDBs
foldseek convertalis queryDB targetDB aln_DB result.m8 --format-mode 4
foldseek convertalis queryDB targetDB aln_DB result_pdbs --format-mode 5

# alignment
mmseqs result2msa queryDB targetDB aln_DB msa.fasta

This is probably everything you want, though I am not quite sure myself yet (:
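A minimal Python sketch for reading the result.m8 table produced by the first convertalis call. It assumes that --format-mode 4 writes one header row whose names match the --format-output column names (query, target, evalue, ...); the helper names and the 1e-3 E-value cutoff are illustrative choices, not part of Foldseek.

```python
import csv
import io
from collections import defaultdict


def read_m8_with_header(text):
    """Parse tab-separated alignment text whose first line is a header row."""
    # DictReader uses the header row as the keys for every following line.
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def hits_by_query(rows, max_evalue=1e-3):
    """Map each query to the targets whose E-value is <= max_evalue."""
    grouped = defaultdict(list)
    for row in rows:
        if float(row["evalue"]) <= max_evalue:
            grouped[row["query"]].append(row["target"])
    return dict(grouped)
```

For example, hits_by_query(read_m8_with_header(open("result.m8").read())) gives a dict from each query name to its list of target names, which can then be matched against the files written to result_pdbs.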

martin-steinegger commented 3 months ago

I do exactly what @gieses said myself :D

smilenaderi commented 3 months ago

Thank you for your response. I was doing the same, but it took twice the running time. In the end I managed to run both searches in parallel in almost the same wall-clock time as a single run.

smilenaderi commented 3 months ago
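    from subprocess import Popen

    # Placeholder inputs (hypothetical values; set these to your own paths)
    destination_query_path = "query.pdb"
    database = "pdb"
    output_file = "result.m8"
    output_file2 = "result_pdbs"

    # Search 1: tabular alignments with selected columns (TM-align mode)
    batcmd = f'''foldseek easy-search {destination_query_path} database/pdb {output_file} tmp4 --alignment-type 1 --format-output "query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits,qtmscore"'''

    # Search 2: superimposed PDBs; each search uses its own tmp directory
    batcmd2 = f'''foldseek easy-search {destination_query_path} database/{database} {output_file2} tmp2 --format-mode 5 --alignment-type 1'''

    commands = [batcmd, batcmd2]

    # Launch both searches at once, then wait for each to finish
    processes = [Popen(cmd, shell=True) for cmd in commands]
    for p in processes:
        p.wait()
        print(p.args, "exited with code", p.returncode)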

Running time was important for me. @martin-steinegger @gieses
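If you do run two independent searches side by side, a slightly more defensive variant of the Popen loop above passes argument lists instead of shell strings and raises when any run fails. This is a sketch; run_parallel is a hypothetical helper name, not a Foldseek utility.

```python
import subprocess


def run_parallel(commands):
    """Start every command at once, wait for all, and raise if any failed."""
    # Argument lists (no shell=True) avoid quoting problems with paths.
    processes = [subprocess.Popen(cmd) for cmd in commands]
    # Wait for every process before checking, so none is left running.
    codes = [p.wait() for p in processes]
    failed = [cmd for cmd, code in zip(commands, codes) if code != 0]
    if failed:
        raise RuntimeError(f"{len(failed)} command(s) failed: {failed}")
    return codes
```

Each foldseek easy-search call passed in should keep its own tmp directory, as tmp4 and tmp2 do above. Note that this still performs the search twice; it only overlaps the two runs in time.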

smilenaderi commented 3 months ago

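You can also run the commands in parallel from an interactive shell with prog1 & prog2 && fg. Since fg needs job control, a non-interactive script should use prog1 & prog2; wait instead.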

martin-steinegger commented 3 months ago

@smilenaderi this is not a good idea because you actually search twice. It's quite inefficient compared to what @gieses suggested: search once, then convert the same alignment database into both outputs.

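# build the query database (targetDB is assumed to exist already)
foldseek createdb query.pdb queryDB
# search once; alignments go to aln_DB, tmp is a scratch directory
foldseek search queryDB targetDB aln_DB tmp

# BLAST-tab output (with header row) and superimposed PDBs, both from the same aln_DB
foldseek convertalis queryDB targetDB aln_DB result.m8 --format-mode 4
foldseek convertalis queryDB targetDB aln_DB result_pdbs --format-mode 5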
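The same search-once pipeline can be driven from Python, keeping the speed advantage over two easy-search runs while still producing both outputs. A sketch under assumptions: targetDB already exists, tmp is a writable scratch directory, and build_pipeline / run_pipeline are hypothetical helper names. The search_args parameter is where options such as --alignment-type 1 from the easy-search commands above could be passed.

```python
import subprocess


def build_pipeline(query_pdb, target_db, tmp_dir="tmp", search_args=()):
    """Return the foldseek commands for one search and two conversions."""
    return [
        # Build the query database from the input structure(s).
        ["foldseek", "createdb", query_pdb, "queryDB"],
        # Run the expensive structural search exactly once.
        ["foldseek", "search", "queryDB", target_db, "aln_DB", tmp_dir,
         *search_args],
        # Tabular output with a header row, from the stored alignments.
        ["foldseek", "convertalis", "queryDB", target_db, "aln_DB",
         "result.m8", "--format-mode", "4"],
        # Superimposed PDBs from the same alignments (no second search).
        ["foldseek", "convertalis", "queryDB", target_db, "aln_DB",
         "result_pdbs", "--format-mode", "5"],
    ]


def run_pipeline(commands):
    """Run the steps in order; check=True stops at the first failing step."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Usage: run_pipeline(build_pipeline("query.pdb", "targetDB", search_args=("--alignment-type", "1"))). Building the command list separately from running it makes it easy to print or inspect the steps before launching anything.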