steineggerlab / foldseek

Foldseek enables fast and sensitive comparisons of large structure sets.
https://foldseek.com
GNU General Public License v3.0
693 stars 91 forks

Extracting sequences and PDB files described in the result.m8 hits file #291

Open rakeshr10 opened 1 week ago

rakeshr10 commented 1 week ago

@milot-mirdita Is there a way to use Foldseek to extract the full-length sequences of the PDB entries hit by a query, along with the corresponding PDB files?

milot-mirdita commented 5 days ago

You can use convertalis --format-mode 5 for superposed PDB output, and convertalis --format-output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits,qseq,tseq to add the full-length sequences to the m8 file as the two extra columns qseq and tseq.
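To pull the full-length target sequences back out of such a file, something like the following minimal Python sketch might work. It assumes the m8 file is tab-separated, has no header line, and uses exactly the column order from the --format-output string above. The function names read_hits and targets_to_fasta are made up for illustration and are not part of Foldseek.

```python
import csv

# Column order copied from the --format-output string quoted above.
COLUMNS = ("query,target,fident,alnlen,mismatch,gapopen,"
           "qstart,qend,tstart,tend,evalue,bits,qseq,tseq").split(",")


def read_hits(handle):
    """Yield one dict per hit line of a tab-separated m8 file (hypothetical helper)."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != len(COLUMNS):
            continue  # skip blank lines or lines with an unexpected column count
        yield dict(zip(COLUMNS, row))


def targets_to_fasta(hits):
    """Return FASTA text containing each distinct target sequence once,
    in order of first appearance (hypothetical helper)."""
    seen = set()
    records = []
    for hit in hits:
        if hit["target"] in seen:
            continue  # the same target can appear in several hit lines
        seen.add(hit["target"])
        records.append(f">{hit['target']}\n{hit['tseq']}\n")
    return "".join(records)
```

With a file named result.m8, a call such as targets_to_fasta(read_hits(open("result.m8", newline=""))) would return the FASTA text, which can then be written to disk. Swapping "target"/"tseq" for "query"/"qseq" gives the query sequences instead.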

rakeshr10 commented 5 days ago

Thanks @milot-mirdita. Is there a way to get the non-superposed PDB structures as well? And is there an option to write the 3Di sequences and 3Di alignments into the results file?

If I use the --sort-by-structure-bits 0 option without C-alpha information, as is the case for ProstT5 predictions, how are the bit scores and E-values calculated? Without structural information, are they based only on the AA and 3Di sequences?