Open jiaweiguan opened 10 months ago
We do not store the full PDB in our databases but just the Cα atoms, to keep the databases small. To get the full PDB files you would need to use our compressed Foldcomp databases, accessible through a Python interface, or download them from the EBI directly.
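If you want to superpose the x, y, z coordinates of the target structure, you would need to:
1. Output the rotation matrix `u` and the translation vector `t` using the `--format-output` parameter of the `easy-search` workflow.
2. Apply the following transformation to each target coordinate, computing all three new values from the original x, y, z before overwriting any of them:
x' = t[0] + x * u[0][0] + y * u[0][1] + z * u[0][2]
y' = t[1] + x * u[1][0] + y * u[1][1] + z * u[1][2]
z' = t[2] + x * u[2][0] + y * u[2][1] + z * u[2][2]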
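The transformation above can be sketched in Python as follows. This is a minimal illustration, not Foldseek code: the helper names `parse_u_t` and `superpose` are hypothetical, and the parser assumes that the `u` column holds 9 comma-separated numbers in row-major order and the `t` column holds 3 comma-separated numbers, which you should verify against your own output. Each new coordinate is computed from the original x, y, z, so no value is overwritten before it is used.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def parse_u_t(u_field: str, t_field: str) -> Tuple[List[List[float]], List[float]]:
    """Parse the u and t columns of one result row.

    ASSUMPTION: u is 9 comma-separated numbers (row-major 3x3 matrix) and
    t is 3 comma-separated numbers. Check this against your own output.
    """
    u_flat = [float(v) for v in u_field.split(",")]
    t = [float(v) for v in t_field.split(",")]
    if len(u_flat) != 9 or len(t) != 3:
        raise ValueError("expected 9 values for u and 3 values for t")
    # Reshape the flat list into three rows of three values.
    u = [u_flat[0:3], u_flat[3:6], u_flat[6:9]]
    return u, t


def superpose(coords: Sequence[Vec3],
              u: Sequence[Sequence[float]],
              t: Sequence[float]) -> List[Vec3]:
    """Apply p' = t + u * p to every (x, y, z) coordinate."""
    out: List[Vec3] = []
    for x, y, z in coords:
        # All three results use the ORIGINAL x, y, z.
        new_x = t[0] + x * u[0][0] + y * u[0][1] + z * u[0][2]
        new_y = t[1] + x * u[1][0] + y * u[1][1] + z * u[1][2]
        new_z = t[2] + x * u[2][0] + y * u[2][1] + z * u[2][2]
        out.append((new_x, new_y, new_z))
    return out


if __name__ == "__main__":
    # Toy values: a 90 degree rotation about the z axis plus a shift.
    u, t = parse_u_t("0,-1,0,1,0,0,0,0,1", "1,2,3")
    print(superpose([(1.0, 0.0, 0.0)], u, t))
```

The target coordinates themselves would come from the full PDB file, since the search databases only hold Cα atoms.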
Thank you for your help!
Neither of these databases is clustered by `foldseek easy-cluster`. We only provide databases clustered by amino acid sequence. The only preclustered databases, Alphafold/UniProt50, PDB and ESMAtlas30, were clustered with MMseqs2.
However, we did cluster the whole Alphafold/UniProt as part of our clustering work. If you want to use these structurally clustered proteins, you can download the representatives through foldcomp; the database is called `afdb_rep_v4`.
> We only provide databases clustered by structure.
Does it mean that structural clustering is performed before creating the database?
Sorry for the confusion. I meant "We only provide databases clustered by amino acid sequence." E.g. the UniProt50 is clustered using MMseqs2: `mmseqs cluster afdb afdb50 tmp --min-seq-id 0.5 -c 0.9 --cluster-reassign 1`
Got it! Thanks!
@martin-steinegger
foldseek easy-search ./1QYS.pdb ./afdb/afdb res.8m tmp --format-mode 4
When I executed this command, I found a warning in the log:
Can not touch 415722228266 into main memory
But I still got the search results. I don't know whether the result is complete. Can this warning be ignored?
You can ignore that warning. It's something that we have to fix at some point, but it doesn't affect anything.
@martin-steinegger Hi! Does 'tstart' start from 0 or 1? Also, 'tstart' seems to be related to chains. If I want to extract the residues from 'tstart' to 'tend', I need to know the chain.
foldseek easy-search ./query/ {database_path} tmp --format-mode 5
Result:![image](https://github.com/steineggerlab/foldseek/assets/32994356/ae700036-b59a-4350-9ce2-c1666e911b10)
The returned results contain only Cα atoms. Is there another way for me to obtain a complete PDB?