precomputed pdb lookup and sequence don't line up

steineggerlab / foldseek

Foldseek enables fast and sensitive comparisons of large structure sets.

GNU General Public License v3.0

The database entries are not stored in order. They are stored in our internal MMseqs2 database format: https://github.com/soedinglab/MMseqs2/wiki#mmseqs2-database-format

The lookup file maps each entry to a database key (first column of the .lookup file), which in turn matches a key in the .index file (again the first column). In the index you can look up the byte offset (second column) that points into the data file.
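To make the chain of lookups concrete, here is a minimal Python sketch for inspecting a single-file database. It assumes the tab-separated layout described in the MMseqs2 wiki: `.lookup` lines are `key<TAB>name<TAB>file number`, `.index` lines are `key<TAB>offset<TAB>length`, and each entry in the data file ends with a newline and a null byte. The function names (`read_lookup`, `read_index`, `fetch_entry`) are made up for illustration and are not part of Foldseek or MMseqs2.

```python
from pathlib import Path


def read_lookup(lookup_path):
    """Map entry name (second column) -> database key (first column)."""
    lookup = {}
    for line in Path(lookup_path).read_text().splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        lookup[fields[1]] = int(fields[0])
    return lookup


def read_index(index_path):
    """Map database key -> (byte offset, entry length in bytes)."""
    index = {}
    for line in Path(index_path).read_text().splitlines():
        if not line.strip():
            continue
        key, offset, length = line.split("\t")[:3]
        index[int(key)] = (int(offset), int(length))
    return index


def fetch_entry(db_path, name):
    """Resolve name -> key -> offset, then read that slice of the data file."""
    key = read_lookup(f"{db_path}.lookup")[name]
    offset, length = read_index(f"{db_path}.index")[key]
    with open(db_path, "rb") as handle:
        handle.seek(offset)
        raw = handle.read(length)
    # Assumed: entries are terminated by a newline and a null byte.
    return raw.rstrip(b"\x00\n").decode()
```

Because the key decides where an entry lives, reading the data file top to bottom and pairing it line by line with the .lookup file will generally not line up.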

The data file is a special case for the PDB, since we ship it as a clustered database. The full PDB data is split across two separate files, pdb_seq.0 and pdb_seq.1: the former contains only the cluster representatives and the latter all other entries.
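If you want to read an entry from such a split database by hand, a rough Python sketch could look like the following. It rests on an assumption that should be checked against the MMseqs2 documentation: that the offsets in the .index file refer to the split files taken together in order (pdb_seq.0 followed by pdb_seq.1), as if they were one concatenated file. The helper name `read_split_entry` is hypothetical.

```python
import os


def read_split_entry(data_paths, offset, length):
    """Read `length` bytes at a global `offset` across ordered split files.

    Assumption: `offset` counts bytes from the start of the first file and
    continues through the following files as if they were concatenated.
    """
    chunks = []
    remaining = length
    for path in data_paths:
        size = os.path.getsize(path)
        if offset >= size:
            # The entry starts in a later file; make the offset relative to it.
            offset -= size
            continue
        with open(path, "rb") as handle:
            handle.seek(offset)
            chunk = handle.read(remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
        offset = 0  # any leftover bytes start at the top of the next file
        if remaining == 0:
            break
    if remaining:
        raise ValueError("offset/length exceed the combined data files")
    # Assumed: entries are terminated by a newline and a null byte.
    return b"".join(chunks).rstrip(b"\x00\n").decode()
```

For example, `read_split_entry(["pdb_seq.0", "pdb_seq.1"], offset, length)` would be called with the offset and length taken from the matching .index line.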

I would recommend doing database manipulations with the various Foldseek/MMseqs2 commands.

precomputed pdb lookup and sequence don't line up #258