Currently, our data is available only for the human proteome. More organisms will be added soon.
We are still finalizing the data, and the repository structure will be reorganized in the future.
The data in this repository pass through multiple quality-control steps for both structures and entanglement. Our procedure is as follows:
Selecting High-Confidence Proteins:
Processing Crossing Residues to Handle Slipknots:
a. All Slipknots Included (_all_slipknots_*.pkl): Slipknots identified in the crossing residues list are not removed.
b. Pure Slipknots Removed (_no_pure_slipknots_*.pkl): Only pure slipknots are removed from the crossing residues list. For example, consider a loop [100, 200] and N-terminal crossing residues [20, -35]. This means the N-terminal does not thread around the loop but only pierces the surface bounded by the loop and then returns. For a mixed slipknot such as [20, -35, 45], the chirality is [+, -, +], which reduces to [+]; in that case we retain the original crossings: [20, -35, 45].
c. All Slipknots Removed (_no_slipknots_*.pkl):
[20]
Retaining High-Confidence Crossing Residues:
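
The pure-slipknot rule from step b can be illustrated with a short Python sketch. It assumes that the sign of each crossing residue encodes its chirality and that reduction means cancelling adjacent crossings of opposite sign; both assumptions are consistent with the two examples above but are not confirmed as the repository's actual implementation. The function names `reduce_signs` and `remove_pure_slipknots` are hypothetical.

```python
from typing import List


def reduce_signs(crossings: List[int]) -> List[int]:
    """Cancel adjacent crossings of opposite sign and return the surviving signs.

    Assumption: a positive residue index means '+' chirality, a negative one '-'.
    """
    stack: List[int] = []
    for residue in crossings:
        sign = 1 if residue > 0 else -1
        if stack and stack[-1] == -sign:
            stack.pop()          # opposite neighbours cancel each other
        else:
            stack.append(sign)
    return stack


def remove_pure_slipknots(crossings: List[int]) -> List[int]:
    """Drop the crossings only when they reduce to nothing (a pure slipknot).

    Mixed slipknots, whose signs do not fully cancel, are returned unchanged.
    """
    if not reduce_signs(crossings):
        return []
    return list(crossings)


print(remove_pure_slipknots([20, -35]))      # pure slipknot
print(remove_pure_slipknots([20, -35, 45]))  # mixed slipknot
```

With these assumptions, `[20, -35]` reduces to an empty sign list and is dropped, while `[20, -35, 45]` reduces to a single `+` and keeps its original crossings.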
We provide various datasets for each organism depending on the research question. Currently, for the human proteome, the datasets are:
data/AF_Human_all_slipknots_v1.pkl: Includes all slipknots without checking their presence in the list of crossing residues.
data/AF_Human_no_pure_slipknots_v1.pkl: Removes only pure slipknots and does not modify mixed slipknots.
data/AF_Human_no_slipknots_v1.pkl: Does not contain any slipknots.
These datasets are tailored to facilitate specific analyses related to protein entanglement and their structural properties.
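
As a starting point, a dataset file can be opened with Python's standard `pickle` module. This is a minimal sketch that assumes the `.pkl` files are ordinary Python pickles; the structure of the loaded object is not described here, so the example only prints its type. The helper name `load_dataset` is hypothetical, and if the files were written with a third-party library, that library must be installed for loading to succeed.

```python
import pickle
from pathlib import Path


def load_dataset(path):
    """Load one dataset file. Only unpickle files from a source you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


if __name__ == "__main__":
    dataset_path = Path("data/AF_Human_no_pure_slipknots_v1.pkl")
    if dataset_path.exists():
        data = load_dataset(dataset_path)
        # Inspect the top-level type before assuming any particular layout.
        print(type(data))
```

Swapping the path for one of the other two files loads the corresponding slipknot variant.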