Closed by Gift-OYS 2 days ago
Hi
To preprocess your own data, you can use the script located at src/process_graph.sh.
To retrain the model on your own data, specify the --train_dir argument in src/train.sh.
I have two questions about the node feature dimensions:
In your paper, each AA has 39-dim features, including a 20-dim AA type encoding, 16-dim AA properties, and 3-dim AA positions. But the second paragraph of Appendix C says:
On top of it, the properties of AAs and AAs' local environment are described by $X^{prop}$, including the normalized crystallographic B-factor, solvent-accessible surface area (SASA), normalized surface-aware node features, dihedral angles of backbone atoms, and 3D positions.
So is the 3-dim position counted twice?
What do the 16-dim properties consist of? I think they consist of a 1-dim B-factor, 1-dim SASA, 5-dim surface-aware features, and 6-dim angles, but these sum to 13, not 16. In addition, the file dataset/process/test/3fkf.A.pt contains Data(x=[139, 26], edge_index=[2, 1382], edge_attr=[1382, 93], pos=[139, 3], ss=[139, 8], edge_dist=[1400], distances=[140, 140], mu_r_norm=[139, 5]). Could you explain what the node dimensions mean and how they correspond to the 16-dim properties?
Hi!
Thank you for your comments. Yes, it is a bit unclear in the paper. Here is a revised comment that accurately describes the node features:
# Node features (39-dimensional):
# - 20 dim: Amino acid type (one-hot encoding)
# - 8 dim: Secondary structure
# - 4 dim: Dihedral angles
# - 5 dim: mu_r_norm
# - 1 dim: SASA (Solvent Accessible Surface Area)
# - 1 dim: B-factor
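The breakdown above can be checked against the tensor shapes quoted in the question. The following is a minimal sketch of that arithmetic only; it does not load the .pt file. The dictionary keys are hypothetical labels chosen for illustration, and the split of `x` into amino acid type, dihedral angles, SASA, and B-factor is an assumption inferred from the widths, not something confirmed in this thread.

```python
# Per-feature widths as listed in the revised comment.
# The key names are illustrative labels, not identifiers from the repository.
NODE_FEATURES = {
    "aa_type_one_hot": 20,
    "secondary_structure": 8,
    "dihedral_angles": 4,
    "mu_r_norm": 5,
    "sasa": 1,
    "b_factor": 1,
}

# Second dimension of each node-level tensor in the Data object
# quoted in the question: x=[139, 26], ss=[139, 8], mu_r_norm=[139, 5].
STORED_WIDTHS = {"x": 26, "ss": 8, "mu_r_norm": 5}

# ASSUMPTION (not confirmed in the thread): x holds every listed feature
# that is not stored separately in ss or mu_r_norm.
ASSUMED_X_CONTENT = ["aa_type_one_hot", "dihedral_angles", "sasa", "b_factor"]


def total_width(widths):
    """Sum the feature widths in a name -> width mapping."""
    return sum(widths.values())


listed_total = total_width(NODE_FEATURES)
stored_total = total_width(STORED_WIDTHS)
assumed_x_width = sum(NODE_FEATURES[name] for name in ASSUMED_X_CONTENT)

print("listed feature total:", listed_total)
print("stored tensor total:", stored_total)
print("assumed width of x:", assumed_x_width)
```

Under that assumption the numbers are consistent: the listed features and the three stored tensors both total 39, and the assumed contents of `x` add up to its stored width of 26. The 3-dim `pos` tensor is stored separately and is not part of this 39-dim total.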
Hope it helps!
Could you provide the instructions to process data?