ykiiiiii / GraDe_IF

Graph Denoising Diffusion for Inverse Protein Folding (NeurIPS 2023)
Could you provide the instructions to process data? #12

Closed (Gift-OYS closed this issue 2 days ago)

Gift-OYS commented 2 months ago

Could you provide the instructions to process data?

ykiiiiii commented 2 months ago

Hi

To preprocess your own data, you can use the script located at src/process_graph.sh.

To retrain the model on your own data, specify the --train_dir argument in src/train.sh.

Gift-OYS commented 1 month ago

I have two questions about the dimensions of nodes:

Question 1

In your paper, each AA has 39-dim features: a 20-dim AA type encoding, 16-dim AA properties, and 3-dim AA positions. However, the second paragraph of Appendix C says:

On top of it, the properties of AAs and AAs' local environment are described by $X^{prop}$, including the normalized crystallographic B-factor, solvent-accessible surface area (SASA), normalized surface-aware node features, dihedral angles of backbone atoms, and 3D positions.

So, is the 3-dim position counted twice?

Question 2

What do the 16-dim properties consist of? I think they consist of a 1-dim B-factor, 1-dim SASA, 5-dim surface-aware features, and 6-dim angles, but these sum to 13, not 16. In addition, dataset/process/test/3fkf.A.pt contains Data(x=[139, 26], edge_index=[2, 1382], edge_attr=[1382, 93], pos=[139, 3], ss=[139, 8], edge_dist=[1400], distances=[140, 140], mu_r_norm=[139, 5]). Could you explain what the node dimensions mean and how they correspond to the 16-dim properties?

ykiiiiii commented 1 month ago

Hi!

Thank you for your comments. Yes, this is a bit unclear in the paper. Here's a revised comment that accurately describes the node features:

# Node features (39-dimensional):
# - 20 dim: Amino acid type (one-hot encoding)
# - 8 dim: Secondary structure
# - 4 dim: Dihedral angles
# - 5 dim: mu_r_norm 
# - 1 dim: SASA (Solvent Accessible Surface Area)
# - 1 dim: B-factor 

Hope it helps!
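
For readers following along, the breakdown above can be checked with a little arithmetic. The sketch below is not from the repository. It only adds up the widths given in the reply and compares them with the per-node tensor shapes reported for 3fkf.A.pt (x, ss, mu_r_norm). The dictionary keys are illustrative names, and the idea that x holds the amino acid type, dihedral angles, SASA, and B-factor is an assumption inferred from the numbers (20 + 4 + 1 + 1 = 26), not something confirmed in this thread. The column order inside x is not stated here either.

```python
# Node feature groups as listed in the maintainer's reply (name -> width).
# The key names are illustrative and do not come from the repository.
NODE_FEATURE_DIMS = {
    "aa_type_one_hot": 20,
    "secondary_structure": 8,
    "dihedral_angles": 4,
    "mu_r_norm": 5,
    "sasa": 1,
    "b_factor": 1,
}

# Per-node tensor widths reported in the question for 3fkf.A.pt.
STORED_TENSOR_DIMS = {"x": 26, "ss": 8, "mu_r_norm": 5}

# ASSUMPTION (not confirmed in the thread): x holds every group that is
# not stored as its own tensor (ss and mu_r_norm are stored separately).
ASSUMED_X_GROUPS = ("aa_type_one_hot", "dihedral_angles", "sasa", "b_factor")


def total_dim(dims):
    """Sum the widths of all feature groups in a name -> width mapping."""
    return sum(dims.values())


def assumed_x_dim():
    """Width of x under the assumption stated above."""
    return sum(NODE_FEATURE_DIMS[name] for name in ASSUMED_X_GROUPS)


if __name__ == "__main__":
    print("listed node feature total:", total_dim(NODE_FEATURE_DIMS))
    print("x + ss + mu_r_norm:", total_dim(STORED_TENSOR_DIMS))
    print("assumed width of x:", assumed_x_dim())
```

Under that assumption the two totals agree, and the 3-dim pos tensor sits outside the 39 dimensions listed in the reply.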