Closed huhlim closed 1 year ago
Hi @huhlim, thank you for letting us know!
For inference runs, the nxyz_data list is supposed to be empty because we don't have all-atom xyz information.
There was a typo in the README (cd script -> cd scripts), and the example topology file PED00055.pdb was missing from the ./data directory; I think this was the issue. We fixed the README and added the file. Could you pull the data file and try again?
Hi @SoojungYang, thank you for your update!
I ran the inference commands again but got an almost empty pkl file, which contained only array([], shape=(10, 0), dtype=float64).
In my opinion, the problem is that the length of the testset variable is zero.
https://github.com/learningmatter-mit/GenZProt/blob/81d5512681a953a2ce967e1258693f4c7c4f4ed0/scripts/inference.py#L152-L154
This is because the build_dataset function calls build_cg_dataset
https://github.com/learningmatter-mit/GenZProt/blob/81d5512681a953a2ce967e1258693f4c7c4f4ed0/scripts/inference.py#L43-L57
and build_cg_dataset creates a PyTorch dataset
https://github.com/learningmatter-mit/GenZProt/blob/81d5512681a953a2ce967e1258693f4c7c4f4ed0/GenZProt/datasets.py#L656-L676
Unfortunately, the "__len__" method of CGDataset is defined by the length of the nxyz_data list.
https://github.com/learningmatter-mit/GenZProt/blob/81d5512681a953a2ce967e1258693f4c7c4f4ed0/GenZProt/data.py#L90-L101
I attempted to solve the issue but could not. If I misunderstood, please let me know.
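The failure mode described above can be reproduced in a few lines. This is only a minimal sketch, not the GenZProt code: ToyCGDataset, iterate_batches, and the "CG_nxyz" key are hypothetical names chosen for illustration. It assumes a dataset whose length is taken from the all-atom coordinate list alone, so an inference run that supplies only coarse-grained coordinates yields a dataset of length zero and no batches.

```python
# Hypothetical stand-in for a dataset whose length comes from one of its lists.
# This is NOT the GenZProt CGDataset; names and keys are illustrative only.
class ToyCGDataset:
    def __init__(self, props):
        self.props = props

    def __len__(self):
        # Length is tied to the all-atom coordinate list only.
        return len(self.props["nxyz"])

    def __getitem__(self, idx):
        return {key: val[idx] for key, val in self.props.items()}


def iterate_batches(dataset, batch_size):
    """Yield lists of samples, the way a simple data loader would."""
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        yield [dataset[i] for i in range(start, stop)]


# Inference case: CG coordinates exist for 10 frames, all-atom coordinates do not.
inference_props = {"nxyz": [], "CG_nxyz": [[0.0, 0.0, 0.0]] * 10}
testset = ToyCGDataset(inference_props)
batches = list(iterate_batches(testset, batch_size=4))

print(len(testset), len(batches))
```

Because the loop over batches never runs, any output array built from it stays empty, which would be consistent with the empty array seen in the pkl file.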
Hi @huhlim, you are right, there was an issue with data loading. I fixed the data loading (added the functions CG_dataset_inf and CG_collate_inf) and it should work properly now. I also updated the final inference output generation to provide both a numpy array and a pdb file for better readability (please check the updated README). Hope the problem is solved now! Again, thank you for your input, and I'm sorry for the confusion.
Thanks for the fix! It is working fine after correcting a typo, traj_to_into --> traj_to_info
https://github.com/learningmatter-mit/GenZProt/blob/9bb8e57b256e6fee9854a9548b5d3f90b23f214b/scripts/inference.py#L123
Sorry for reopening the issue. The inference script generated outputs. However, both the N- and C-terminal residues were excluded from the generation.
No worries! The N- and C-terminal residues are truncated because our algorithm requires the (i-1)th and (i+1)th C_alpha positions to locate the atoms of the ith residue. The topology of the generated pdb file is also truncated accordingly. I added a clarification to the README. You can also refer to Appendix D.5 of the preprint. In future updates, we plan to include backmapping of the terminal residues.
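The truncation follows directly from the neighbor requirement, as the sketch below shows. It is an illustration under stated assumptions, not the GenZProt implementation: backmappable_indices and neighbor_triplets are hypothetical helpers, and residues are indexed from 0. A residue can be placed only if both its (i-1)th and (i+1)th C_alpha positions exist, which excludes the first and last residue of a chain.

```python
def backmappable_indices(num_residues):
    """Indices of residues that have both an (i-1)th and an (i+1)th neighbor."""
    return list(range(1, num_residues - 1))


def neighbor_triplets(ca_positions):
    """Collect (previous, current, next) C_alpha positions for each usable residue."""
    return [
        (ca_positions[i - 1], ca_positions[i], ca_positions[i + 1])
        for i in backmappable_indices(len(ca_positions))
    ]


# Toy chain of five C_alpha positions laid out along the x axis.
ca = [(float(i), 0.0, 0.0) for i in range(5)]
kept = backmappable_indices(len(ca))
triplets = neighbor_triplets(ca)

print(kept)  # [1, 2, 3]
```

For a chain of N residues this leaves N - 2 residues, so the generated topology is two residues shorter than the input chain.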
Thank you for the clarification!
The inference script is not working because the length of its "testset" is zero. This is because the "nxyz_data" list in the "build_cg_dataset" function is empty.