jonathanking / sidechainnet

An all-atom protein structure dataset for machine learning.
BSD 3-Clause "New" or "Revised" License

check all XYZ coords in PDBbuilder #59

Closed mrjoness closed 1 year ago

mrjoness commented 1 year ago

Description

I've been using SidechainNet to improve sidechain prediction for molecular dynamics data, and have been using PdbBuilder.py to write the final PDB file. However, I've found that roughly once every 1000 frames, one atom is missing from the output PDB. I've tracked the issue to the following lines in PdbBuilder.py, which check for a (0,0,0) position by looking at the row sum. However, if the XYZ coordinates of a real atom happen to sum to zero (which I've found does happen occasionally), the function interprets it as a missing atom.

Todos

Treat an atom as missing only when all three XYZ values are equal to zero, not just when their sum is zero.
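
For illustration, here is a minimal sketch of the failure mode and the stricter check. It is not the actual PdbBuilder.py code: the function names `is_missing_by_sum` and `is_missing_all_zero` are hypothetical, and plain Python tuples stand in for the coordinate arrays the library works with.

```python
def is_missing_by_sum(xyz):
    """Sum-based check: treats an atom as padding when x + y + z == 0."""
    return sum(xyz) == 0


def is_missing_all_zero(xyz):
    """Stricter check: treats an atom as padding only when x, y and z are each 0."""
    return all(v == 0 for v in xyz)


atoms = [
    (0.0, 0.0, 0.0),    # genuine (0,0,0) padding
    (1.5, -2.0, 0.5),   # real atom whose coordinates happen to sum to zero
    (12.3, 4.5, -6.7),  # ordinary atom
]

by_sum = [is_missing_by_sum(a) for a in atoms]
all_zero = [is_missing_all_zero(a) for a in atoms]

print(by_sum)    # [True, True, False]  -> the second atom is wrongly dropped
print(all_zero)  # [True, False, False]
```

The second atom is the case described above: its coordinates are all nonzero, yet the sum-based check discards it.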

Status

jonathanking commented 1 year ago

Thanks @mrjoness ! I appreciate your contribution 🙂 That was indeed a poor choice on my end. The next major release of SidechainNet will use NaNs for less ambiguous padding, but until then it's great to have this!

Would you please just add spaces around the == for consistency? Other than that I'll merge!

Also, I thought it would have been crazy for the sum of positions to add up to exactly 0 within FP error by chance... but here we are.
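
As a rough sketch of why NaN padding is less ambiguous than zero padding: with NaNs, even an atom sitting exactly at the origin can be told apart from a missing one. The names `PAD` and `is_missing_nan` are hypothetical and are not taken from SidechainNet; how the library will actually implement this is not stated in this thread.

```python
import math

# Hypothetical padding value: all three coordinates are NaN.
PAD = (math.nan, math.nan, math.nan)


def is_missing_nan(xyz):
    """Treats an atom as padding only when every coordinate is NaN."""
    return all(math.isnan(v) for v in xyz)


atoms = [
    PAD,               # missing atom
    (0.0, 0.0, 0.0),   # real atom located exactly at the origin
    (1.5, -2.0, 0.5),  # real atom whose coordinates sum to zero
]

nan_flags = [is_missing_nan(a) for a in atoms]
print(nan_flags)  # [True, False, False]
```

Note that NaN never compares equal to anything, including itself, so the check has to use `math.isnan` (or an array equivalent) rather than `==`.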

mrjoness commented 1 year ago

Sounds good, I just updated the spacing, so it should be all set! I only found this bug because I had previously coded something similar that failed for the same reason. In your case, it seems that the precision is reduced at some point to only 5 significant figures, which makes the zero sum much more likely.
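
A small sketch of the precision effect, under stated assumptions: the helper `round_sig` is hypothetical, and where or how SidechainNet reduces precision is not specified here. The coordinates are chosen so the arithmetic is exact in binary floating point.

```python
def round_sig(value, sig=5):
    """Round a float to `sig` significant figures (hypothetical helper)."""
    return float(f"{value:.{sig}g}")


# A real atom: at full precision its coordinates do not sum to zero.
atom = (1.5 + 2**-20, -2.0, 0.5)
full_sum = sum(atom)

# After rounding to 5 significant figures, the tiny offset disappears.
rounded = tuple(round_sig(v) for v in atom)
rounded_sum = sum(rounded)

print(full_sum != 0)     # True: the sum check passes at full precision
print(rounded)           # (1.5, -2.0, 0.5)
print(rounded_sum == 0)  # True: the same atom is now flagged as missing
```

Rounding collapses many nearby coordinate values onto the same few representable ones, so exact cancellation goes from practically impossible to merely rare.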

jonathanking commented 1 year ago

Great! Thanks again, @mrjoness. Happy to have your contribution, and best of luck with your research.