Open alexanderbonnet opened 1 week ago
Hi,
I would like to explain this in detail, but don't have time right now. I will provide a better answer in the first weeks of July.
Right now what I can say is that comment is probably wrong. What matters is that each 'ligand atom token' can be mapped to a CA and that the masks use only CA for the losses. Any amino acid selection will take care of this.
Hope this helps (somewhat).
Best,
Patrick
Thanks for the quick answer!
You may disregard the second portion of the question: my issues were due to poor indexing on my part in one of the frame-aligned point error losses. All looks good now and I am getting expected behavior during training.
For the distogram loss, it still looks to me like using any amino acid other than glycine would essentially remove the ligand from the distogram loss, as the distances considered are between CBs (except for glycine, which uses CAs and would be compatible with setting the ligand heavy atoms to CAs).
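To make the concern concrete, here is a minimal NumPy sketch of the pseudo-beta selection. It assumes AlphaFold-style conventions (atom37 order starting N, CA, C, CB, O, and glycine at residue index 7) and is not copied from the Umol source. The function name `pseudo_beta` and the toy coordinates are made up for illustration.

```python
import numpy as np

# Assumed AlphaFold-style conventions (not taken from the Umol source):
# atom37 order starts N, CA, C, CB, O, and glycine has residue index 7.
CA_INDEX = 1
CB_INDEX = 3
GLY_INDEX = 7


def pseudo_beta(aatype, all_atom_positions):
    """Pick CB for every residue except glycine, which falls back to CA.

    aatype: (N,) integer residue types.
    all_atom_positions: (N, 37, 3) atom coordinates.
    Returns (N, 3) pseudo-beta coordinates.
    """
    is_gly = (aatype == GLY_INDEX)[:, None]  # (N, 1), broadcasts over xyz
    ca = all_atom_positions[:, CA_INDEX]
    cb = all_atom_positions[:, CB_INDEX]
    return np.where(is_gly, ca, cb)


# Two hypothetical "ligand atom tokens" with coordinates stored only in the CA slot.
positions = np.zeros((2, 37, 3))
positions[:, CA_INDEX] = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

as_alanine = pseudo_beta(np.array([0, 0]), positions)  # reads the empty CB slot
as_glycine = pseudo_beta(np.array([7, 7]), positions)  # reads the CA slot
```

Under these assumptions, a ligand token typed as alanine only gets meaningful pseudo-beta coordinates if something else fills its CB slot or supplies the target coordinates directly. The sketch does not show whether Umol does either.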
I think I should be fine for the most part, but would love to have detailed explanations regardless if you find the time.
Thanks again, Alexander
Hi,
Great 👍
The distogram is predicted in bins mapped from the pair representation. Therefore, the amino acid type doesn't matter as long as the ground truth coordinates (CB for protein) are provided for that loss.
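As a rough illustration of why only the coordinates matter, the distogram targets can be sketched as a pure function of one position per token. The bin settings below (64 bins with edges from 2.3125 to 21.6875 Å) are the AlphaFold defaults and are assumed here, not read from the Umol config. `distogram_targets` is a hypothetical helper name.

```python
import numpy as np


def distogram_targets(positions, first_break=2.3125, last_break=21.6875, num_bins=64):
    """Map pairwise distances between target coordinates to bin indices.

    positions: (N, 3) one coordinate per token (e.g. CB for protein residues).
    Returns an (N, N) integer array with values in [0, num_bins - 1].
    """
    # num_bins - 1 edges split the distance axis into num_bins bins.
    breaks = np.linspace(first_break, last_break, num_bins - 1)
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # The bin index is the number of edges each distance exceeds.
    return (dist[..., None] > breaks).sum(axis=-1)


coords = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [30.0, 0.0, 0.0]])
bins = distogram_targets(coords)
```

Nothing in this computation looks at the residue type. Whatever coordinate is passed in for a token, protein or ligand, determines its row and column of targets.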
Hope this helps.
Best,
Patrick
Hi! First of all, thanks for making your work so readily available.
I am looking to get a PyTorch reproduction of the repository going. I have not run into problems for inference (adapting from OpenFold and converting weights), but am running into a couple of challenges at train time, and wondered if you could help me understand some implementation details.
I see in the `make_uniform` function of the `predict.py` file that a comment mentions that the amino acid type is set to glycine, but the zero index that remains actually sets the amino acid to alanine. Wouldn't this matter for the `pseudo_beta_fn` and the inclusion of the ligand in the distogram loss? https://github.com/patrickbryant1/Umol/blob/f7cd2b4de09b4e7cc1b68606791dd1cc81deeebc/src/predict.py#L108

In the `folding.py` for the `backbone_loss`, an `"atom14_gt_exists_protein"` feature is built. I presume this contains atom masks for the protein only? As opposed to `"atom14_gt_exists"`, which must contain atoms for the protein and ligand. https://github.com/patrickbryant1/Umol/blob/f7cd2b4de09b4e7cc1b68606791dd1cc81deeebc/src/net/model/folding.py#L648

What about in the `sidechain_loss`? https://github.com/patrickbryant1/Umol/blob/f7cd2b4de09b4e7cc1b68606791dd1cc81deeebc/src/net/model/folding.py#L696
Thanks for your help!
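For reference, the glycine versus alanine point can be checked against the residue ordering used in AlphaFold's `residue_constants.restypes`. The snippet assumes Umol keeps that ordering, which the linked code is not quoted here to confirm.

```python
# Residue ordering as in AlphaFold's residue_constants.restypes
# (assumed here to carry over to Umol).
restypes = [
    "A", "R", "N", "D", "C", "Q", "E", "G", "H", "I",
    "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V",
]
restype_order = {aa: i for i, aa in enumerate(restypes)}

ala_index = restype_order["A"]  # the index a zero-filled aatype array selects
gly_index = restype_order["G"]
```

With this ordering, a zero-filled `aatype` corresponds to alanine, and glycine would need index 7.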