wells-wood-research / aposteriori

DNN based protein design.
MIT License

Charge Frames are either all positive or all negative #94

Closed universvm closed 5 months ago

universvm commented 7 months ago

Using 1ctf as an example:

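    import ampal
    # Assumed import path for the aposteriori helpers used below.
    from aposteriori.data_prep.create_frame_data_set import (
        Codec,
        create_residue_frame,
        keep_sidechain_cb_atom_filter,
    )

    structure = ampal.load_pdb("tests/testing_files/pdb_files/1ctf.pdb")
    # Materialise the atoms first so the residues are not mutated while iterating.
    for atom in list(structure.get_atoms()):
        if not keep_sidechain_cb_atom_filter(atom):
            del atom.parent.atoms[atom.res_label]
    positive_residue = structure[0][31]  # Lysine
    negative_residue = structure[0][32]  # Aspartic acid
    codec = Codec.CNOCACBQ()
    # positive_frame = create_residue_frame(positive_residue, 21, 21, True, codec, True)
    negative_frame = create_residue_frame(negative_residue, 21, 21, True, codec, True)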

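Residue 31 is lysine (positively charged) and residue 32 is aspartic acid (negatively charged). Because each frame also contains the neighbouring residue, we would expect both positive and negative values to be present. However, the values in each frame are either all positive or all negative.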

This is because the residue property (charge) is calculated once per frame, from the central residue, rather than per atom.
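To illustrate the difference, here is a minimal, self-contained sketch that does not use the aposteriori or ampal APIs. The `Atom` class, the `RESIDUE_CHARGE` table and both helper functions are hypothetical, and the charge table is a simplification (histidine and termini are ignored).

```python
from dataclasses import dataclass

# Hypothetical, simplified charge table (approximate charges at neutral pH;
# histidine and termini are ignored for brevity).
RESIDUE_CHARGE = {"LYS": 1, "ARG": 1, "ASP": -1, "GLU": -1}


@dataclass
class Atom:
    element: str   # e.g. "C", "N", "O"
    res_name: str  # three-letter code of the residue this atom belongs to


def per_frame_charges(atoms, central_res_name):
    """Every atom inherits the charge of the frame's central residue."""
    charge = RESIDUE_CHARGE.get(central_res_name, 0)
    return [charge for _ in atoms]


def per_atom_charges(atoms):
    """Each atom takes the charge of the residue it actually belongs to."""
    return [RESIDUE_CHARGE.get(atom.res_name, 0) for atom in atoms]


# A frame centred on a lysine that also contains atoms of a neighbouring aspartate.
frame = [Atom("N", "LYS"), Atom("C", "LYS"), Atom("N", "ASP"), Atom("O", "ASP")]
print(per_frame_charges(frame, "LYS"))  # [1, 1, 1, 1]
print(per_atom_charges(frame))          # [1, 1, -1, -1]
```

With per-frame assignment the sign of the whole frame depends only on the central residue, which matches the behaviour reported above; per-atom assignment lets both signs appear in the same frame.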

universvm commented 7 months ago

Additionally, this condition:

    if (
        "Q" in codec.atomic_labels
        or "P" in codec.atomic_labels
        and res_property != 0
    ):

means that, with the Q and P codecs, every atom (C, N, O, Ca, Cb, Q) is encoded in the final channel as carrying that charge, rather than only the Ca atoms.
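Separately from the atom-selection issue, note that Python's `and` binds more tightly than `or`, so the quoted condition only applies the `res_property != 0` check to the `"P"` branch. The sketch below demonstrates this and shows one possible Ca-only rule. The `atomic_labels` list and the `should_write_charge` helper are hypothetical illustrations, not aposteriori's actual code or fix.

```python
# Hypothetical label list for a CNOCACBQ-style codec.
atomic_labels = ["C", "N", "O", "CA", "CB", "Q"]
res_property = 0  # an uncharged residue

# Parsed as: "Q" in labels or ("P" in labels and res_property != 0)
as_written = "Q" in atomic_labels or "P" in atomic_labels and res_property != 0
# Explicit parentheses make the charge check apply to both channels.
explicit = ("Q" in atomic_labels or "P" in atomic_labels) and res_property != 0

print(as_written)  # True: the Q codec passes even for an uncharged residue
print(explicit)    # False


def should_write_charge(atom_label, atomic_labels, res_property):
    """Hypothetical rule: write a charge only on the Ca atom of a charged residue."""
    has_charge_channel = "Q" in atomic_labels or "P" in atomic_labels
    return has_charge_channel and res_property != 0 and atom_label == "CA"
```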

universvm commented 7 months ago

When trained on the same dataset, performance is lower:

Comparison_summary 2.pdf

This leaves two possible scenarios:

  1. Assigning charges to all atoms improves performance
  2. Assigning charges to all atoms AND using the same charge improves performance

I will test option 1.