Closed vuqv closed 1 month ago
np.mean(pLDDT of i, j and 3 residues along primary structure) > 60
np.mean(pLDDT of k and 3 residues along primary structure) > 60
np.mean(pLDDT of all residues within 4.5 Angstrom of heavy atom of k) > 60
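A minimal sketch of how these three checks could be computed, not the project's actual code. It assumes 0-based residue indices, a per-residue `plddt` array, heavy-atom coordinates `coords` with a parallel `atom_res` array mapping each atom to its residue, and that "3 residues along primary structure" means 3 on each side. The function names are hypothetical, and pooling the windows around i and j into one mean is also an assumption.

```python
import numpy as np

def window_mean(plddt, centers, flank=3):
    """Mean pLDDT over each center residue plus `flank` residues on both sides."""
    n = len(plddt)
    idx = set()
    for c in centers:
        idx.update(range(max(0, c - flank), min(n, c + flank + 1)))
    return float(np.mean([plddt[r] for r in sorted(idx)]))

def neighbor_mean(plddt, coords, atom_res, k, cutoff=4.5):
    """Mean pLDDT of residues with a heavy atom within `cutoff` Angstrom
    of any heavy atom of residue k (residue k itself is included)."""
    coords = np.asarray(coords, dtype=float)
    atom_res = np.asarray(atom_res)
    k_atoms = coords[atom_res == k]
    # Distances between every heavy atom and every heavy atom of residue k.
    d = np.linalg.norm(coords[:, None, :] - k_atoms[None, :, :], axis=-1)
    close = np.unique(atom_res[(d <= cutoff).any(axis=1)])
    return float(np.mean(np.asarray(plddt)[close]))

def passes_first_criteria(plddt, coords, atom_res, i, j, k, threshold=60.0):
    """True only if all three mean-pLDDT checks exceed the threshold."""
    return (window_mean(plddt, [i, j]) > threshold
            and window_mean(plddt, [k]) > threshold
            and neighbor_mean(plddt, coords, atom_res, k) > threshold)
```

For an entanglement with several crossings, the check would be repeated for each k.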
Quyen, I believe your criteria are too strict.
For each entanglement, we have the result [(i, j, [k1, k2, ..., kn])]. Since i and j are the pair of residues in close contact, each is a single value. However, crossing events can occur multiple times, so there can be multiple k values.
np.mean(pLDDT of region i-j) >= 70
np.mean(pLDDT of k ± 3) >= 70
pLDDT per-residue of [i, j, *[k]] >= 70
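A rough sketch of this all-or-nothing version, assuming 0-based indices with i < j, a per-residue `plddt` array, and that the k ± 3 window is applied to every crossing residue. The name `passes_strict` is hypothetical.

```python
import numpy as np

def passes_strict(plddt, i, j, crossings, threshold=70.0, flank=3):
    """Reject the entanglement if any of the three checks fails."""
    plddt = np.asarray(plddt, dtype=float)
    n = len(plddt)
    # Mean pLDDT over the loop region i..j (inclusive).
    if plddt[i:j + 1].mean() < threshold:
        return False
    # Mean pLDDT in a +/- flank window around every crossing residue.
    for k in crossings:
        if plddt[max(0, k - flank):min(n, k + flank + 1)].mean() < threshold:
            return False
    # Per-residue pLDDT of i, j and every crossing residue.
    return bool(np.all(plddt[[i, j, *crossings]] >= threshold))
```

With this form, a single low-confidence crossing residue is enough to discard the whole entanglement, even when the other crossings are confident.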
I believe these criteria are too strict. Specifically, [i, j] and [k] should be treated separately. i and j must each have per-residue pLDDT > 70, because if either residue forming the contact is not confident, it strongly affects whether the loop is closed at all.
For crossing residues, however, there may be many of them. Rejecting an entanglement because any single crossing residue is of low quality would remove many potential entanglements. Instead, remove only the low-quality residues from the list of crossings. If the remaining list is empty, remove the entanglement. If it is not empty, the crossing event can still be real.
per-residue pLDDT of i >= 70
per-residue pLDDT of j >= 70
[plddt(x) for x in list_crossing_residues if plddt(x) > 70] is not empty
By implementing these changes, we can maintain the integrity of the evaluation while allowing for more realistic entanglement detection.
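The relaxed rule could be sketched roughly as follows. It assumes `plddt` is indexable by 0-based residue number and uses the same `(i, j, [k...])` shape as the results above. `filter_entanglement` is a hypothetical name. The comparisons follow the criteria as written: `>=` for i and j, strict `>` for crossings.

```python
def filter_entanglement(plddt, i, j, crossings, threshold=70.0):
    """Return (i, j, kept_crossings), or None if the entanglement is rejected."""
    # The loop-closing contact must be confident at both residues.
    if plddt[i] < threshold or plddt[j] < threshold:
        return None
    # Drop only the low-confidence crossing residues.
    kept = [k for k in crossings if plddt[k] > threshold]
    # With no confident crossing left, the entanglement is removed.
    if not kept:
        return None
    return (i, j, kept)
```

Returning the filtered crossing list, instead of a plain yes/no, keeps a record of which crossings supported the decision.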
For the results, we looked at 40 randomly selected entanglements, 10 for each category:
With the Ed/Ian criteria the accuracy is ~52%, while the Quyen criteria give 80%.
This has been solved! Good job Quyen and Ian
After controlling for overall structure quality, we also need to control for entanglement quality. For example, a sequence with thousands of residues can have very high overall quality but still contain low-quality regions, where AlphaFold (AF) places a disordered segment. Such a segment can form a loop, which the GLN algorithm will then identify as an entanglement.
This is the second step of entanglement quality control, after #4.
How to do this? Quyen and Ed/Ian propose two different sets of criteria, so test both and count the false rate for each. The set of criteria with the lower false rate is better.
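One way the comparison could be scripted, as a sketch only. It assumes each sampled entanglement has a manual verdict (`observed`, True if judged real) and a verdict from each criteria set (True if kept), and it takes "false rate" to mean the fraction of disagreements with the manual verdict, i.e. 1 minus accuracy. Both function names are hypothetical.

```python
def false_rate(predicted, observed):
    """Fraction of entanglements where the criteria disagree with manual inspection."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not predicted:
        raise ValueError("need at least one entanglement")
    wrong = sum(p != o for p, o in zip(predicted, observed))
    return wrong / len(predicted)

def better_criteria(results, observed):
    """Given {criteria_name: verdicts}, return the name with the lowest false rate."""
    return min(results, key=lambda name: false_rate(results[name], observed))
```

If the two error types matter differently, false positives and false negatives could be counted separately instead of pooled.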