Closed jflat06 closed 11 months ago
A design question for either of you -
Right now I kept the atom identifiers in the yaml file the same - they just use the generic atom names. This means that I have to do the work to build the unique (res-specific) atom IDs and the wildcard IDs. It also means that I have to deduce when a param crosses the residue boundary (C-N).
One could imagine an alternative where we have a way of specifying the unique atom IDs directly (or wildcard ID, if you wanted it to remain a wildcard). This would be a bit more explicit, and you could essentially load the params directly from the DB into the hash table. I'm not sure whether this is actually better or worth the effort to re-write it, though.
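To make the trade-off concrete, here is a minimal sketch of the expansion step the current approach requires. Everything in it is an assumption for illustration: the `WILDCARD` marker, the `RES:ATOM` key format, the `expand_ids` helper, and the numeric values are placeholders, not the actual tmol code or database contents.

```python
from typing import Dict, Tuple

WILDCARD = "*"  # hypothetical marker for "any residue"

# Hypothetical generic bond-length params keyed by generic atom names.
# The numbers are placeholders, not real cartbonded parameters.
GENERIC_LENGTHS: Dict[Tuple[str, str], float] = {
    ("N", "CA"): 1.46,
    ("CA", "C"): 1.52,
    ("C", "N"): 1.33,  # peptide bond: spans two residues
}


def crosses_boundary(a1: str, a2: str) -> bool:
    # Assumption: the only inter-residue param is the C-N peptide bond.
    return (a1, a2) == ("C", "N")


def expand_ids(
    res_name: str, params: Dict[Tuple[str, str], float]
) -> Dict[Tuple[str, str], float]:
    """Turn generic atom names into res-specific and wildcard IDs."""
    table = {}
    for (a1, a2), value in params.items():
        if crosses_boundary(a1, a2):
            # The second atom lives on the next residue, whatever it is.
            key = (f"{res_name}:{a1}", f"{WILDCARD}:{a2}")
        else:
            key = (f"{res_name}:{a1}", f"{res_name}:{a2}")
        table[key] = value
    return table


table = expand_ids("ALA", GENERIC_LENGTHS)
```

With explicit IDs in the yaml, `expand_ids` and `crosses_boundary` would go away and the file contents could be loaded into the hash table as written, at the cost of a more verbose database.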
@aleaverfay this is passing all tests now, if you want to take a look.
Patch coverage: 100.00% and project coverage change: +0.10% :tada:
Comparison is base (056044f) 95.03% compared to head (7375f70) 95.13%.
I think after you merge trunk into this branch and resolve the conflicts, that this PR is ready to be merged. 💪
Yeah, I am having some issues after resolving the conflicts due to the patching system changes.
The first issue is that it was using the full name to do lookups into the cartbonded DB, which was causing a KeyError since MET:nterm doesn't exist. I'm assuming we want to look up by the base name here? It should also probably not choke if that key doesn't exist.
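Roughly what I have in mind, as a sketch only: `CARTBONDED_DB`, `base_name`, and `lookup_params` are hypothetical stand-ins rather than the real tmol names, and it assumes patched names always look like `BASE:patch`.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the cartbonded DB, keyed by base residue name.
# The parameter values are placeholders.
CARTBONDED_DB: Dict[str, Dict[str, float]] = {
    "MET": {"N-CA": 1.0},
    "ALA": {"N-CA": 1.0},
}


def base_name(full_name: str) -> str:
    # Assumption: a patched name is "<base>:<patch>", e.g. "MET:nterm".
    return full_name.split(":", 1)[0]


def lookup_params(full_name: str) -> Optional[Dict[str, float]]:
    # dict.get returns None instead of raising KeyError for unknown types.
    return CARTBONDED_DB.get(base_name(full_name))


met_params = lookup_params("MET:nterm")
missing = lookup_params("XYZ:cterm")
```

Returning `None` pushes the decision about missing residue types to the caller, which could skip them or warn.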
The second issue is that the score is now wrong.
E x: array([[ 37.7848],
E [183.5777],
E [ 50.5842],...
E y: array([[310.2358 ],
E [194.5125 ],
E [ 50.5842 ],...
I'm not 100% sure what's going on with this yet, but I think it could be due to the atom ID tables doubling up on the terminal entries. https://github.com/uw-ipd/tmol/pull/260/files#r1322083779
Yes, it took me a little while to hunt it down, but the logic in Rosetta3 is to use the base_name to do the parameter lookups. I think the only atoms you should insert into the table are for the un-patched residue types; i.e. name == base_name
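A sketch of that filter, under assumptions: `ResType` here is a hypothetical minimal record (not the real tmol class), and the atom names are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ResType:
    """Hypothetical minimal residue type record."""
    name: str
    base_name: str
    atoms: List[str]


def build_atom_id_table(res_types: List[ResType]) -> Dict[Tuple[str, str], int]:
    table: Dict[Tuple[str, str], int] = {}
    for rt in res_types:
        if rt.name != rt.base_name:
            # Patched variant: contributes nothing to the table.
            continue
        for atom in rt.atoms:
            # Assign the next free integer ID to each new (base, atom) pair.
            table.setdefault((rt.base_name, atom), len(table))
    return table


res_types = [
    ResType("MET:nterm", "MET", ["N", "CA", "C", "H1", "H2", "H3"]),
    ResType("MET", "MET", ["N", "CA", "C", "H"]),
]
table = build_atom_id_table(res_types)
```

With this filter the order in which variants and base types arrive does not matter, and terminal entries cannot double up in the table.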
Ok, yeah. Frank and I talked about this and came to a similar conclusion. However, my new baselines don't match the ones that Frank computed with his patching-aware cartbonded using the old implementation, so we weren't 100% sure which one was correct. Frank seemed to think it was his code that was at fault, though.
@aleaverfay
Quick question -
Right now I am adding ALL atoms under the base type name as they come in. So even if you see a variant type first, it will add the atoms then, including any variant-added atoms. Any atoms removed in the variant would be added when you ran the base type.
This has the side effect that some of the variant-only atoms get tagged under the base name. This could cause collisions if you had multiple variant types that add an atom with the same name. But that shouldn't matter if you aren't adding params that reference these atoms.
This is what I have implemented currently.
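For reference, a simplified sketch of that behavior plus a check for the collision case. The tuple layout, the function names, and the `MET:other_patch` variant are all made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

ResTypeRow = Tuple[str, str, List[str]]  # (name, base_name, atoms)


def tag_atoms_by_base(res_types: Iterable[ResTypeRow]) -> Dict[str, Dict[str, Set[str]]]:
    """Tag every atom under the base name, recording which types supplied it."""
    atoms_by_base: Dict[str, Dict[str, Set[str]]] = defaultdict(dict)
    for name, base, atoms in res_types:
        for atom in atoms:
            atoms_by_base[base].setdefault(atom, set()).add(name)
    return atoms_by_base


def variant_only_collisions(atoms_by_base):
    """Atoms absent from the base type but added by more than one variant."""
    out = {}
    for base, atoms in atoms_by_base.items():
        for atom, sources in atoms.items():
            if base not in sources and len(sources) > 1:
                out[(base, atom)] = sorted(sources)
    return out


rows = [
    ("MET:nterm", "MET", ["N", "CA", "H1"]),        # variant seen first
    ("MET:other_patch", "MET", ["N", "CA", "H1"]),  # hypothetical second variant
    ("MET", "MET", ["N", "CA", "H"]),
]
tagged = tag_atoms_by_base(rows)
collisions = variant_only_collisions(tagged)
```

Here `H1` ends up tagged under `MET` even though the base type never defines it, and two variants supply it. That only becomes a problem if some param references `H1`.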
If we want to ONLY tag the atoms in the base type, it gets a bit trickier.
One option is to somehow figure out which atoms are in the base type as you parse the variant, and add them under the basename.
If you don't do this, you need to ensure that:
I don't know if those are safe assumptions.
Any thoughts on this?
This PR reworks cartbonded to use the new res-centric paradigm.
The overall flow of the calculation is as follows:
To accomplish the above, several systems were added:
Lastly, I changed the database format. I renamed the old database with an "old" suffix. It should be trivial to remove when we remove the non-res-centric code.