Open phraenquex opened 1 year ago
@matteoferla:
Get Steph's use-case and see if there's a work-around.
Bug OpenMM community about the (1) funny numbers and (2) very slow running.
To finalise the ticket: A. RMSD needs to be sensible. (Energy not important.) B. Slow running (probably) okay. @alanbchristie check if Knitwork is well implented to exploit NextFlow. #944)
Get Steph's use-case and see if there's a work-around.
She said there was no issue in the Fragmenstein side
OpenMM
I could not get the calculated values to behave nor cut down the speed, it all works fine, but the values on testing are terrible. However, I have "simply" fixed the RDKit minimization to be aware of the protein, but as a immobile entity. (Technical detail: By include a frozen extracted protein neighbourhood, which resolves the clashes problem in all cases bar when the template ought to move (i.e. this is not flexible))
The values are a bit hairy, but the sign is somewhat consistent. (Technical detail: on account it is internal energy U, and not a hybrid(=fudged) Gibbs potential. Plus in vacuum is not great, but could be argued to be a pedantic issue)
Even with PyRosetta the interactions were the main driver and the results get filtered by bad energy and sorted by interactions primarily at least in my hands.
Bad:
Good:
@matteoferla when is your RDkit fix going into production?
(@Waztom add to agenda for Thursday.)
Unless the Squonk job has a frozen version of fragmenstein (doubtful), it should have been in production since last month (i.e. version 0.13.20)
@tdudgeon —Is this true? And if not, could the version of Fragmenstein be changed to the latest 0.13.34
. I have access only to https://gitlab.com/informaticsmatters/squonk-fragmenstein
@tdudgeon there's question for you from @matteoferla, in the question above.
If this is indeed deployed, then the ticket can move to "In production."
Currently Fragmenstein is used for 2 sets of jobs:
Both use the same base container that currently uses Fragmenstein 0.12.0 (https://github.com/InformaticsMatters/squonk2-fragmenstein/blob/main/requirements.txt#L2)
Updating this is simple and will just need the container images to be rebuilt.
But are there other impacts? If this means that the updated jobs would now use OpenMM for minimisation (rather that Pyrosetta (originally) or nothing (more recently)) do we need some extra options to allow this minimisation to be enabled and/or configured?
Thank you for the link: that is super helpful!
No, they will not use OpenMM for minimisation —rigid backbone only.
Custom options For the PyRosetta, the academic username and password are universal, so we could conceivably have an input box wherein the user provides the password which if its hash is correct runs the job. For some demo I did it that way.
Future I should say that there is another ticket, wherein I need to Squonkificate the normal Fragmenstein pipeline (bunk double or triple mergers, analog-by-catalog search, placement, clustering and ranking), but I need to change the analog-by-catalog search by copying Steph's FragmentKnitting via FragmentNetwork as oppose to use sw.docking.org. Steph is busy so I have been trying to reverse engineer it. I will talk to Rubén about how the Squonkification of Steph's code.
So Fragmenstein version should be increased to 0.13.34, without any other changes being needed?
@matteoferla your answers don't look relevant to this ticket; if you need them actioned, please create new ticket. I suspect they pertain to Brownish release.
As for "custom options": no need for a ticket, the answer is categorical: there will be no PyRosetta in Fragalysis, or any support for custom licenses, certainly not in the forseeable future - we don't have the legal backup to make this happen.
@phraenquex Apologies. I will try to be more clear in future @tdudgeon Yes, that is correct and sorry for simply pressing an emoji response in lieu of a written reply
@tdudgeon @matteoferla can I move this into Production swimlane now?
If not, what are the remaining actions?
I still need to update the fragmenstein version and deploy. This should not affect any functionality.
I should have asked: when is that happening? Is it a lot of work - or can it happen now, without slowing down green release?
@matteoferla you're on standby to test it, presumably?
@phraenquex my help stops at saying to change this line to Fragmenstein == 0.13.36
, or Fragmenstein >= 0.13.36
: the creation of an image for Squonk is a sys admin task beyond my skillset or permissions
Yes, I just need to make the change and rebuild the images. Just not had time to do so yet.
@tdudgeon that hasn't answered my question.
@matteoferla no ticket is complete until someone knowledgeable has confirmed that it "works". You're the only one that could credibly do that, at the moment. Unless you show @mwinokan how to do it.
@phraenquex: yes, as soon as @tdudgeon signals that the image rebuild is complete I will test it. From the image rebuild side, I am guessing this fully completely depends if Tim's infrastructure in Diamond is working following the shutdown earlier this month.
The fragmenstein images have now been rebuilt, and should be testable in Squonk.
Steph's fragment knitting images need to be rebuilt by @rsanchezgarc as I can't push to the XChem DockerHub repo.
Ruben, update the base container for your one to informaticsmatters/squonk2-fragmenstein-base/1.0.4
(see https://hub.docker.com/layers/informaticsmatters/squonk2-fragmenstein-base/1.0.4/images/sha256-01f71f20308a4fa2f3c3a3562557368f7ddc70b2e177334456bb0173410cc0ab?context=repo), or you can use the stable
tag if you don't mind the base image potentially shifting over time. Once done that job can be tested too.
@phraenquex No, it shouldn't be moved to production until the above has been done. But really this isn't much to do with Fragalysis staging/production/etc as it's purely Squonk related, and Squonk will use the new images as soon as they become available.
Thanks @tdudgeon. @rsanchezgarc could you give an indication of when this can happen? (If you can't do it, we'll ask @mwinokan or @Waztom to take it on.)
Correction: They are absolutely staging/production related: the categories refer to what's available to users, not to some arcane technical details about repos and stuff. They always have.
@tdudgeon @matteoferla . Shouldn't we also change the line of code in which we set up Victor or Wictor or whatever the new openmm class is called?
@rsanchezgarc —OpenVictor was a no go. OpenMM energy minimisation is very slow —it happens fully in cartesian space without internal space tricks and akin to an MD run at zero kelvin with a shaking sampler to add motion as far as I can tell. Template choice becomes critical in Wictor (without PyRosetta) as the neighbourhood is frozen. So RDKit is the only choice here.
@matteoferla .Then, how do you use it? Just as our old v = Victor()
?
@tdudgeon. I already had FROM informaticsmatters/squonk2-fragmenstein-base:stable
in the Dockerfile
@matteoferla said that we need to use Wictor()
So I just built the new image and pushed it. Is that all @tdudgeon?
If the job hasn't changed (only the fragmenstein implementation) then all that's needed is a new container image.
@tdudgeon so it should be ready now
@tdudgeon thinks it was already done, needs to check - please do for green release.
@tdudgeon can you confirm these changes were made? We will need to revisit the deployed algorithms in general in the orange release
The jobs are currently using Fragmenstein version 0.13.36 which I believe is one that no longer uses PyRosetta, but @matteoferla should confirm this.
Also, the code seems to be exclusively using Wictor
not Victor
.
Thank you for confirming @tdudgeon
@tdudgeon — I can confirm that is correct.
The current version is 1.0.6
, but there were only minor bug fixes, so 0.13
should be okay, functionally.
As you said, Fragmenstein can be run
Victor
classWictor
class. The one to use here.OpenVictor
class. Requires GPU and is way too slow for 1k+ mergersThe code implemented in Fragalysis only does a merger, which is likely not in catalogue space.
The pipeline normally consists of Fragmenstein mergers, analogue search in catalogues via NextMove Software's SmallWorld server hosted by John Irwin and analogues placed by Fragmenstein and results analysed. The middle bit needed changing to be open/independent. The other method I have used was OpenEye GraphSim, which is not open. I never did test an open analogue search —I had my eyes on faiss
library as it can be run in CPU or GPU.
Thanks @matteoferla. We will revisit the deployed algorithms in a later release (see #1454), and I will get in touch regarding the analogue search (although we quite like the 'pure' merges for CAR anyway)
@mwinokan, great, using the unaltered mergers for the in-house synthesis pipeline makes total sense: jumping into catalogue space is infuriating (eg. heteroarenes to benzenes and loss of substituents) and that is even with the fact that SW runs not on FPs but on edit distance —the Astex Fragment network is an open copycat but with fewer of Roger and John's mathemagical tricks. 🤷
@mwinokan I've updated #1454 to include that headline from the Joint Meeting, to filter RDkite poses with PoseBusters.
That may be necessary for this ticket too - especially if it's not hard to do.
@phraenquex I think it's best to address it in #1454
Sure - in which case, we're probably better off moving this ticket to #won'tfix - life being too short, and there being loads of loose ends with the overall problem anyway.
XREF matteoferla/Fragmenstein/issues/33