Replace PyRosetta with OpenMM in Fragmenstein

matteoferla commented 1 year ago

XREF matteoferla/Fragmenstein/issues/33

phraenquex commented 11 months ago

@matteoferla:

Get Steph's use-case and see if there's a work-around.

Bug OpenMM community about the (1) funny numbers and (2) very slow running.

To finalise the ticket: A. RMSD needs to be sensible. (Energy not important.) B. Slow running (probably) okay. @alanbchristie check if Knitwork is well implented to exploit NextFlow. #944)

matteoferla commented 10 months ago

Get Steph's use-case and see if there's a work-around.

She said there was no issue in the Fragmenstein side

OpenMM

I could not get the calculated values to behave nor cut down the speed, it all works fine, but the values on testing are terrible. However, I have "simply" fixed the RDKit minimization to be aware of the protein, but as a immobile entity. (Technical detail: By include a frozen extracted protein neighbourhood, which resolves the clashes problem in all cases bar when the template ought to move (i.e. this is not flexible))

The values are a bit hairy, but the sign is somewhat consistent. (Technical detail: on account it is internal energy U, and not a hybrid(=fudged) Gibbs potential. Plus in vacuum is not great, but could be argued to be a pedantic issue)

Even with PyRosetta the interactions were the main driver and the results get filtered by bad energy and sorted by interactions primarily at least in my hands.

Bad:

fixed template
in vacuo

Good:

takes a fraction of a second
it works

phraenquex commented 10 months ago

@matteoferla when is your RDkit fix going into production?

(@Waztom add to agenda for Thursday.)

matteoferla commented 10 months ago

Unless the Squonk job has a frozen version of fragmenstein (doubtful), it should have been in production since last month (i.e. version 0.13.20)

@tdudgeon —Is this true? And if not, could the version of Fragmenstein be changed to the latest 0.13.34. I have access only to https://gitlab.com/informaticsmatters/squonk-fragmenstein

phraenquex commented 10 months ago

@tdudgeon there's question for you from @matteoferla, in the question above.

If this is indeed deployed, then the ticket can move to "In production."

tdudgeon commented 10 months ago

Currently Fragmenstein is used for 2 sets of jobs:

original fragmenstein jobs (https://github.com/InformaticsMatters/squonk2-fragmenstein)
Steph/Rubens fragment merges (https://github.com/stephwills/fragment_network_merges)

Both use the same base container that currently uses Fragmenstein 0.12.0 (https://github.com/InformaticsMatters/squonk2-fragmenstein/blob/main/requirements.txt#L2)

Updating this is simple and will just need the container images to be rebuilt.

But are there other impacts? If this means that the updated jobs would now use OpenMM for minimisation (rather that Pyrosetta (originally) or nothing (more recently)) do we need some extra options to allow this minimisation to be enabled and/or configured?

matteoferla commented 10 months ago

Thank you for the link: that is super helpful!

No, they will not use OpenMM for minimisation —rigid backbone only.

Custom options For the PyRosetta, the academic username and password are universal, so we could conceivably have an input box wherein the user provides the password which if its hash is correct runs the job. For some demo I did it that way.

Future I should say that there is another ticket, wherein I need to Squonkificate the normal Fragmenstein pipeline (bunk double or triple mergers, analog-by-catalog search, placement, clustering and ranking), but I need to change the analog-by-catalog search by copying Steph's FragmentKnitting via FragmentNetwork as oppose to use sw.docking.org. Steph is busy so I have been trying to reverse engineer it. I will talk to Rubén about how the Squonkification of Steph's code.

tdudgeon commented 10 months ago

So Fragmenstein version should be increased to 0.13.34, without any other changes being needed?

phraenquex commented 10 months ago

@matteoferla your answers don't look relevant to this ticket; if you need them actioned, please create new ticket. I suspect they pertain to Brownish release.

As for "custom options": no need for a ticket, the answer is categorical: there will be no PyRosetta in Fragalysis, or any support for custom licenses, certainly not in the forseeable future - we don't have the legal backup to make this happen.

matteoferla commented 10 months ago

@phraenquex Apologies. I will try to be more clear in future @tdudgeon Yes, that is correct and sorry for simply pressing an emoji response in lieu of a written reply

phraenquex commented 9 months ago

@tdudgeon @matteoferla can I move this into Production swimlane now?

If not, what are the remaining actions?

tdudgeon commented 9 months ago

I still need to update the fragmenstein version and deploy. This should not affect any functionality.

phraenquex commented 9 months ago

I should have asked: when is that happening? Is it a lot of work - or can it happen now, without slowing down green release?

@matteoferla you're on standby to test it, presumably?

matteoferla commented 9 months ago

@phraenquex my help stops at saying to change this line to Fragmenstein == 0.13.36, or Fragmenstein >= 0.13.36: the creation of an image for Squonk is a sys admin task beyond my skillset or permissions

tdudgeon commented 9 months ago

Yes, I just need to make the change and rebuild the images. Just not had time to do so yet.

phraenquex commented 9 months ago

@tdudgeon that hasn't answered my question.

@matteoferla no ticket is complete until someone knowledgeable has confirmed that it "works". You're the only one that could credibly do that, at the moment. Unless you show @mwinokan how to do it.

matteoferla commented 9 months ago

@phraenquex: yes, as soon as @tdudgeon signals that the image rebuild is complete I will test it. From the image rebuild side, I am guessing this fully completely depends if Tim's infrastructure in Diamond is working following the shutdown earlier this month.

tdudgeon commented 9 months ago

The fragmenstein images have now been rebuilt, and should be testable in Squonk.

Steph's fragment knitting images need to be rebuilt by @rsanchezgarc as I can't push to the XChem DockerHub repo. Ruben, update the base container for your one to informaticsmatters/squonk2-fragmenstein-base/1.0.4 (see https://hub.docker.com/layers/informaticsmatters/squonk2-fragmenstein-base/1.0.4/images/sha256-01f71f20308a4fa2f3c3a3562557368f7ddc70b2e177334456bb0173410cc0ab?context=repo), or you can use the stable tag if you don't mind the base image potentially shifting over time. Once done that job can be tested too.

@phraenquex No, it shouldn't be moved to production until the above has been done. But really this isn't much to do with Fragalysis staging/production/etc as it's purely Squonk related, and Squonk will use the new images as soon as they become available.

phraenquex commented 9 months ago

Thanks @tdudgeon. @rsanchezgarc could you give an indication of when this can happen? (If you can't do it, we'll ask @mwinokan or @Waztom to take it on.)

Correction: They are absolutely staging/production related: the categories refer to what's available to users, not to some arcane technical details about repos and stuff. They always have.

rsanchezgarc commented 9 months ago

@tdudgeon @matteoferla . Shouldn't we also change the line of code in which we set up Victor or Wictor or whatever the new openmm class is called?

matteoferla commented 9 months ago

@rsanchezgarc —OpenVictor was a no go. OpenMM energy minimisation is very slow —it happens fully in cartesian space without internal space tricks and akin to an MD run at zero kelvin with a shaking sampler to add motion as far as I can tell. Template choice becomes critical in Wictor (without PyRosetta) as the neighbourhood is frozen. So RDKit is the only choice here.

rsanchezgarc commented 9 months ago

@matteoferla .Then, how do you use it? Just as our old v = Victor()?

rsanchezgarc commented 9 months ago

@tdudgeon. I already had FROM informaticsmatters/squonk2-fragmenstein-base:stable in the Dockerfile @matteoferla said that we need to use Wictor() So I just built the new image and pushed it. Is that all @tdudgeon?

tdudgeon commented 9 months ago

If the job hasn't changed (only the fragmenstein implementation) then all that's needed is a new container image.

rsanchezgarc commented 9 months ago

@tdudgeon so it should be ready now

phraenquex commented 3 months ago

@tdudgeon thinks it was already done, needs to check - please do for green release.

mwinokan commented 3 months ago

@tdudgeon can you confirm these changes were made? We will need to revisit the deployed algorithms in general in the orange release

tdudgeon commented 3 months ago

The jobs are currently using Fragmenstein version 0.13.36 which I believe is one that no longer uses PyRosetta, but @matteoferla should confirm this. Also, the code seems to be exclusively using Wictor not Victor.

mwinokan commented 3 months ago

Thank you for confirming @tdudgeon

matteoferla commented 3 months ago

@tdudgeon — I can confirm that is correct. The current version is 1.0.6, but there were only minor bug fixes, so 0.13 should be okay, functionally.

As you said, Fragmenstein can be run

default (PyRosetta) — Victor class
RDKit only with PyRosetta —Wictor class. The one to use here.
openMM — OpenVictor class. Requires GPU and is way too slow for 1k+ mergers

The code implemented in Fragalysis only does a merger, which is likely not in catalogue space. The pipeline normally consists of Fragmenstein mergers, analogue search in catalogues via NextMove Software's SmallWorld server hosted by John Irwin and analogues placed by Fragmenstein and results analysed. The middle bit needed changing to be open/independent. The other method I have used was OpenEye GraphSim, which is not open. I never did test an open analogue search —I had my eyes on faiss library as it can be run in CPU or GPU.

mwinokan commented 3 months ago

Thanks @matteoferla. We will revisit the deployed algorithms in a later release (see #1454), and I will get in touch regarding the analogue search (although we quite like the 'pure' merges for CAR anyway)

matteoferla commented 3 months ago

@mwinokan, great, using the unaltered mergers for the in-house synthesis pipeline makes total sense: jumping into catalogue space is infuriating (eg. heteroarenes to benzenes and loss of substituents) and that is even with the fact that SW runs not on FPs but on edit distance —the Astex Fragment network is an open copycat but with fewer of Roger and John's mathemagical tricks. 🤷

phraenquex commented 3 months ago

@mwinokan I've updated #1454 to include that headline from the Joint Meeting, to filter RDkite poses with PoseBusters.

That may be necessary for this ticket too - especially if it's not hard to do.

mwinokan commented 3 months ago

@phraenquex I think it's best to address it in #1454

phraenquex commented 3 months ago

Sure - in which case, we're probably better off moving this ticket to #won'tfix - life being too short, and there being loads of loose ends with the overall problem anyway.

m2ms / fragalysis-frontend

Replace PyRosetta with OpenMM in Fragmenstein #1078