aqemia-jasmin-guven opened this issue 3 months ago
Thanks for following up, @aqemia-jasmin-guven. @jthorton is currently on vacation, but he should be able to provide more useful answers than I can when he returns.
Hi @aqemia-jasmin-guven thanks for trying out bespokefit!
From the paper, I understood the workflow as follows:
That's not quite the production workflow; you might be getting it a little confused with some of the examples we did in the paper, which were slightly more complicated. In practice it's as simple as submitting a ligand to a running server, which will handle everything for you following the automated workflow defined here. You won't need to worry about deduplicating the fragments or making the SMIRKS patterns; this will all be done for you. I recommend starting with the quick start guide to ensure things are running as expected and then moving on to the TYK2 set.
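For reference, a minimal command-line sketch of both modes, pieced together from the quick start guide and executor docs as I remember them (file names and the task ID are placeholders; double-check the exact options with openff-bespoke executor --help):

```shell
# One-shot run for a single ligand, as in the quick start guide.
# The xtb spec keeps the QC torsion scans cheap for a first test.
openff-bespoke executor run --file "ligand.sdf" \
                            --workflow "default" \
                            --default-qc-spec xtb gfn2xtb none

# Production-style usage: keep one executor (server) running ...
openff-bespoke executor launch

# ... and submit ligands to it from another shell, then watch and retrieve.
openff-bespoke executor submit --file "ligand.sdf" --workflow "default"
openff-bespoke executor watch --id "1"
openff-bespoke executor retrieve --id "1" --force-field "ligand.offxml"
```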
Is the cache updated here at some point along the workflow as well?
The automated workflow will update the cache after every stage, allowing the reuse of parameters and QC data; this is stored in the redis.db file inside the directory provided to the CLI.
Additionally, Jeff mentioned that bespokefit should internally deduplicate the fragments; however, I don't think I'm seeing this behaviour. For this, I launched the executor once and then submitted a single SDF containing all the ligands.
That is correct, and this is the recommended way of running. In this mode each molecule will be fragmented, and for any overlapping fragments (in TYK2 there are a lot) only a single set of QC calculations should be performed per unique fragment. Is there something indicating this is not the case?
I hope this helps, let me know if you have any other issues!
Hi @jthorton, thanks so much for getting back so quickly!
I have some follow-up questions to your reply:
That's not quite the production workflow; you might be getting it a little confused with some of the examples we did in the paper, which were slightly more complicated. In practice it's as simple as submitting a ligand to a running server, which will handle everything for you following the automated workflow defined here. You won't need to worry about deduplicating the fragments or making the SMIRKS patterns; this will all be done for you.
Just to clarify, for production with multiple ligands, is it correct to input a single SDF containing all the ligands (which is what I have done for the TYK-2 ligands), or is it better to submit the individual ligands using separate submit commands, presumably in the same directory with the same executor?
If we're using separate commands, would it be possible to submit ligands on separate machines, e.g. with the distributed workers option from bespokefit?
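If it helps, here is roughly how I picture the "separate submits, one executor" pattern plus a remote worker; the ligand file names are placeholders, and the launch-worker command and BEFLOW_* variables are taken from the distributed-workers section of the docs as I remember it, so please verify them rather than taking this as gospel:

```shell
# One long-lived executor; QC workers can be disabled here if the
# QC tasks are meant to run on other machines.
openff-bespoke executor launch --n-fragmenter-workers 1 \
                               --n-optimizer-workers 1 \
                               --n-qc-compute-workers 0

# Submit each ligand separately to the same running executor.
for ligand in lig_ejm_31.sdf lig_ejm_42.sdf; do
    openff-bespoke executor submit --file "$ligand" --workflow "default"
done

# On another machine, attach a distributed QC worker by pointing it
# at the executor's Redis instance.
export BEFLOW_REDIS_ADDRESS="executor-host-address"
openff-bespoke launch-worker --worker-type qc-compute
```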
I recommend starting with the quick start guide to ensure things are running as expected and then moving on to the TYK2 set.
So I actually already ran the acetaminophen example with the semi-empirical method, and didn't have problems there.
The automated workflow will update the cache after every stage, allowing the reuse of parameters and QC data; this is stored in the redis.db file inside the directory provided to the CLI.
Is it possible to include local files in this? For example, if we run the workflow for a series of ligands and afterwards want to run new molecules sharing a common scaffold with the previous series, is it possible to update the local cache with the runs we have already done ourselves? Is this what the --file option in the update cache command is for?
That is correct, and this is the recommended way of running. In this mode each molecule will be fragmented, and for any overlapping fragments (in TYK2 there are a lot) only a single set of QC calculations should be performed per unique fragment. Is there something indicating this is not the case?
I ran the workflow with the TYK-2 ligands from a single SDF (attached: input.sdf.zip) and ended up with 98 fragments in total. Is that expected? Is there a way to actually tell whether the QM data for a fragment was computed from scratch or taken from the database? I think the reason I got confused is that I have the outputs and QM scans for all fragments, and there are some duplicates across ligands, so I just assumed that these were all computed from scratch.
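As a rough sanity check on that 98, something like the sketch below could estimate how many unique fragments the series should produce, by fragmenting each ligand directly with openff-fragmenter and deduplicating on canonical SMILES. This is only an approximation of what bespokefit does internally (it deduplicates per fragment/torsion task, and I'm assuming the default WBO-based fragmentation here), and the attribute names are from openff-fragmenter as I understand it, so treat it as a sketch:

```python
from openff.fragmenter.fragment import WBOFragmenter
from openff.toolkit.topology import Molecule

# Load every ligand from the multi-molecule SDF submitted to bespokefit.
ligands = Molecule.from_file("input.sdf", allow_undefined_stereo=True)
if not isinstance(ligands, list):
    ligands = [ligands]

fragmenter = WBOFragmenter()
total, unique = 0, set()

for ligand in ligands:
    result = fragmenter.fragment(ligand)
    for fragment in result.fragments:
        total += 1
        # Deduplicate on the canonical (isomeric) SMILES of the fragment.
        unique.add(fragment.molecule.to_smiles())

print(f"{total} fragments in total, {len(unique)} unique")
```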
Thanks again for your help so far! Please let me know if I need to clarify any of the above.
Description
Hi!
First of all, I wanted to say thanks to @j-wags for having a chat with us about OpenFF tools! We had a very productive conversation and he encouraged me to raise an issue here about some of the questions I had about BespokeFit.
I'm trying to recreate the results from the BespokeFit paper to help me understand the tool before using it in new projects. My main point of confusion is how to run the workflow for a congeneric series of ligands, such as the TYK-2 set.
From the paper, I understood the workflow as follows:
1. openff-fragmenter on the whole series and save to JSON: Is this how the bespokefit_fragment_inputs.json file from the SI of the BespokeFit paper was generated? The main question I have here is, how do we generate the target torsion SMARTS strings with just the atoms of the central bond of the torsion labelled, instead of all of them? (See the small example after this list.)
2. openff-bespokefit on just one ligand, e.g. EJM31.
3. bespokefit on custom fragments. Is the cache updated here at some point along the workflow as well?
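On the "only the central bond atoms labelled" point, here is a small, self-contained illustration of the two tagging styles using the openff-toolkit substructure matcher; biphenyl and both patterns are just hypothetical examples, not how bespokefit builds its target SMARTS internally:

```python
from openff.toolkit.topology import Molecule

# Hypothetical ligand used only to illustrate the two tagging styles.
mol = Molecule.from_smiles("c1ccccc1-c1ccccc1")  # biphenyl

# A torsion SMIRKS with all four atoms tagged (:1 through :4) ...
full_torsion = "[*:1]~[c:2]-[c:3]~[*:4]"

# ... versus a pattern tagging only the two atoms of the central bond.
central_bond_only = "[c:1]-[c:2]"

# chemical_environment_matches returns tuples of the tagged atom indices,
# so the first pattern yields 4-tuples and the second yields 2-tuples.
print(mol.chemical_environment_matches(full_torsion)[:2])
print(mol.chemical_environment_matches(central_bond_only)[:2])
```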
Essentially, I am struggling to understand the workflow given in the Python scripts from the paper's Zenodo archive.
Additionally, Jeff mentioned that bespokefit should internally deduplicate the fragments; however, I don't think I'm seeing this behaviour. For this, I launched the executor once and then submitted a single SDF containing all the ligands.

Thanks a lot in advance for your help!
Context
Software versions
Installed with environment.yml.
Output of conda list?