schrodinger / coordgenlibs

Schrodinger-developed 2D Coordinate Generation
BSD 3-Clause "New" or "Revised" License
42 stars 28 forks source link

Really slow performance with some macrocycles #118

Open bjonnh-work opened 1 year ago

bjonnh-work commented 1 year ago

We found an issue when using coordgen with rdkit. When activating coordgen, it is extremely slow to work with such chiral macrocycles. Removing the chirality from the atoms makes it fast again.

(roughly 100 times slower for the given molecule).

from rdkit.Chem import rdDepictor
from rdkit import Chem

### This is slow
rdDepictor.SetPreferCoordGen(True)
mol = Chem.MolToMolBlock(Chem.MolFromSmiles("C[C@@H]1CCCCCCCCC(=O)OCCN[C@H](C)CCCCCCCCC(=O)OCCN[C@H](C)CCCCCCCCC(=O)OCCN1"))

### This is fast
rdDepictor.SetPreferCoordGen(False)
mol = Chem.MolToMolBlock(Chem.MolFromSmiles("C[C@@H]1CCCCCCCCC(=O)OCCN[C@H](C)CCCCCCCCC(=O)OCCN[C@H](C)CCCCCCCCC(=O)OCCN1"))

This is a crossposted issue with rdkit https://github.com/rdkit/rdkit/issues/5813

d-b-w commented 1 year ago

Thanks for the report! this is very interesting

bjonnh-work commented 1 year ago

I'm still trying to figure out the impact of chirality, it seems that in some cases it may be even slower without chirality.

bjonnh-work commented 1 year ago

Ran some better benchmarks, for that molecule at least, it seems the chirality has no impact.

bjonnh-work commented 1 year ago

However, the more flexible it is the more time it takes. This takes twice the time as the previous molecule: CC1CCCCCCCCCOCCNCCCCCCCCCCOCCNC(C)CCCCCCCCC(=O)OCCN1

bjonnh-work commented 1 year ago

CC1CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC1 takes 2m on my machine CC1CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC1 takes 2s

I think I'll throw a profiler at that

bjonnh-work commented 1 year ago

Minimal example that doesn't require rdkit

BOOST_AUTO_TEST_CASE(SlowMacrocycle)
{
    auto mol = "CC1CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC1"_smiles;
    BOOST_TEST(mol->getBonds()[0]->getBondOrder() == 1);
    sketcherMinimizer minimizer;
    minimizer.initialize(mol.get());
    minimizer.runGenerateCoordinates();
    const auto& atoms = minimizer.getAtoms();
    sketcherMinimizerAtom* center = atoms.at(0);
    BOOST_REQUIRE_EQUAL(center->getAtomicNumber(), 6);
}
tadhurst-cdd commented 1 year ago

The slowdown occurs in file: .../coordgen/CoordgenMacrocycleBuilder.cpp At line 682: if (checkedMacrocycles > MAX_MACROCYCLES) { break; } MAX_MACROCYCLES is set to 40, and it takes a long time to get there for the bad mol. OR, the acceptableScore calculated and checked just above that could be the issue.

  It is calculated as: numberOfAtoms * SUBSTITUTED_ATOM_RESTRAINT / 2
  and   SUBSTITUTED_ATOM_RESTRAINT is 10

  Just FYI
d-b-w commented 1 year ago

Ok, the general slowness on macrocycles is a known issue. We're also tracking performance issues here: https://github.com/schrodinger/coordgenlibs/issues/39 and in Schrödinger's internal bug tracker. We have efforts underway to sidestep this, so we'll probably incorporate these as test cases. That project is somewhat long-term, though.

Your team is welcome to submit a patch if you have suggestions, of course!