rdkit / mmpdb

A package to identify matched molecular pairs and use them to predict property changes.

To increase the max_rotatable_bonds? #32

Open brendaferrari opened 2 years ago

brendaferrari commented 2 years ago

I would like to do fragmentation on a dataset with 145 peptides. When I tried with mmpdb it says "too many rotatable bonds". Is there any way to increase the maximum number of rotatable bonds permitted?

adalke commented 2 years ago

The mmpdb fragment subcommand supports the --max-rotatable-bonds option to change the default. From the --help for that subcommand:

  --max-rotatable-bonds N
                        Maximum number of rotatable bonds (default: 10)

However, that's not going to be useful. mmpdb is designed for small molecules, not peptides - your system is too large.

mmpdb generates roughly C(num_rotatable_bonds, num_cuts) combinations. There are a lot of rotatable bonds in a peptide! Assuming one rotatable bond in the side chain and one in the backbone for each peptide gives C(145*2, 3) = (290 * 289 * 288) / (3 * 2) ≈ 4 million fragmentations, assuming 3 cuts.
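For anyone who wants to check the arithmetic, here is a minimal sketch of the estimate above. The function name `fragmentation_count` is made up for illustration and is not part of mmpdb; it only evaluates the binomial coefficient C(num_rotatable_bonds, num_cuts) as a rough count.

```python
from math import comb


def fragmentation_count(num_rotatable_bonds: int, num_cuts: int) -> int:
    """Rough estimate: ways to choose num_cuts bonds out of num_rotatable_bonds.

    Hypothetical helper for illustration only, not an mmpdb function.
    """
    return comb(num_rotatable_bonds, num_cuts)


# The estimate from the comment: 145 * 2 = 290 rotatable bonds, 3 cuts.
print(fragmentation_count(145 * 2, 3))  # 4022880

# For comparison, a molecule at the default limit of 10 rotatable bonds.
print(fragmentation_count(10, 3))  # 120
```

The gap between 120 and about 4 million is why raising the limit does not help much at this size.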

Furthermore, if those are labeled as chiral structures then the up-enumeration process will generate 3**N-1 terms.
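To get a feel for how fast that grows, here is a small sketch that just evaluates 3**N - 1. It assumes N is the number of labeled chiral centers, which the comment does not spell out, and the helper name `up_enumeration_terms` is hypothetical.

```python
def up_enumeration_terms(n: int) -> int:
    """Evaluate 3**n - 1 (assumption: n counts the labeled chiral centers)."""
    return 3 ** n - 1


# Print the term count for a few values of n to show the exponential growth.
for n in (1, 5, 10, 20):
    print(n, up_enumeration_terms(n))
```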

My initial and hand-wavy thought is that peptides are so simple (assuming you have "normal" linear peptides) that you want to skip the atomic-level matching that mmpdb does and do your matching in sequence space, with pre-computed fragmentations for the 20x20/2 set of possible side-group matches.

You'll still have a combinatorial problem, but you can work entirely on linear strings, rather than going through molecular graph canonicalization.
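As a rough illustration of what matching on linear strings could look like, here is a minimal sketch. It is not mmpdb code. It assumes each peptide is already split into a tuple of residue tokens (with protecting groups treated as ordinary tokens), and it only finds pairs of equal-length peptides that differ at exactly one position. The function name, the peptide ids, and the token names are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def single_site_pairs(peptides):
    """Find pairs of equal-length peptides that differ at exactly one position.

    peptides: dict mapping an id to a tuple of residue tokens.
    Returns a sorted list of (id1, id2, position, residue1, residue2).
    """
    # Index every peptide by each of its "contexts": the sequence with one
    # position masked out. Peptides sharing a context match everywhere else.
    by_context = defaultdict(list)
    for pep_id, residues in peptides.items():
        for i, residue in enumerate(residues):
            context = residues[:i] + (None,) + residues[i + 1:]
            by_context[context].append((pep_id, i, residue))

    pairs = []
    for members in by_context.values():
        for (id1, pos, r1), (id2, _, r2) in combinations(members, 2):
            if r1 != r2:  # skip identical sequences stored under two ids
                pairs.append((id1, id2, pos, r1, r2))
    return sorted(pairs)


# Made-up example data; "Ac" and "Boc" stand in for N-terminal groups.
example = {
    "p1": ("Ac", "Ala", "Gly", "Ser"),
    "p2": ("Ac", "Ala", "Val", "Ser"),
    "p3": ("Boc", "Ala", "Gly", "Ser"),
    "p4": ("Ac", "Leu", "Gly"),
}
for pair in single_site_pairs(example):
    print(pair)
```

Everything here is dictionary lookups on tuples of strings, so no molecular graph canonicalization is involved. Handling insertions, deletions, or multi-residue changes would need more than this sketch.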

brendaferrari commented 2 years ago

Thank you for the quick reply and for the time to write a really good explanation.

My dataset is composed of linear peptides, some of which include protecting groups outside the natural peptide classification. My goal is to do the fragmentation in order to construct a SAR Matrix. I am new to the concept of matching in sequence space. Is it related to fragmenting the peptide into amino acid subunits?