rdkit / mmpdb

A package to identify matched molecular pairs and use them to predict property changes.

Can we specify the environmental radius when generating the mmpdb? #43

Closed: chengthefang closed this issue 2 years ago

chengthefang commented 2 years ago

Hi all,

I am aware that we can adjust the --max-radius parameter to set the maximum environment radius to be indexed in MMPDB. But I wonder if there is a way to index the database at only one specific radius. For example, could we generate an mmpdb at radius = 3 only?

Thanks! Cheng

KramerChristian commented 2 years ago

Hi Cheng,

This is doable, but not on the command line; you'd have to hack the code. Unfortunately, I don't have the time to look into the code and test exactly where you'd have to make changes so as not to miss anything. But it has been done before and, if I remember correctly, it was not too hard.

Bests, Christian

chengthefang commented 2 years ago

Hi Christian,

Thank you for your response. I think I can dig into the code and modify it accordingly. I will close the ticket.

Best, Cheng

adalke commented 2 years ago

Perhaps I'm missing something, but if you set --min-radius 3 and --max-radius 3 ... does that give you what you want?
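A minimal Python sketch of that suggestion, assuming an mmpdb build whose `mmpdb index` accepts both `--min-radius` and `--max-radius` (as the later replies show, that is the v3 development branch, not 2.x). Pinning both bounds to the same value should restrict indexing to that single radius. The helper `single_radius_index_cmd` and the file name `dataset.fragdb` are hypothetical, for illustration only.

```python
from __future__ import annotations

import shlex
import subprocess


def single_radius_index_cmd(fragdb_filename: str, radius: int) -> list[str]:
    """Build an `mmpdb index` command that indexes exactly one environment radius.

    Hypothetical helper. Assumes an mmpdb version that accepts --min-radius
    (the v3 development branch); mmpdb 2.x only has --max-radius.
    """
    if radius < 0:
        raise ValueError("radius must be non-negative")
    # Same value for both bounds -> only that one radius is indexed.
    return [
        "mmpdb", "index",
        "--min-radius", str(radius),
        "--max-radius", str(radius),
        fragdb_filename,
    ]


if __name__ == "__main__":
    # "dataset.fragdb" is a hypothetical fragment database name.
    cmd = single_radius_index_cmd("dataset.fragdb", 3)
    print(shlex.join(cmd))
    # To actually run it (requires an mmpdb 3 executable on your PATH):
    # subprocess.run(cmd, check=True)
```

Building the argument list rather than a shell string avoids quoting problems if the file name contains spaces; `shlex.join` is used only to print a copy-pasteable version.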

chengthefang commented 2 years ago

I think that would solve it. But I couldn't find a "--min-radius" option for mmpdb index: when I run mmpdb index --help, I only see the "--max-radius N" option. Are you referring to a different version? I am using the latest one downloaded from this repo. Thanks! Cheng

usage: mmpdb index [-h] [--min-variable-heavies N] [--max-variable-heavies N] [--min-variable-ratio FLT] [--max-variable-ratio FLT] [--max-heavies-transf N] [--max-frac-trans FLT] [--max-radius N] [--symmetric] [--smallest-transformation-only] [--properties FILENAME] [--output FILENAME] [--out FORMAT] [--title TITLE] [--memory] [fragment_filename]
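Since the available options depend on which mmpdb version is installed, a quick way to check is to scan the `--help` text for the option name. This is a minimal stdlib-only sketch; `listed_options` and `index_supports` are hypothetical helper names, and it only relies on `mmpdb index --help` printing its help to stdout, as shown in this thread.

```python
from __future__ import annotations

import re
import shutil
import subprocess


def listed_options(help_text: str) -> set[str]:
    """Return every long option (e.g. '--max-radius') mentioned in a help text."""
    return set(re.findall(r"--[a-z][a-z0-9-]*", help_text))


def index_supports(option: str) -> bool | None:
    """Check whether the installed `mmpdb index` lists `option` in its --help.

    Returns None if no `mmpdb` executable is found on the PATH.
    """
    exe = shutil.which("mmpdb")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "index", "--help"], capture_output=True, text=True, check=True
    )
    return option in listed_options(result.stdout)
```

For example, `index_supports("--min-radius")` should return False for the 2.x usage line above and True for a v3 build whose help lists `--min-radius N`.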

adalke commented 2 years ago

Indeed, that was my error. The --min-radius implementation is in the mmpdb 3 version I'm developing at https://github.com/adalke/mmpdb/ . Try that one out?

chengthefang commented 2 years ago

Thank you for letting me know about the other development version. But the repository at the link you sent still reports mmpdb 2.2-dev1 rather than mmpdb 3. I also checked "mmpdb/mmpdblib/config.py" in your repo but didn't find a "min-radius" argument. I might be missing something. Would you mind pointing me to the code where "min-radius" is defined? That would be very helpful.

Thanks! Cheng

adalke commented 2 years ago

Another apology! It's in the "v3-dev" branch, at https://github.com/adalke/mmpdb/tree/v3-dev .

% mmpdb --version
mmpdb, version 3.0a1
% mmpdb index --help
Usage: mmpdb index [OPTIONS] FILENAME

  Index fragments and find matched molecular pairs

  FILENAME: the name of the fragdb file containing the fragments to index

Options:
  --smallest-transformation-only  Ignore all transformations that can be
                                  reduced to smaller fragments
  -s, --symmetric                 Output symmetrically equivalent MMPs, i.e
                                  output both cmpd1,cmpd2, SMIRKS:A>>B and
                                  cmpd2,cmpd1, SMIRKS:B>>A
  --max-radius N                  Maximum Environment Radius to be indexed in
                                  the MMPDB database
  --min-radius N                  Minimum Environment Radius to be indexed in
                                  the MMPDB database
 ...
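To confirm that a database built with `--min-radius 3 --max-radius 3` really contains only radius-3 environments, one could inspect the output file directly. This is a sketch under a loudly stated assumption: that the output is an SQLite database with a `rule_environment` table carrying an integer `radius` column. That is my recollection of the mmpdb schema, not something stated in this thread, so check it against your version. The helper names are hypothetical.

```python
from __future__ import annotations

import sqlite3
import sys


def radius_counts(con: sqlite3.Connection) -> dict[int, int]:
    """Map each environment radius to its number of rule environments.

    ASSUMPTION: the mmpdb database has a `rule_environment` table with an
    integer `radius` column. Verify against your mmpdb version's schema.
    """
    rows = con.execute(
        "SELECT radius, COUNT(*) FROM rule_environment GROUP BY radius ORDER BY radius"
    ).fetchall()
    return {radius: count for radius, count in rows}


def has_only_radius(con: sqlite3.Connection, radius: int) -> bool:
    """True if the database is non-empty and every rule environment has `radius`."""
    counts = radius_counts(con)
    return bool(counts) and set(counts) == {radius}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Open read-only so the check cannot modify the database.
    db_path = sys.argv[1]  # e.g. a hypothetical "dataset.mmpdb"
    con = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        print(radius_counts(con))
    finally:
        con.close()
```

Opening with `mode=ro` also means a mistyped path raises an error instead of silently creating an empty database.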
chengthefang commented 2 years ago

Thank you, Andrew! Awesome, v3 also has some other new features. I'm looking forward to future development. I am going to close the ticket.

Best, Cheng

chengthefang commented 2 years ago

Hi Andrew,

I reopened the ticket because I can't get the v3 version to work. The v3-dev branch doesn't contain some key files that exist in the master branch: for example, there is no "mmpdb" script in the v3-dev branch, and some files are missing from the "mmpdblib" folder as well. I wonder if I missed something when using it. Below is what I did.

% git clone https://github.com/adalke/mmpdb.git
% cd mmpdb
% ls
CHANGELOG.md    MANIFEST    PKG-INFO    mmpdb       setup.cfg   tests
LICENSE.txt MANIFEST.in README.md   mmpdblib    setup.py
% git branch --show-current
master

% git checkout v3-dev
Branch 'v3-dev' set up to track remote branch 'v3-dev' from 'origin'.
Switched to a new branch 'v3-dev'
% ls
CHANGELOG.md    MANIFEST    PKG-INFO    mmpdblib    setup.cfg   tests
LICENSE.txt MANIFEST.in README.md   pyproject.toml  setup.py

Would you mind giving me some hints on how to use v3-dev?

Thanks! Cheng

adalke commented 2 years ago

You need to install the package.

To install, do:

python3 -m pip install --user .

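If you are in a virtual environment (which is the suggested best practice for modern Python use), leave off --user and run python3 -m pip install . instead; that installs mmpdb into your virtual environment. pip refuses --user installs inside a standard virtual environment because user site-packages are not visible there.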

If you are not in a virtual environment, the --user install puts mmpdb in your per-user site-packages for your standard Python installation, so it does not modify the system-wide settings.

pip install is such a common step in working with Python packages that I forgot to include it in the README.

What happened?

The 2.x version of mmpdb included its own command-line driver script, named mmpdb. In 3.0, pip generates the mmpdb command-line driver when it installs the package, so the command only appears on your shell's path after the installation step.

In addition, mmpdb uses the peewee package. In version 2.x I included a copy of the peewee package. In version 3.0 I switched to listing it as a dependency. The pip install step will automatically download and install that package from the PyPI package server.

I did this because I also switched to using the click package for command-line processing, rather than using Python's built-in argparse package, and I didn't want to also include a copy of click in the mmpdb distribution. Instead, the pip install step will download and install click, if not currently installed.
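After running pip install, a quick stdlib-only sanity check can confirm that the pieces described above are in place: the mmpdb distribution itself, its peewee and click dependencies, and the `mmpdb` command on the PATH. A minimal sketch; the helper names are hypothetical, and it assumes the distributions are named `mmpdb`, `peewee` and `click`, as on PyPI.

```python
from __future__ import annotations

import shutil
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str) -> str | None:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def mmpdb_install_report() -> dict[str, str | None]:
    """Summarize what `pip install` should have provided for mmpdb 3."""
    report = {name: installed_version(name) for name in ("mmpdb", "peewee", "click")}
    # The `mmpdb` command only exists on the PATH after installation.
    report["mmpdb executable"] = shutil.which("mmpdb")
    return report


if __name__ == "__main__":
    for key, value in mmpdb_install_report().items():
        print(f"{key}: {value or 'NOT FOUND'}")
```

If the packages are reported but the executable is not, the install location's script directory is probably not on your shell's PATH.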

chengthefang commented 2 years ago

Thank you, Andrew! It has been successfully installed.