rdkit / mmpdb

A package to identify matched molecular pairs and use them to predict property changes.

--min-heavies-per-const-frag 3 option loses some transformations #9

Closed ValeryPolyakov closed 5 years ago

ValeryPolyakov commented 5 years ago

When using the --min-heavies-per-const-frag 3 option during the fragmentation stage, I noticed that I am losing the following transformation: [:1]O[:2] to [:1]C([:2])N, in which one of the Rs is a simple methyl group. Is it possible to avoid losing this transformation by adjusting an option during the indexing step?

KramerChristian commented 5 years ago

Hi Valery,

if you use --min-heavies-per-const-frag 3, the [:1]O[:2] >> [:1]C([:2])N transformation will no longer be indexed when one of the substituents is just a simple methyl, because that fragmentation is no longer created. So there is no way to get back exactly this transformation during indexing.

However, for all pairs of molecules which previously had that double-cut transformation, the single-cut transformation [:1]OC >> [:1]C(C)N will still be indexed and written to the database/output.

The idea behind the --min-heavies-per-const-frag option was that in many use cases it is sufficient to have just one of the two (or more, in other cases) transformations in the DB.
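To make the effect of the threshold concrete, here is a minimal sketch of the filter as described above. It is not mmpdb's implementation: the function name `keeps_fragmentation` and the example heavy-atom counts are hypothetical, and the only assumption is that a fragmentation is dropped when any constant fragment has fewer heavy atoms than the threshold.

```python
def keeps_fragmentation(constant_heavy_counts, min_heavies_per_const_frag):
    """Return True if every constant fragment meets the heavy-atom minimum.

    Simplified illustration only; this is not code taken from mmpdb.
    """
    return all(n >= min_heavies_per_const_frag for n in constant_heavy_counts)


# Hypothetical methyl ether R-O-CH3 in which R has 10 heavy atoms.
# Double cut around the oxygen: the constant parts are R (10) and CH3 (1).
double_cut_kept = keeps_fragmentation([10, 1], 3)

# Single cut between R and the oxygen: the only constant part is R (10),
# and the variable part is the whole O-CH3 group.
single_cut_kept = keeps_fragmentation([10], 3)

print(double_cut_kept, single_cut_kept)
```

With a threshold of 3, the double cut is rejected because the methyl constant has a single heavy atom, while the single cut on the same molecule survives.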

Do you have a specific need to use the double-cut transformation rather than the single-cut transformation?

Bests, Christian

ValeryPolyakov commented 5 years ago

Thanks Christian,

Do you know why I did not see a single-cut transformation in the output? Is there a specific option during indexing to recover it?

Valery

KramerChristian commented 5 years ago

Hi Valery,

I do not think there is a specific option to remove or recover this transformation during indexing. If you send me two example input SMILES that show the problem, I can try to figure out what is going on here.

Bests, Christian

ValeryPolyakov commented 5 years ago

Thanks. I need to think about it. Obviously, I cannot send the actual compound...

ValeryPolyakov commented 5 years ago

Hi Christian,

In which database table should I look for the single- and double-cut SMILES, such as [:1]O[:2] or [:1]OC?

Valery

KramerChristian commented 5 years ago

Hi Valery,

that depends on what exactly you want to do. The database has a table named 'rule_smiles' which contains all the SMILES of the fragments. In SQLite3, you can check whether a given SMILES is in that table with

"select * from rule_smiles where smiles like '[:1]OC';"

If you want to find all transformations where that SMILES is involved, you have to query the table named 'rule' with the ID you get from the first query. If you are interested in all transformations + environments where that SMILES is used, you have to query the 'rule_environment' table with the id you get from the query in the 'rule' table.

If you query the DB directly, I recommend building it with the --symmetric option. Otherwise, you have to look up the ID for your SMILES in both the LHS and RHS columns.
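The three lookups described above can be sketched with Python's standard `sqlite3` module. The table names come from this thread, but the column names (`from_smiles_id`, `to_smiles_id`, `rule_id`, `radius`) and the toy rows are assumptions made for illustration, so check them against your own file (for example with `.schema` in the sqlite3 shell) before relying on them. The SMILES strings are copied as written in this thread; the form actually stored in your database may differ.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory stand-in for an mmpdb file (assumed, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE rule_smiles (id INTEGER PRIMARY KEY, smiles TEXT);
        CREATE TABLE rule (id INTEGER PRIMARY KEY,
                           from_smiles_id INTEGER, to_smiles_id INTEGER);
        CREATE TABLE rule_environment (id INTEGER PRIMARY KEY,
                                       rule_id INTEGER, radius INTEGER);
        INSERT INTO rule_smiles VALUES (1, '[:1]OC'), (2, '[:1]C(C)N'), (3, '[:1]Cl');
        INSERT INTO rule VALUES (1, 1, 2), (2, 3, 1);
        INSERT INTO rule_environment VALUES (1, 1, 0), (2, 1, 1), (3, 2, 0);
    """)
    return conn


def find_rules(conn, smiles):
    """Return rules that use the SMILES on either side (non-symmetric DB case)."""
    row = conn.execute(
        "SELECT id FROM rule_smiles WHERE smiles = ?", (smiles,)
    ).fetchone()
    if row is None:
        return []
    smiles_id = row[0]
    return conn.execute(
        "SELECT id, from_smiles_id, to_smiles_id FROM rule "
        "WHERE from_smiles_id = ? OR to_smiles_id = ? ORDER BY id",
        (smiles_id, smiles_id),
    ).fetchall()


def find_environments(conn, rule_id):
    """Return the (id, radius) rows of rule_environment for one rule."""
    return conn.execute(
        "SELECT id, radius FROM rule_environment WHERE rule_id = ? ORDER BY id",
        (rule_id,),
    ).fetchall()


conn = build_toy_db()
rules = find_rules(conn, "[:1]OC")
envs = find_environments(conn, rules[0][0])
print(rules)
print(envs)
```

The sketch uses an exact `=` comparison with a bound parameter rather than `like`, since SQLite's `LIKE` is case-insensitive for ASCII by default and SMILES are case-sensitive. With a database built using --symmetric, checking only one side of the rule should be enough.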

Bests, Christian

ValeryPolyakov commented 5 years ago

Hi Christian,

Thanks. The query works. It is a little slow, though. Does the --symmetric option result in a DB size increase?

Valery

KramerChristian commented 5 years ago

Hi Valery,

yes, it results in an almost two-fold increase in the DB size.

Christian

ValeryPolyakov commented 5 years ago

Hi Christian,

Can you tell me what SQL commands go into the following query:

python mmpdb predict --smiles "smiles1" --reference "smiles2" --property "propName" --save-details --prefix noOptions master_full.mmpdb > noOptions.txt

Obviously, real SMILES are used in place of smiles1 and smiles2.

Thanks a lot,

Valery

adalke commented 5 years ago

In mmpdblib/schema.py change the method MMPDatabase.execute from

        if 0:
            import time
            print("EXECUTE")
            print(sql)
            print(repr(args))

to

        if 1:
            import time
            print("EXECUTE")
            print(sql)
            print(repr(args))

That is, change the "0" to a "1". If I recall correctly, that will print out all of the SQL calls and their parameters.

The --symmetric flag roughly doubles the database size but reduces the number of required SQL queries.
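As a general alternative to editing print statements into library code, Python's standard `sqlite3` module can report each executed statement through `Connection.set_trace_callback`. The sketch below is a generic illustration on a toy in-memory table, not mmpdb code. Whether it applies to mmpdb depends on mmpdb holding a standard-library `sqlite3` connection that you can reach, which is an assumption not confirmed in this thread.

```python
import sqlite3

# Collect every SQL statement that the connection executes.
executed = []

conn = sqlite3.connect(":memory:")
conn.set_trace_callback(executed.append)

# Toy table and query, standing in for whatever the application runs.
conn.execute("CREATE TABLE rule_smiles (id INTEGER PRIMARY KEY, smiles TEXT)")
conn.execute("INSERT INTO rule_smiles (smiles) VALUES (?)", ("[:1]OC",))
conn.execute("SELECT id FROM rule_smiles WHERE smiles = ?", ("[:1]OC",)).fetchall()

for statement in executed:
    print(statement)
```

The callback receives the statement text, so it records the application's own queries as well as any transaction statements the module issues implicitly.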

ValeryPolyakov commented 5 years ago

Thanks. I will try that.

ValeryPolyakov commented 5 years ago

Hi Andrew,

I did that, but the sql statements are not being printed...

Valery R. Polyakov

adalke commented 5 years ago

I am surprised at that. As far as I can tell, the analysis routines all use something like self.mmpa_db.execute, which goes through the method I asked you to modify.

I don't have the time to figure out where those specific calls are being done.

Are you sure that you've modified the right code? That is, sometimes it's hard to figure out if Python is using a modified file vs. the installed package file.

All I can suggest is that you trace through the code to find out where those SQL calls are being done, and add the print statements in the correct places. This can be done with the debugger (including the graphical IDE "IDLE" which comes as part of the distribution), among other methods.
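One quick way to check which file Python actually loads is to ask the import system for the module's location. This is a minimal sketch using only the standard library; the helper name `module_location` is hypothetical, and `mmpdblib` is the package name mentioned above.

```python
import importlib.util


def module_location(name):
    """Return the file path Python would import for a top-level module, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# A standard-library package, to show what the output looks like.
print(module_location("json"))

# The package discussed in this thread; prints None if it is not installed
# in the interpreter that runs this script.
print(module_location("mmpdblib"))
```

If the printed path points into site-packages rather than the checkout you edited, the modified schema.py is not the one being run.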

KramerChristian commented 5 years ago

Hi Valery,

is this issue still open? If yes, could you post an example that I can use to trace down the problem in the code?

Thank you, Christian

ValeryPolyakov commented 5 years ago

Christian and Andrew,

I was able to print the statements. Thanks.

By the way, is there any way for me to close the issue?

Valery R. Polyakov

KramerChristian commented 5 years ago

Hi Valery,

I don't know whether you can close it. I will close this issue. If there are still questions open regarding this issue, we can open it again.

Best regards, Christian

ValeryPolyakov commented 5 years ago

Thanks
