rdkit / mmpdb

A package to identify matched molecular pairs and use them to predict property changes.
Other
206 stars 55 forks source link

Can I get a table of all rules, as well as the number of pairs, and statistics for each rule? #12

Closed chengthefang closed 5 years ago

chengthefang commented 5 years ago

Hi,

I wonder if there is a way to obtain the information of all rules, as well as the number of pairs and the statistics for each rule for a built fragment(mmpdb) database. A simple output table I expect is like:

            from_smiles(smirks) .   To_smiles(smirks)    # of pairs      Mean    std

rule1 rule2 .....

I am pretty interested in presenting the rules from a database in a similar way to the Tables1-5 & Figure 5 in your publication "J. Med. Chem. 2018, 61, 3277−3292". Would you mind me give some hints how to achieve that through mmpdb codes?

Thanks, Cheng

adalke commented 5 years ago

There is not. These sorts of reports are not hard to do in SQL - though that requires someone who knows SQL, of course. You might start by taking a look at schema.py:

    def get_property_rule(self, property_name_id, rule_environment_id, is_reversed, cursor=None):
        c = self.mmpa_db.execute(
            "SELECT rule_environment.rule_id, from_smiles.smiles, from_smiles.num_heavies, to_smiles.smiles, to_smiles.num_heavies, "
            "          rule_environment.radius, rule_environment.environment_fingerprint_id, "
            "          environment_fingerprint.fingerprint, "
            "          rule_environment_statistics.id, count, avg, std, kurtosis, skewness, min, q1, median, q3, max, paired_t, p_value "
            "  FROM rule, rule_environment, environment_fingerprint, rule_environment_statistics, "
            "          rule_smiles as from_smiles, rule_smiles as to_smiles "
            " WHERE rule_environment.id = ? "
            "   AND rule_environment_statistics.rule_environment_id = ? "
            "   AND rule_environment_statistics.property_name_id = ? "
            "   AND rule_environment.rule_id = rule.id "
            "   AND rule_environment.environment_fingerprint_id = environment_fingerprint.id "
            "   AND rule.from_smiles_id = from_smiles.id "
            "   AND rule.to_smiles_id = to_smiles.id ",
            (rule_environment_id, rule_environment_id, property_name_id), cursor=cursor)

Bear in mind that each property and environment radius has its own set of rules.

chengthefang commented 5 years ago

Hi Adalke,

Thank you for your kind comments! I would to give a try.

Best, Cheng

KramerChristian commented 5 years ago

Hi Cheng,

did this solve your problem? If yes, I woudl like to close the issue.

Best regards, Christian

chengthefang commented 5 years ago

@KramerChristian Thanks, Christian. This solved my problem. I will close the issue from my end.

Yueming-Yin commented 4 months ago

I don't know how to use SQL to get the table of all rules, as well as the number of pairs, and statistics for each rule. Could you please share your steps to achieve this? @adalke @chengthefang

There is not. These sorts of reports are not hard to do in SQL - though that requires someone who knows SQL, of course. You might start by taking a look at schema.py:

    def get_property_rule(self, property_name_id, rule_environment_id, is_reversed, cursor=None):
        c = self.mmpa_db.execute(
            "SELECT rule_environment.rule_id, from_smiles.smiles, from_smiles.num_heavies, to_smiles.smiles, to_smiles.num_heavies, "
            "          rule_environment.radius, rule_environment.environment_fingerprint_id, "
            "          environment_fingerprint.fingerprint, "
            "          rule_environment_statistics.id, count, avg, std, kurtosis, skewness, min, q1, median, q3, max, paired_t, p_value "
            "  FROM rule, rule_environment, environment_fingerprint, rule_environment_statistics, "
            "          rule_smiles as from_smiles, rule_smiles as to_smiles "
            " WHERE rule_environment.id = ? "
            "   AND rule_environment_statistics.rule_environment_id = ? "
            "   AND rule_environment_statistics.property_name_id = ? "
            "   AND rule_environment.rule_id = rule.id "
            "   AND rule_environment.environment_fingerprint_id = environment_fingerprint.id "
            "   AND rule.from_smiles_id = from_smiles.id "
            "   AND rule.to_smiles_id = to_smiles.id ",
            (rule_environment_id, rule_environment_id, property_name_id), cursor=cursor)

Bear in mind that each property and environment radius has its own set of rules.