wzxxxx / Prompt-MolOpt

a multi-property optimization method.
MIT License

A bug in sme_opt_utils.build_data #4

Open MlSAKA-MlKOTO opened 12 hours ago

MlSAKA-MlKOTO commented 12 hours ago

Hi, thank you for your solid work.

While reconstructing the code, I found that the function return_brics_res_structure does not work as I expected. For example, for the SMILES 'NC(C(F)(F)F)(C(F)(F)F)P(=O)(Cc1ccccc1C)Cc1ccccc1', it fails to find the correct BRICS submol for the corresponding atom ids.

I suggest using the following code to find the BRICS residue SMILES:

    from rdkit import Chem

    # `m` (assumed to be an RDKit Mol) and `all_brics_substructure_subset`
    # are expected to be in scope from the surrounding function.
    brics_res_smi = []
    for substructure in all_brics_substructure_subset['substructure'].values():
        atom_ids = set(substructure)
        bond_ids = set()
        rw_mol = Chem.RWMol(m)  # fresh editable copy for each substructure
        for atom_idx in atom_ids:
            # Collect every bond touching the substructure, including cut bonds.
            for bond in m.GetAtomWithIdx(atom_idx).GetBonds():
                bond_ids.add(bond.GetIdx())
                # Turn any bonded atom outside the substructure into a dummy atom.
                for idx in (bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()):
                    if idx not in atom_ids:
                        rw_mol.GetAtomWithIdx(idx).SetAtomicNum(0)
        # Build the fragment from the collected bonds and write it as SMILES.
        sub_mol = Chem.PathToSubmol(rw_mol, list(bond_ids))
        sub_smi = Chem.MolToSmiles(sub_mol)
        brics_res_smi.append(sub_smi)
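
To make the selection logic above easier to check without RDKit, here is a minimal pure-Python sketch on a toy graph. The helper name `select_fragment` and the bond-list representation are hypothetical and only for illustration. They are not part of the repository or of RDKit. The sketch returns the bonds that would be passed to PathToSubmol and the outside atoms that would be turned into dummy atoms.

```python
def select_fragment(bonds, substructure):
    """Return (bond ids to keep, atom ids to turn into dummies).

    bonds: list of (begin_idx, end_idx) pairs; the list position is the bond id.
    substructure: iterable of atom ids that belong to the fragment.
    Hypothetical helper that mirrors the loop in the suggested code.
    """
    inside = set(substructure)
    bond_ids = set()
    dummy_atoms = set()
    for bond_id, (begin, end) in enumerate(bonds):
        # Keep every bond that touches at least one fragment atom.
        if begin in inside or end in inside:
            bond_ids.add(bond_id)
            # An endpoint outside the fragment marks an attachment point.
            for idx in (begin, end):
                if idx not in inside:
                    dummy_atoms.add(idx)
    return sorted(bond_ids), sorted(dummy_atoms)


# Toy linear chain 0-1-2-3-4 with the fragment made of atoms 1 and 2.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4)]
kept_bonds, dummies = select_fragment(bonds, [1, 2])
print(kept_bonds, dummies)
```

In this toy case the fragment keeps its internal bond plus the two cut bonds, and atoms 0 and 3 become the attachment points.
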
MlSAKA-MlKOTO commented 10 hours ago

Also, could this approach be used for the Murcko method to avoid the 'Can't kekulize mol' error?