ur-whitelab / exmol

Explainer for black box models that predict molecule properties
https://ur-whitelab.github.io/exmol/
MIT License

Error after installation #64

Closed PARODBE closed 2 years ago

PARODBE commented 2 years ago

Hi,

First of all, thank you for your work! I have a problem with your library, or rather, when I run "import exmol" I get this error: "No module named 'dataclasses'".

I installed it with: pip install exmol...

Thanks!

PARODBE commented 2 years ago

Oh sorry, it is another package. I thought it was part of exmol, sorry.

whitead commented 2 years ago

Hi @PARODBE thanks for pointing this out. This is actually an issue - maybe it should be a dependency. Can you tell me what version of Python you have? I believe dataclasses should be part of the standard library.
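For context, `dataclasses` joined the Python standard library in version 3.7; on Python 3.6 it is only available as a backport package of the same name on PyPI. A minimal sketch for checking which case applies to an environment (the helper name `dataclasses_status` is made up for illustration and is not part of exmol):

```python
import importlib.util
import sys


def dataclasses_status():
    """Report whether the dataclasses module should be, and is, available."""
    version = sys.version_info[:2]
    # dataclasses joined the standard library in Python 3.7
    in_stdlib = version >= (3, 7)
    # find_spec returns None when the module cannot be located
    importable = importlib.util.find_spec("dataclasses") is not None
    return version, in_stdlib, importable


version, in_stdlib, importable = dataclasses_status()
print(f"Python {version[0]}.{version[1]}: stdlib={in_stdlib}, importable={importable}")
if not importable:
    # On Python 3.6 the backport can be installed from PyPI
    print("Try: pip install dataclasses")
```

If this reports an interpreter older than 3.7 with `importable=False`, installing the backport or upgrading Python should resolve the import error.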

PARODBE commented 2 years ago

Hi,

Finally, I can work with the library. I have some questions. First, when you show the counterfactuals, you get your selected molecule and 3 more molecules. Do those come from the same set of molecules as the selected one? In other words, in my case the selected molecule is part of the test set, so are the other 3 molecules, against which the similarity is computed, part of the test set or the training set?

Another question: is there any way to show the name of the molecule in the image?

And the last question, how would it be possible to pass a list of molecules?

My code is:

model = RandomForestClassifier(random_state=46)
model.fit(X_train_fp, tr.Activity)

def model_eval(smi_list, _=None):
    # Featurize each SMILES string as a feature-based Morgan fingerprint and predict its class
    mols = [Chem.MolFromSmiles(smi) for smi in smi_list]
    feats = np.array([
        AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=2048, useFeatures=True)
        for mol in mols
    ])
    labels = model.predict(feats).astype('int')
    return labels

i = 9
space = exmol.sample_space(Chem.MolToSmiles(mol_test[i]), model_eval)
mol_test[i]

exps = exmol.cf_explain(space)

fkw = {'figsize': (8, 6)}
plt.rc('axes', titlesize=12)
exmol.plot_cf(exps, figure_kwargs=fkw, mol_size=(450, 400), nrows=1)
plt.savefig('rf-simple.png', dpi=180)
svg = exmol.insert_svg(exps, mol_fontsize=14)

import skunk

font = {'family': 'normal', 'weight': 'normal', 'size': 22}
exmol.plot_space(space, exps, figure_kwargs=fkw, mol_size=(300, 200), offset=0, cartoon=True, rasterized=True)
plt.scatter([], [], label='Counterfactual', s=150, color=plt.get_cmap('viridis')(1.0))
plt.scatter([], [], label='Same Class', s=150, color=plt.get_cmap('viridis')(0.0))
plt.legend(fontsize=22)
plt.tight_layout()
svg = exmol.insert_svg(exps, mol_fontsize=14)

geemi725 commented 2 years ago

This is what I think:

Question 1: In other words, in my case the selected molecule is part of the test set, so are the other 3 molecules, against which the similarity is computed, part of the test set or the training set?

Not necessarily. First, we use the STONED algorithm to generate a “new set” of molecules. Then we use Tanimoto similarity to find similar molecules. Therefore, the similar molecules can be part of your initial dataset or can be completely new molecules.

Question 2: Is there any way to show the name of the molecule in the image?

We have not added a function to print the name or SMILES string of the molecule (or any other label) along with the similarity when generating the images. But yes, it is possible to add the SMILES string or a different label by editing the code locally.
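To make the similarity step concrete, here is a minimal sketch of Tanimoto similarity on fingerprints represented as sets of "on" bit indices. This is an illustration only, not exmol's internal code; the bit sets and candidate names are made up, and treating two empty fingerprints as identical is a convention chosen for this example.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity: shared 'on' bits divided by total distinct 'on' bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Assumed convention for this sketch: two empty fingerprints count as identical
        return 1.0
    return len(a & b) / len(union)


# Hypothetical fingerprints: the base molecule and three generated candidates
base = {1, 5, 9, 42}
candidates = {
    "cand_1": {1, 5, 9, 40},
    "cand_2": {2, 3},
    "cand_3": {1, 5, 9, 42},
}

# Rank candidates from most to least similar to the base molecule
ranked = sorted(candidates, key=lambda name: tanimoto(base, candidates[name]), reverse=True)
print(ranked)
```

Because the candidates are generated rather than looked up, nothing in this ranking depends on whether a candidate appears in the training or test set.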

Take a look at this code: https://github.com/ur-whitelab/exmol/blob/fe373d921d84fe0436d8b6554c113f8fe7cf8b03/exmol/exmol.py#L409

Question 3: And the last question, how would it be possible to pass a list of molecules?

Not sure what you mean by a list of molecules. Do you mean a list of base molecules?

PARODBE commented 2 years ago

Thanks for your answers! Regarding the third question, yes, I mean a list of base molecules. By the way, the STONED algorithm is described in your paper, right? To understand it better: I don't know whether it is, for example, a GAN, a VAE, or another generative model.

geemi725 commented 2 years ago

Right now you cannot pass a list of base molecules unless you enumerate through it. Also, this is the reference to the STONED algorithm
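Enumerating through the base molecules could look roughly like the sketch below. The helper `explain_each` is hypothetical and not part of exmol; it takes the sampling and explaining steps as callables so the loop itself does not depend on any particular library.

```python
def explain_each(base_smiles, sample_fn, explain_fn):
    """Run the sample-then-explain steps once per base molecule.

    sample_fn maps a SMILES string to a sampled space, and explain_fn maps
    that space to explanations. Results are keyed by the base SMILES.
    """
    results = {}
    for smi in base_smiles:
        space = sample_fn(smi)
        results[smi] = explain_fn(space)
    return results


# With exmol this might be called as follows, assuming `model_eval` and
# `mol_test` are defined as in the code earlier in this thread:
#
# smiles = [Chem.MolToSmiles(m) for m in mol_test[:5]]
# all_exps = explain_each(
#     smiles,
#     lambda smi: exmol.sample_space(smi, model_eval),
#     exmol.cf_explain,
# )
```

Each entry of the result can then be passed to `exmol.plot_cf` one at a time, the same way `exps` is plotted for a single molecule.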