Open zhouhao-learning opened 5 years ago
What kinds of extra characters do you have? You probably need to standardize your SMILES (remove metals, mixtures, stereochemistry, etc.).
@isayev
My SMILES contain the extra character `a`, because they include Na and Ca. What do you mean by standardized SMILES? What do I need to do? Thank you.
Hi @zhouhao-learning, did you solve your problem? I am facing the same issue. If you have the solution, please enlighten me.
Although the question is old, I'm answering it now because it seems it is still unresolved...
Basically, the point is that you generally don't want ions (Na+, Ca2+) in your compound library, since they are just counterions to your compound. So, you need to remove them from your SMILES data before using it.
Take a look at: https://molvs.readthedocs.io/en/latest/
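To make the idea concrete, here is a minimal pure-Python sketch of stripping counterions. It is not the MolVS API, and the helper name `strip_counterions` is hypothetical. It assumes that counterions appear as separate dot-separated components of the SMILES string and that the component with the most letters is the compound of interest. It does not neutralize charges or validate the chemistry, so a proper standardization toolkit is still the better choice for real data.

```python
def strip_counterions(smiles: str) -> str:
    """Keep the largest dot-separated component of a SMILES string.

    Crude heuristic: components are ranked by their number of letters,
    used here as a rough proxy for atom count. Charges are left as-is.
    """
    fragments = smiles.split(".")
    return max(fragments, key=lambda frag: sum(ch.isalpha() for ch in frag))


# Illustrative inputs (made up for this sketch, not taken from the thread).
examples = ["CC(=O)[O-].[Na+]", "[Ca+2].CC(=O)[O-].CC(=O)[O-]", "c1ccccc1"]
cleaned = [strip_counterions(s) for s in examples]

for before, after in zip(examples, cleaned):
    print(f"{before} -> {after}")
```

Once `[Na+]` and `[Ca+2]` are dropped, the character `a` no longer appears in these strings.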
Best.
Hello, when I train a generative model with my own SMILES data using `LogP_optimization_demo.ipynb`, the tokens are defined as
`tokens = ['<', '>', '#', '%', ')', '(', '+', '-', '/', '.', '1', '0', '3', '2', '5', '4', '7', '6', '9', '8', '=', 'A', '@', 'C', 'B', 'F', 'I', 'H', 'O', 'N', 'P', 'S', '[', ']', '\\', 'c', 'e', 'i', 'l', 'o', 'n', 'p', 's', 'r', '\n']`
but my data contain characters outside the tokens list, so I cannot continue training with the transfer learning method. So I changed the code as follows during training:
But I get the following error:
But my data set is very small. Without transfer learning, my generative model may not be able to learn the chemical rules of SMILES. So my idea is this: retrain a model on the `data/chembl_22_clean_1576904_sorted_std_final.smi` data set, but with a custom tokens list that also includes the characters in my data set, and then fine-tune that pre-trained model on my own data. Is my idea right? I'm not sure.
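Before retraining anything, it may help to check exactly which characters fall outside the tokens list. The sketch below is a pure-Python illustration, not code from the notebook. The helper names `unknown_characters` and `extend_tokens` are hypothetical, the sample SMILES are made up, and the token list is the one quoted in the post, with every token taken to be a single character.

```python
TOKENS = ['<', '>', '#', '%', ')', '(', '+', '-', '/', '.', '1', '0', '3',
          '2', '5', '4', '7', '6', '9', '8', '=', 'A', '@', 'C', 'B', 'F',
          'I', 'H', 'O', 'N', 'P', 'S', '[', ']', '\\', 'c', 'e', 'i', 'l',
          'o', 'n', 'p', 's', 'r', '\n']


def unknown_characters(smiles_list, tokens):
    """Return the sorted characters in the data that are not in the token list."""
    vocab = set(tokens)
    return sorted({ch for smi in smiles_list for ch in smi if ch not in vocab})


def extend_tokens(tokens, extra):
    """Return a new token list with any missing characters appended at the end."""
    return list(tokens) + [ch for ch in extra if ch not in tokens]


# Made-up sample data containing Na and Ca counterions.
my_smiles = ["CC(=O)[O-].[Na+]", "[Ca+2].[O-]C(=O)C", "c1ccccc1Cl"]

missing = unknown_characters(my_smiles, TOKENS)
extended = extend_tokens(TOKENS, missing)
print("characters outside the token list:", missing)
print("extended vocabulary size:", len(extended))
```

If the only missing characters come from counterions, removing those ions from the data avoids changing the vocabulary at all. If the token list is extended instead, note that a character-level model's input and output layers are typically sized by the number of tokens, so weights pre-trained with the original list would generally not load into a model built with the extended one. Retraining on the larger data set with the extended list, as proposed above, is one way around that.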