Open bcanault opened 3 years ago
@bcanault Thank you for the information. We have observed a few conflicts after v0.5.1 due to updates of rdkit and scikit-learn. The first problem, with ngramtab, is actually caused by a recent scikit-learn update. We have fixed the issue, and the fix should be included in the next release. For now, you can change the call to " = n_gram.fit(data_ss['SMILES'],train_order=5)" as a temporary workaround. The second problem is not clear to me yet. My guess is that it is due to some updates in rdkit, but we have not found the cause yet. We will try to solve it as soon as possible.
@bcanault For the second problem, I have run the tutorial myself but did not get the error you showed. Did you build the NGram model by combining "ngram_pubchem_ikebata_reO15_O10.obj" and "ngram_pubchem_ikebata_reO15_O11to20.obj"? The error you showed seems to come from the NGram failing to find information relevant to the molecule being modified. This often happens when the initial molecule sample has more ring structures (under the SMILES representation) than the NGram has ever seen during its training. I recommend trying to reproduce the problem and printing the final SMILES that caused it. Maybe we can help you from there.
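As a rough way to check the ring hypothesis above without extra dependencies, one could count how many ring-closure labels are open at the same time in a SMILES string. This is only a minimal sketch: the helper name `max_open_rings` is hypothetical and not part of XenonPy, and the count is a proxy that may not match exactly how the NGram tracks rings internally.

```python
def max_open_rings(smiles: str) -> int:
    """Return the largest number of ring-closure labels open at once.

    Hypothetical helper, not a full SMILES parser: it handles single-digit
    labels and two-digit %nn labels, and ignores digits inside [...] atoms.
    """
    open_labels = set()
    deepest = 0
    in_bracket = False
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            in_bracket = False
        elif not in_bracket:
            label = None
            two = smiles[i + 1:i + 3]
            if ch == "%" and len(two) == 2 and two.isdigit():
                label = "%" + two   # two-digit ring label such as %10
                i += 2
            elif ch.isdigit():
                label = ch          # single-digit ring label
            if label is not None:
                if label in open_labels:
                    open_labels.remove(label)   # second occurrence closes the ring
                else:
                    open_labels.add(label)      # first occurrence opens the ring
                    deepest = max(deepest, len(open_labels))
        i += 1
    return deepest
```

Comparing this number for the initial molecule against the largest value found in the NGram training set may give a quick hint whether the starting sample is outside what the model has seen.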
@stewu5 Thank you very much for your reply. I am looking forward to testing your new version as soon as it becomes available :). I will give you an update once I have tested it. Thank you very much for your help.
@bcanault Were you able to solve the second issue? I am running into the same issue when I try to reproduce the results {unknown errors in iqspr}. I tried downgrading the version of XenonPy, and I still run into the same issue. I guess it is due to the new rdkit package? @stewu5 Dr. Wu, I am not sure what you mean by printing the output; is it the initial samples that are being used for the iqspr runs?
@deepakorani @bcanault The NGram class provided in our current XenonPy version has been tested by different users over the past few years, and what we learned from that experience is that NGram can lead to weird results if the training data does not match the targeted SMILES. For example, if you train the NGram on molecules with only 1 nested ring and then use it to generate new molecules starting from an initial molecule with 2 nested rings, you will run into trouble. When our collaborators ran into similar trouble before, we fixed the random seed and tried to reproduce the exact same problem. We usually capture the final SMILES and the initial SMILES that caused the problem for debugging. With that information, we were able to pinpoint the issue 90% of the time. Therefore, I recommend trying to reproduce the error while storing the final SMILES before the error occurs (e.g., output the SMILES at every single step of SMILES modification using NGram). Hopefully, we can help you resolve the problem from there.
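The debugging approach described above (fix the seed, record the SMILES at every modification step, keep the one that triggers the error) could be sketched roughly as follows. The function `run_with_trace` and its `propose` argument are hypothetical names for illustration, not XenonPy API; `propose` stands for whatever callable performs one SMILES modification step in your setup. Only Python's built-in `random` is seeded here, so if the generator draws from NumPy you would also need `np.random.seed(seed)`.

```python
import random


def run_with_trace(propose, initial_smiles, n_steps, seed=0):
    """Apply `propose` repeatedly and record every input SMILES.

    Returns (trace, failure). `failure` is None if all steps succeed,
    otherwise a tuple (step index, SMILES passed in, exception raised).
    """
    random.seed(seed)  # fixed seed so the same failure can be replayed
    trace = []
    current = initial_smiles
    for step in range(n_steps):
        trace.append(current)  # store the input before it is modified
        try:
            current = propose(current)
        except Exception as err:
            # The last entry of `trace` is the SMILES that caused the error.
            return trace, (step, current, err)
    trace.append(current)  # final SMILES after the last successful step
    return trace, None
```

If `failure` is not None, posting `trace[0]` (the initial SMILES) and `failure[1]` (the SMILES that triggered the error) should give enough information to start debugging.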
Hi folks,
First of all, thank you very much for all your work. It's really interesting. I have been using XenonPy and trying to reproduce your tutorial. Unfortunately, I ran into 2 errors with the following code:
Package version: 0.5.1
NGram issue with unknown ngram_tab
I think it was replaced by ngram_table, but I'm not sure.
Unknown error in iQSPR