yoshida-lab / XenonPy

XenonPy is a Python Software for Materials Informatics
http://xenonpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

NGram and iQSPR errors in XenonPy 0.5.1 #231

Open bcanault opened 3 years ago

bcanault commented 3 years ago

Hi folks,

First of all, thank you very much for all your work. It's really interesting. I have tried to use XenonPy and rebuild your tutorial. Unfortunately, I ran into two errors with the following code:

Package version: 0.5.1

NGram issue with unknown ngram_tab

I think it was replaced by ngram_table, but I'm not sure.


# N-gram library in XenonPy-iQSPR
from xenonpy.inverse.iqspr import NGram

# initialize a new n-gram
n_gram = NGram()
#n_gram.ngram_tab = n_gram.ngram_table

# train the n-gram with SMILES of available molecules
n_gram.fit(data_ss['SMILES'],train_order=5)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~/anaconda/envs/xenonpy/lib/python3.7/site-packages/IPython/core/formatters.py in __call__(self, obj, include, exclude)
    968 
    969             if method is not None:
--> 970                 return method(include=include, exclude=exclude)
    971             return None
    972         else:

~/anaconda/envs/xenonpy/lib/python3.7/site-packages/sklearn/base.py in _repr_mimebundle_(self, **kwargs)
    462     def _repr_mimebundle_(self, **kwargs):
    463         """Mime bundle used by jupyter kernels to display estimator"""
--> 464         output = {"text/plain": repr(self)}
    465         if get_config()["display"] == 'diagram':
    466             output["text/html"] = estimator_html_repr(self)

~/anaconda/envs/xenonpy/lib/python3.7/site-packages/sklearn/base.py in __repr__(self, N_CHAR_MAX)
    258             n_max_elements_to_show=N_MAX_ELEMENTS_TO_SHOW)
    259 
--> 260         repr_ = pp.pformat(self)
    261 
    262         # Use bruteforce ellipsis when there are a lot of non-blank characters

~/anaconda/envs/xenonpy/lib/python3.7/pprint.py in pformat(self, object)
    142     def pformat(self, object):
    143         sio = _StringIO()
--> 144         self._format(object, sio, 0, 0, {}, 0)
    145         return sio.getvalue()
    146 

~/anaconda/envs/xenonpy/lib/python3.7/pprint.py in _format(self, object, stream, indent, allowance, context, level)
    159             self._readable = False
    160             return
--> 161         rep = self._repr(object, context, level)
    162         max_width = self._width - indent - allowance
    163         if len(rep) > max_width:

~/anaconda/envs/xenonpy/lib/python3.7/pprint.py in _repr(self, object, context, level)
    391     def _repr(self, object, context, level):
    392         repr, readable, recursive = self.format(object, context.copy(),
--> 393                                                 self._depth, level)
    394         if not readable:
    395             self._readable = False

~/anaconda/envs/xenonpy/lib/python3.7/site-packages/sklearn/utils/_pprint.py in format(self, object, context, maxlevels, level)
    179     def format(self, object, context, maxlevels, level):
    180         return _safe_repr(object, context, maxlevels, level,
--> 181                           changed_only=self._changed_only)
    182 
    183     def _pprint_estimator(self, object, stream, indent, allowance, context,

~/anaconda/envs/xenonpy/lib/python3.7/site-packages/sklearn/utils/_pprint.py in _safe_repr(object, context, maxlevels, level, changed_only)
    423         recursive = False
    424         if changed_only:
--> 425             params = _changed_params(object)
    426         else:
    427             params = object.get_params(deep=False)

~/anaconda/envs/xenonpy/lib/python3.7/site-packages/sklearn/utils/_pprint.py in _changed_params(estimator)
     89     estimator with non-default values."""
     90 
---> 91     params = estimator.get_params(deep=False)
     92     init_func = getattr(estimator.__init__, 'deprecated_original',
     93                         estimator.__init__)

~/anaconda/envs/xenonpy/lib/python3.7/site-packages/sklearn/base.py in get_params(self, deep)
    193         out = dict()
    194         for key in self._get_param_names():
--> 195             value = getattr(self, key)
    196             if deep and hasattr(value, 'get_params'):
    197                 deep_items = value.get_params().items()

AttributeError: 'NGram' object has no attribute 'ngram_tab'

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~/anaconda/envs/xenonpy/lib/python3.7/site-packages/IPython/core/formatters.py in __call__(self, obj)
    700                 type_pprinters=self.type_printers,
    701                 deferred_pprinters=self.deferred_printers)
--> 702             printer.pretty(obj)
    703             printer.flush()
    704             return stream.getvalue()

~/anaconda/envs/xenonpy/lib/python3.7/site-packages/IPython/lib/pretty.py in pretty(self, obj)
    392                         if cls is not object \
    393                                 and callable(cls.__dict__.get('__repr__')):
--> 394                             return _repr_pprint(obj, self, cycle)
    395 
    396             return _default_pprint(obj, self, cycle)

~/anaconda/envs/xenonpy/lib/python3.7/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
    698     """A pprint that just redirects to the normal repr function."""
    699     # Find newlines and replace them with p.break_()
--> 700     output = repr(obj)
    701     lines = output.splitlines()
    702     with p.group():

~/anaconda/envs/xenonpy/lib/python3.7/site-packages/sklearn/base.py in __repr__(self, N_CHAR_MAX)
    258             n_max_elements_to_show=N_MAX_ELEMENTS_TO_SHOW)
    259 
--> 260         repr_ = pp.pformat(self)
    261 
    262         # Use bruteforce ellipsis when there are a lot of non-blank characters

~/anaconda/envs/xenonpy/lib/python3.7/pprint.py in pformat(self, object)
    142     def pformat(self, object):
    143         sio = _StringIO()
--> 144         self._format(object, sio, 0, 0, {}, 0)
    145         return sio.getvalue()
    146 

~/anaconda/envs/xenonpy/lib/python3.7/pprint.py in _format(self, object, stream, indent, allowance, context, level)
    159             self._readable = False
    160             return
--> 161         rep = self._repr(object, context, level)
    162         max_width = self._width - indent - allowance
    163         if len(rep) > max_width:

~/anaconda/envs/xenonpy/lib/python3.7/pprint.py in _repr(self, object, context, level)
    391     def _repr(self, object, context, level):
    392         repr, readable, recursive = self.format(object, context.copy(),
--> 393                                                 self._depth, level)
    394         if not readable:
    395             self._readable = False

~/anaconda/envs/xenonpy/lib/python3.7/site-packages/sklearn/utils/_pprint.py in format(self, object, context, maxlevels, level)
    179     def format(self, object, context, maxlevels, level):
    180         return _safe_repr(object, context, maxlevels, level,
--> 181                           changed_only=self._changed_only)
    182 
    183     def _pprint_estimator(self, object, stream, indent, allowance, context,

~/anaconda/envs/xenonpy/lib/python3.7/site-packages/sklearn/utils/_pprint.py in _safe_repr(object, context, maxlevels, level, changed_only)
    423         recursive = False
    424         if changed_only:
--> 425             params = _changed_params(object)
    426         else:
    427             params = object.get_params(deep=False)

~/anaconda/envs/xenonpy/lib/python3.7/site-packages/sklearn/utils/_pprint.py in _changed_params(estimator)
     89     estimator with non-default values."""
     90 
---> 91     params = estimator.get_params(deep=False)
     92     init_func = getattr(estimator.__init__, 'deprecated_original',
     93                         estimator.__init__)

~/anaconda/envs/xenonpy/lib/python3.7/site-packages/sklearn/base.py in get_params(self, deep)
    193         out = dict()
    194         for key in self._get_param_names():
--> 195             value = getattr(self, key)
    196             if deep and hasattr(value, 'get_params'):
    197                 deep_items = value.get_params().items()

AttributeError: 'NGram' object has no attribute 'ngram_tab'

Unknown error in iQSPR


import pickle as pk
import numpy as np  # needed for np.random.seed below

# library for running iQSPR in XenonPy-iQSPR
from xenonpy.inverse.iqspr import IQSPR

# update NGram parameters for this example (HOMO-LUMO gap)
n_gram.set_params(del_range=[1,20],max_len=500, reorder_prob=0.5, sample_order=(1,20))
n_gram.ngram_tab = n_gram.ngram_table

# set up likelihood and n-gram models in iQSPR
iqspr_reorder = IQSPR(estimator=prd_mdls, modifier=n_gram)

np.random.seed(201906) # fix the random seed
# main loop of iQSPR
iqspr_samples1, iqspr_loglike1, iqspr_prob1, iqspr_freq1 = [], [], [], []
for s, ll, p, freq in iqspr_reorder(init_samples, beta, yield_lpf=True):
    iqspr_samples1.append(s)
    iqspr_loglike1.append(ll)
    iqspr_prob1.append(p)
    iqspr_freq1.append(freq)
# record all outputs
iqspr_results_reorder = {
    "samples": iqspr_samples1,
    "loglike": iqspr_loglike1,
    "prob": iqspr_prob1,
    "freq": iqspr_freq1,
    "beta": beta
}
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-82-baffb74333e7> in <module>
----> 1 list(iqspr_reorder(init_samples, beta, yield_lpf=True))

~/anaconda/envs/xenonpy/lib/python3.7/site-packages/xenonpy/inverse/base.py in __call__(self, samples, beta, size, yield_lpf)
    520                 self.on_errors(i + 1, samples, e)
    521             except Exception as e:
--> 522                 raise e

~/anaconda/envs/xenonpy/lib/python3.7/site-packages/xenonpy/inverse/base.py in __call__(self, samples, beta, size, yield_lpf)
    503             try:
    504                 re_samples = self.resample(unique, frequency, size, p)
--> 505                 samples = self.proposal(re_samples)
    506 
    507                 unique, frequency = self.unique(samples)

~/anaconda/envs/xenonpy/lib/python3.7/site-packages/xenonpy/utils/useful_cls.py in fn_(self, *args, **kwargs)
    100                 self._timer.start(fn.__name__)
    101                 try:
--> 102                     rt = fn(self, *args, **kwargs)
    103                 finally:
    104                     self._timer.stop(fn.__name__)

~/anaconda/envs/xenonpy/lib/python3.7/site-packages/xenonpy/inverse/base.py in proposal(self, X)
    395             raise NotImplementedError('user need to implement <proposal> method or'
    396                                       'set <self._proposal> to a instance of <BaseProposal>')
--> 397         return self._proposal(X)
    398 
    399     def on_errors(self, ite, samples, error):

~/anaconda/envs/xenonpy/lib/python3.7/site-packages/xenonpy/inverse/base.py in __call__(self, X)
    283 
    284     def __call__(self, X):
--> 285         return self.proposal(X)
    286 
    287     def on_errors(self, error):

~/anaconda/envs/xenonpy/lib/python3.7/site-packages/xenonpy/utils/useful_cls.py in fn_(self, *args, **kwargs)
    100                 self._timer.start(fn.__name__)
    101                 try:
--> 102                     rt = fn(self, *args, **kwargs)
    103                 finally:
    104                     self._timer.stop(fn.__name__)

~/anaconda/envs/xenonpy/lib/python3.7/site-packages/xenonpy/inverse/iqspr/modifier.py in proposal(self, smiles)
    578 
    579             except Exception as e:
--> 580                 raise e
    581 
    582         return new_smis

~/anaconda/envs/xenonpy/lib/python3.7/site-packages/xenonpy/inverse/iqspr/modifier.py in proposal(self, smiles)
    565             ext_smi = self.smi2esmi(smi)
    566             try:
--> 567                 new_ext_smi = self.modify(ext_smi)
    568                 new_smi = self.esmi2smi(new_ext_smi)
    569                 if Chem.MolFromSmiles(new_smi) is None:

~/anaconda/envs/xenonpy/lib/python3.7/site-packages/xenonpy/utils/useful_cls.py in fn_(self, *args, **kwargs)
    100                 self._timer.start(fn.__name__)
    101                 try:
--> 102                     rt = fn(self, *args, **kwargs)
    103                 finally:
    104                     self._timer.stop(fn.__name__)

~/anaconda/envs/xenonpy/lib/python3.7/site-packages/xenonpy/inverse/iqspr/modifier.py in modify(self, ext_smi)
    206         # add until reaching '!' or a given max value
    207         for i in range(self.max_len - len(ext_smi)):
--> 208             ext_smi, _ = self.sample_next_char(ext_smi)
    209             if ext_smi['esmi'].iloc[-1] == '!':
    210                 return ext_smi  # stop when hitting '!', assume must be valid SMILES

~/anaconda/envs/xenonpy/lib/python3.7/site-packages/xenonpy/utils/useful_cls.py in fn_(self, *args, **kwargs)
    100                 self._timer.start(fn.__name__)
    101                 try:
--> 102                     rt = fn(self, *args, **kwargs)
    103                 finally:
    104                     self._timer.stop(fn.__name__)

~/anaconda/envs/xenonpy/lib/python3.7/site-packages/xenonpy/inverse/iqspr/modifier.py in sample_next_char(self, ext_smi)
    447         iB = ext_smi['n_br'].iloc[-1] > 0
    448         iR = ext_smi['n_ring'].iloc[-1]
--> 449         cand_char, cand_prob = self.get_prob(ext_smi['substr'].iloc[-1], iB, iR)
    450         # here we assume cand_char is not empty
    451         idx = np.random.choice(range(len(cand_char)), p=cand_prob)

~/anaconda/envs/xenonpy/lib/python3.7/site-packages/xenonpy/utils/useful_cls.py in fn_(self, *args, **kwargs)
    100                 self._timer.start(fn.__name__)
    101                 try:
--> 102                     rt = fn(self, *args, **kwargs)
    103                 finally:
    104                     self._timer.stop(fn.__name__)

~/anaconda/envs/xenonpy/lib/python3.7/site-packages/xenonpy/inverse/iqspr/modifier.py in get_prob(self, tmp_str, iB, iR)
    434         for iO in range(self.sample_order[1] - 1, self.sample_order[0] - 2, -1):
    435             # if (len(tmp_str) > iO) & (str(tmp_str[-(iO + 1):]) in self._table[iO][iB][iR].index.tolist()):
--> 436             if len(tmp_str) > iO and str(tmp_str[-(iO + 1):]) in self._table[iO][iB][iR].index.tolist():
    437                 cand_char = self._table[iO][iB][iR].columns.tolist()
    438                 cand_prob = np.array(self._table[iO][iB][iR].loc[str(tmp_str[-(iO + 1):])])

IndexError: list index out of range

stewu5 commented 3 years ago

@bcanault Thank you for the information. We have observed a few conflicts after v0.5.1 due to updates of rdkit and scikit-learn. The first problem, with ngram_tab, is actually caused by a recent scikit-learn update. We have fixed the issue, and the fix should be included in the next release. For now, you can change the call to " = n_gram.fit(data_ss['SMILES'],train_order=5)" as a temporary workaround. The cause of the second problem is not clear to me yet. My guess is that it is due to some update in rdkit, but we have not found the problem yet. We will try to solve it as soon as possible.
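
For readers following the first traceback: every frame sits in the notebook display path (the IPython formatter, then scikit-learn's `__repr__` and `get_params`), so the error appears to be raised when the object returned by `fit` is displayed, not by the training itself. The sketch below uses a hypothetical `ToyEstimator` class (a reduced stand-in, not XenonPy or scikit-learn code) to show, under that assumption, why assigning the return value to a name such as `_` avoids the error.

```python
import inspect


class ToyEstimator:
    """Hypothetical estimator whose repr lists its __init__ parameters.

    Mimics, in reduced form, the get_params -> getattr(self, key) step seen
    in the traceback. It is not scikit-learn's actual implementation.
    """

    def __init__(self, ngram_tab=None, max_len=1000):
        # The constructor argument `ngram_tab` is deliberately stored under a
        # different name, which is the situation the AttributeError points to.
        self.ngram_table = ngram_tab
        self.max_len = max_len

    def get_params(self):
        # Look up every __init__ parameter as an attribute of the same name.
        names = inspect.signature(self.__init__).parameters
        return {name: getattr(self, name) for name in names}

    def __repr__(self):
        return f"ToyEstimator({self.get_params()})"

    def fit(self, data):
        # Returning self means a notebook cell ending in `.fit(...)` displays it.
        return self


model = ToyEstimator()

# Assigning the return value: nothing is displayed, so __repr__ never runs.
_ = model.fit(["CCO"])

# Asking for the repr (what a notebook does with a bare `model.fit(...)`) fails.
try:
    repr(model.fit(["CCO"]))
    repr_error = None
except AttributeError as err:
    repr_error = str(err)

print(repr_error)
```

The same reasoning explains the reporter's own workaround, `n_gram.ngram_tab = n_gram.ngram_table`: it makes the attribute lookup succeed instead of avoiding the display.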

stewu5 commented 3 years ago

@bcanault For the second problem, I tested the tutorial myself but did not get the error you showed. Did you use the NGram model obtained by combining "ngram_pubchem_ikebata_reO15_O10.obj" and "ngram_pubchem_ikebata_reO15_O11to20.obj"? The error seems to come from NGram failing to find information relevant to the molecule being modified, which is often because the initial molecule has more ring structures (in its SMILES representation) than NGram ever saw during training. I recommend trying to reproduce the problem and printing the last SMILES that triggered it. Maybe we can help you from there.
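
As a rough illustration of the ring explanation above: the traceback ends in `self._table[iO][iB][iR]`, where `iR` comes from the `n_ring` column. Assuming the failing index is the ring count, a table that only has slots for the ring counts seen in training would raise `IndexError` for a molecule with more open rings. The sketch below is a simplified, RDKit-free pre-check; `max_open_rings`, `seen_in_training`, and `toy_table` are hypothetical names, and the parser only approximates how XenonPy counts rings.

```python
def max_open_rings(smiles):
    """Return the largest number of rings open at the same time in a SMILES string.

    Simplified parser: handles single-digit labels and %nn labels, and skips
    bracket atoms such as [13C] or [NH4+] so their digits are not counted.
    """
    open_labels = set()
    most = 0
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            # Jump past the bracket atom.
            i = smiles.index("]", i) + 1
            continue
        if ch == "%":
            label, i = smiles[i + 1:i + 3], i + 3
        elif ch.isdigit():
            label, i = ch, i + 1
        else:
            i += 1
            continue
        # A label opens a ring on first sight and closes it on the second.
        if label in open_labels:
            open_labels.remove(label)
        else:
            open_labels.add(label)
        most = max(most, len(open_labels))
    return most


# Hypothetical stand-in for a trained table: one order, two branch states,
# and ring slots for 0 or 1 open ring only.
toy_table = [[["no ring", "1 ring"], ["no ring", "1 ring"]]]


def seen_in_training(table, n_ring):
    """True if every (order, branch) entry has a slot for `n_ring` open rings."""
    return all(n_ring < len(by_ring) for by_branch in table for by_ring in by_branch)


for smi in ["c1ccccc1", "c1ccc2ccccc2c1"]:
    n = max_open_rings(smi)
    print(smi, n, seen_in_training(toy_table, n))
```

Running a check of this kind over the initial samples before starting iQSPR is one way to spot molecules that the trained NGram may not be able to handle.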

bcanault commented 3 years ago

@stewu5 Thank you very much for your reply. I am looking forward to testing your new version as soon as it becomes available :). I will give you an update once I have tested it. Thank you very much for your help.

deepakorani commented 2 years ago

@bcanault were you able to solve the second issue? I am running into the same issue when I try to reproduce the results (the unknown error in iQSPR). I tried downgrading XenonPy, and I ran into the same issue. I guess it is due to the new RDKit package? @stewu5 Dr. Wu, I am not sure what you mean by printing the output; do you mean the initial samples that are used for the iQSPR runs?

stewu5 commented 2 years ago

@deepakorani @bcanault The NGram class in the current XenonPy version has been tested by different users over the past few years. What we have learned is that NGram can give weird results if the training data does not match the target SMILES. For example, if you train NGram on molecules with only 1 nested ring and then use it to generate new molecules starting from an initial molecule with 2 nested rings, you will run into trouble. When our collaborators ran into similar trouble before, we fixed the random seed and tried to reproduce the exact same problem. We usually capture the initial SMILES and the final SMILES that caused the problem for debugging. With that information, we were able to pinpoint the issue 90% of the time. Therefore, I recommend reproducing the error while storing the last SMILES before the error occurs (e.g., by outputting the SMILES at every single step of the SMILES modification by NGram). Hopefully, we can help you resolve the problem from there.
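
The debugging recipe described above (fix the seed, then record the SMILES at every modification step so the failing one can be reported) might look roughly like the following. This is a minimal sketch: `run_with_smiles_log` and `toy_modify` are hypothetical helpers, and `toy_modify` merely imitates a table lookup that fails on a second ring label. The standard-library `random` module is used to keep the sketch self-contained; with XenonPy itself the seed would be set with `np.random.seed`, as in the original snippet.

```python
import random


def run_with_smiles_log(modify, smiles_list, seed=201906):
    """Apply `modify` to each SMILES and report the one that fails, if any.

    `modify` is any callable str -> str (here a hypothetical stand-in for a
    single SMILES modification step). Returns (results, failure), where
    failure is None or a dict with the offending SMILES and the exception.
    """
    random.seed(seed)  # fixed seed so a failure can be reproduced exactly
    results = []
    for smi in smiles_list:
        try:
            results.append(modify(smi))
        except Exception as err:
            return results, {"smiles": smi, "error": repr(err)}
    return results, None


def toy_modify(smi):
    # Hypothetical modifier: only has slots for molecules with 0 or 1 ring label.
    ring_slots = ["no ring", "one ring"]
    n_ring = 2 if "2" in smi else 1 if "1" in smi else 0
    ring_slots[n_ring]  # raises IndexError when n_ring == 2
    return smi + "C"


results, failure = run_with_smiles_log(
    toy_modify, ["CCO", "c1ccccc1", "c1ccc2ccccc2c1", "CCN"]
)
print(results)
print(failure)
```

The `failure` dictionary, together with the initial samples and the seed, is the kind of information that could be posted back to the thread for further debugging.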