RuntimeWarning: overflow encountered in reduce

DinosaurInSpace commented 4 years ago

I am calculating the Mordred descriptors for a subset of about 10k compounds from HMDB. I get the following warning:

RuntimeWarning: overflow encountered in reduce
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)

Interestingly, the number of warnings increases over time as I run through the set. I am not sure whether this is an issue with the system or a bias in how the HMDB subset is ordered.

I am happy to send over the HMDB IDs and structures as a pickle if you are interested.
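One way to check whether the warnings follow particular inputs rather than the system is to record them per item. The following is a minimal sketch using only the standard `warnings` module; `find_warning_triggers` and `fake_descriptor` are hypothetical names introduced for illustration, and the stand-in function merely imitates a calculation that warns on large inputs.

```python
import warnings


def find_warning_triggers(items, func, message_part="overflow"):
    """Call func(item) for each item and return (results, flagged), where
    flagged lists the indices whose call emitted a matching RuntimeWarning."""
    results, flagged = [], []
    for index, item in enumerate(items):
        with warnings.catch_warnings(record=True) as caught:
            # "always" bypasses the once-per-location default filter, so
            # every occurrence is recorded, not only the first one.
            warnings.simplefilter("always")
            results.append(func(item))
        if any(
            issubclass(w.category, RuntimeWarning) and message_part in str(w.message)
            for w in caught
        ):
            flagged.append(index)
    return results, flagged


# Hypothetical stand-in for a descriptor calculation: warns for "large" inputs.
def fake_descriptor(n_atoms):
    if n_atoms > 50:
        warnings.warn("overflow encountered in reduce", RuntimeWarning)
    return n_atoms * 2


results, flagged = find_warning_triggers([10, 80, 30, 120], fake_descriptor)
print(flagged)  # [1, 3]
```

In the setting described here, `func` would presumably be the calculator applied to one molecule at a time, and the flagged indices would show whether the affected structures cluster in one part of the subset. That usage is an assumption and has not been tested against mordred.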

Thanks so much! Super happy with everything else so far...

-- I am running very simple code for this operation per your tutorial:

calc = Calculator(descriptors)
df = calc.pandas('mol')

--

channels:

rdkit
bioconda
mordred-descriptor
conda-forge
anaconda
defaults

dependencies:

altair=3.2.0=py36_0
appnope=0.1.0=py36hf537a9a_0
asn1crypto=1.2.0=py36_0
attrs=19.2.0=py_0
backcall=0.1.0=py36_0
blas=1.0=mkl
bleach=3.1.0=py36_0
bzip2=1.0.8=h1de35cc_0
ca-certificates=2019.10.16=0
cairo=1.14.12=hc4e6be7_4
certifi=2019.9.11=py36_0
cffi=1.13.1=py36hb5b8e2f_0
chardet=3.0.4=py36_1003
cryptography=2.8=py36ha12b0ac_0
cycler=0.10.0=py_1
dbus=1.13.6=h90a0687_0
decorator=4.4.0=py36_1
defusedxml=0.6.0=py_0
entrypoints=0.3=py36_0
expat=2.2.6=h0a44026_0
fontconfig=2.13.0=h5d5b041_1
freetype=2.10.0=h24853df_1
gettext=0.19.8.1=h15daf44_3
glib=2.56.2=hd9629dc_0
icu=58.2=h4b95b61_1
idna=2.8=py36_0
intel-openmp=2019.5=281
ipykernel=5.1.2=py36h39e3cac_0
ipython=7.8.0=py36h39e3cac_0
ipython_genutils=0.2.0=py36h241746c_0
ipywidgets=7.5.1=py_0
jedi=0.15.1=py36_0
jinja2=2.10.3=py_0
joblib=0.13.2=py36_0
jpeg=9b=he5867d9_2
jsonschema=3.0.2=py36_0
jupyter=1.0.0=py36_7
jupyter_client=5.3.3=py36_1
jupyter_console=6.0.0=py36_0
jupyter_core=4.5.0=py_0
kiwisolver=1.1.0=py36h770b8ee_0
libboost=1.67.0=hebc422b_4
libcxx=4.0.1=hcfea43d_1
libcxxabi=4.0.1=hcfea43d_1
libedit=3.1.20181209=hb402a30_0
libffi=3.2.1=h475c297_4
libgfortran=3.0.1=h93005f0_2
libiconv=1.15=hdd342a3_7
libpng=1.6.37=h2573ce8_0
libsodium=1.0.16=h3efe00b_0
libtiff=4.0.10=hcb84e12_2
libxml2=2.9.9=hf6e021a_1
libxslt=1.1.33=h33a18ac_0
llvm-openmp=4.0.1=hcfea43d_1
llvmlite=0.30.0=py36h98b8051_0
lxml=4.4.1=py36hef8c89e_0
markupsafe=1.1.1=py36h1de35cc_0
matplotlib=3.1.1=py36_1
matplotlib-base=3.1.1=py36h3a684a6_1
matplotlib-venn=0.11.5=py_1
mistune=0.8.4=py36h1de35cc_0
mkl=2019.5=281
mkl-service=2.3.0=py36hfbe908c_0
mkl_fft=1.0.14=py36h5e564d8_0
mkl_random=1.1.0=py36ha771720_0
mordred=1.2.0=pyhe5148d4_0
nbconvert=5.6.0=py36_1
nbformat=4.4.0=py36h827af21_0
ncurses=6.1=h0a44026_1
networkx=2.4=py_0
notebook=6.0.1=py36_0
numba=0.46.0=py36h6440ff4_0
numpy=1.17.2=py36h99e6662_0
numpy-base=1.17.2=py36h6575580_0
olefile=0.46=py36_0
openssl=1.1.1=h1de35cc_0
pandas=0.25.1=py36h0a44026_0
pandoc=2.2.3.2=0
pandocfilters=1.4.2=py36_1
parso=0.5.1=py_0
pcre=8.43=h0a44026_0
pexpect=4.7.0=py36_0
pickleshare=0.7.5=py36_0
pillow=6.2.0=py36hb68e598_0
pip=19.2.3=py36_0
pixman=0.38.0=h1de35cc_0
prometheus_client=0.7.1=py_0
prompt_toolkit=2.0.10=py_0
ptyprocess=0.6.0=py36_0
py-boost=1.67.0=py36h6440ff4_4
pycparser=2.19=py36_0
pygments=2.4.2=py_0
pyimzml=1.2.6=py_1
pyopenssl=19.0.0=py36_0
pyparsing=2.4.2=py_0
pyqt=5.9.2=py36h655552a_0
pyrsistent=0.15.4=py36h1de35cc_0
pysocks=1.7.1=py36_0
pyteomics=4.1.2=py_0
python=3.6.8=haf84260_0
python-dateutil=2.8.0=py36_0
pytz=2019.3=py_0
pyzmq=18.1.0=py36h0a44026_0
qt=5.9.7=h468cd18_1
qtconsole=4.5.5=py_0
rdkit=2019.03.4.0=py36h65625ec_1
readline=7.0=h1de35cc_5
requests=2.22.0=py36_0
scikit-learn=0.21.3=py36h27c97d8_0
scipy=1.3.1=py36h1410ff5_0
send2trash=1.5.0=py36_0
setuptools=41.4.0=py36_0
sip=4.19.13=py36h0a44026_0
six=1.12.0=py36_0
spectrum_utils=0.3.2=py_2
sqlalchemy=1.3.10=py36h1de35cc_0
sqlite=3.30.0=ha441bb4_0
tbb=2019.8=h04f5b5a_0
terminado=0.8.2=py36_0
testpath=0.4.2=py36_0
tk=8.6.8=ha441bb4_0
toolz=0.10.0=py_0
tornado=6.0.3=py36h01d97ff_0
tqdm=4.36.1=py_0
traitlets=4.3.3=py36_0
wcwidth=0.1.7=py36h8c6ec74_0
webencodings=0.5.1=py36_1
wheel=0.33.6=py36_0
wheezy.template=0.1.167=py_1
widgetsnbextension=3.5.1=py36_0
xz=5.2.4=h1de35cc_4
zeromq=4.3.1=h0a44026_3
zlib=1.2.11=h1de35cc_3
zstd=1.3.7=h5bba6e5_0
pip:
- boto3==1.10.10
- botocore==1.13.10
- docutils==0.15.2
- elasticsearch==5.4.0
- elasticsearch-dsl==5.3.0
- jmespath==0.9.4
- metaspace2020==1.4.3
- plotly==4.2.1
- pymspec==0.1.2
- pyyaml==5.1.2
- retrying==1.3.3
- s3transfer==0.2.1
- urllib3==1.25.6

--

OS/distribution

Mac OSX 10.15 Catalina

conda or pip

conda

python version

Python 3.6.8 :: Anaconda, Inc.

library version

rdkit 2019.03.4.0 py36h65625ec_1 rdkit

remseven commented 3 years ago

I am facing the same issue when running Mordred from the command line, sometimes even with a single molecule. It seems to occur mostly with large molecules.

plkx commented 3 years ago

I've also confirmed this problem.

I am trying to isolate the source by backtracking through compound sets and various python updates. I have previously run much larger molecules without the error.

It may have to wait a few days before this gets prioritized for me.

Regards,

PLKX

plkx commented 3 years ago

A question for the posters who have experienced this problem:

What modifications to mordred, if any, are there on your system?

For example, does your environment have code modifications such as these (or comparable ones)?

https://github.com/mordred-descriptor/mordred/issues/80#issuecomment-718042620
https://github.com/mordred-descriptor/mordred/issues/80#issuecomment-725330942

I see that DinosaurInSpace is using networkx=2.4. I am using 2.5 with requisite code modifications in DetourMatrix.py which allow successful completion of mordred self-tests.

The mordred self-tests do not test any molecules with a number of atoms equal to or greater than those for which I have encountered the numpy overflow warning. Since the detour matrix is an n×n matrix, where n is the number of heavy (non-hydrogen) atoms, this seems the most logical starting point for tracking down the problem.

However, if the problem occurs in the absence of the DetourMatrix.py modifications, I may look elsewhere.

Thanks,

PLKX

batmanscode commented 3 years ago

I'm having this overflow reduce problem as well.

Env: google colab

Code:

! wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.2-Linux-x86_64.sh
! chmod +x Miniconda3-py37_4.8.2-Linux-x86_64.sh
! bash ./Miniconda3-py37_4.8.2-Linux-x86_64.sh -b -f -p /usr/local
! conda install -c rdkit rdkit -y
import sys
sys.path.append('/usr/local/lib/python3.7/site-packages/')
# used rdkit to calculate lipinski descriptors in the same notebook before installing and using mordred

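! pip install -q 'mordred[full]'

from rdkit import Chem
from mordred import Calculator, descriptors

# `data` holds the attached SMILES (canon_smiles.txt) in a 'canonical_smiles' column
mols = [Chem.MolFromSmiles(smi) for smi in data['canonical_smiles']]
calc = Calculator(descriptors)  # not shown in the original post; assumed, as in the first comment
df = calc.pandas(mols)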

data: canon_smiles.txt

remseven commented 3 years ago

@plkx, sorry for not answering earlier. In my case I modified the environment to use networkx=2.1.0, based on what I had read in previous issues. That was about a year ago, when I installed Mordred for the first time. At the time it seemed to fix the issues I faced (probably in the self-test).

Last week, I got this overflow problem:

~/anaconda/lib/python3.6/site-packages/numpy/core/fromnumeric.py:87: RuntimeWarning: overflow encountered in reduce
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
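For reference, the warning text itself comes from NumPy's floating-point error handling around reductions such as `np.sum`. The sketch below reproduces it on a toy array and shows that `np.errstate(over="raise")` turns it into an exception, which yields a traceback through the calling code. The array is made up for illustration and says nothing about which mordred descriptor is responsible.

```python
import numpy as np

# Each value is finite, but their sum exceeds the float64 maximum (~1.8e308).
values = np.array([1e308, 1e308])

# Silence the warning just to inspect the result: the sum overflows to inf.
with np.errstate(over="ignore"):
    total = values.sum()

# With over="raise", the same reduction raises FloatingPointError instead of
# warning, so the traceback shows where the reduction was called from.
try:
    with np.errstate(over="raise"):
        values.sum()
    raised = False
except FloatingPointError as exc:
    raised = True
    print(exc)
```

Wrapping a single-molecule calculation in the same `np.errstate(over="raise")` context might help locate the descriptor involved, assuming the overflow occurs in the calling thread.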

Previously I also had this one:

~/anaconda/lib/python3.6/site-packages/mordred/_matrix_attributes.py:251: RuntimeWarning: invalid value encountered in double_scalars
  s += (eig.vec[i, eig.max] * eig.vec[j, eig.max]) ** -0.5
~/anaconda/lib/python3.6/site-packages/mordred/_matrix_attributes.py:251: RuntimeWarning: divide by zero encountered in double_scalars
  s += (eig.vec[i, eig.max] * eig.vec[j, eig.max]) ** -0.5
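Both of these warnings are consistent with raising a NumPy float to the power -0.5: a product of exactly zero gives a divide-by-zero, and a negative product gives an invalid value. A small sketch with made-up numbers (the variable names are hypothetical and not taken from mordred):

```python
import numpy as np

# Hypothetical products of two eigenvector components.
zero_product = np.float64(0.0)
negative_product = np.float64(-0.25)

with np.errstate(divide="ignore", invalid="ignore"):
    from_zero = zero_product ** -0.5          # 1 / sqrt(0): divide by zero, result inf
    from_negative = negative_product ** -0.5  # sqrt of a negative: invalid, result nan

print(from_zero, from_negative)
```

Whether such zero or negative products are expected for the structures involved is not something this sketch can answer.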

I hope this helps. Let me know if you want me to run a few tests. Sadly, I can't share the structures I am studying. I have run the list from @batmanscode and I do get some overflow warnings too (see the selected excerpts attached).

canon_smiles_calculated.txt
