mordred-descriptor / mordred

a molecular descriptor calculator
http://mordred-descriptor.github.io/documentation/master/
BSD 3-Clause "New" or "Revised" License

RuntimeWarning: overflow encountered in reduce #81

Open DinosaurInSpace opened 4 years ago

DinosaurInSpace commented 4 years ago

I am calculating the Mordred descriptors for a subset of roughly 10k compounds from HMDB. I get the following warning:

RuntimeWarning: overflow encountered in reduce
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)

Interestingly, the number of warnings increases over time as I run through the set. I am not sure whether this is an issue with my system or a bias in how the HMDB subset is ordered.
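One way to tell a system issue apart from a property of particular structures is to record which inputs actually emit the warning. The sketch below is a minimal, library-agnostic helper built only on Python's standard `warnings` module. `find_warning_inputs` is a hypothetical name, not part of mordred, and it assumes the per-molecule calculation can be wrapped in a single callable.

```python
import warnings
from typing import Any, Callable, Iterable, List, Tuple


def find_warning_inputs(
    func: Callable[[Any], Any],
    items: Iterable[Any],
    category: type = RuntimeWarning,
) -> List[Tuple[int, str]]:
    """Call func on each item and return (index, message) for every
    warning of the given category raised while processing that item."""
    hits: List[Tuple[int, str]] = []
    for index, item in enumerate(items):
        with warnings.catch_warnings(record=True) as caught:
            # Record every occurrence, not only the first one per code location.
            warnings.simplefilter("always")
            func(item)
        for w in caught:
            if issubclass(w.category, category):
                hits.append((index, str(w.message)))
    return hits
```

With mordred, passing the `Calculator` instance itself as `func` and the list of molecules as `items` should work if the calculator can be called on one molecule at a time in your setup (an assumption here). The returned indices can then be mapped back to HMDB IDs to see whether the affected compounds cluster late in the set.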

I am happy to send over the HMDB IDs and structures as a pickle if you are interested.

Thanks so much! Super happy with everything else so far...

-- I am running very simple code for this operation per your tutorial:

calc = Calculator(descriptors)
df = calc.pandas('mol')

--

channels:

--

OS/distribution

Mac OSX 10.15 Catalina

conda or pip

conda

python version

Python 3.6.8 :: Anaconda, Inc.

library version

rdkit 2019.03.4.0 py36h65625ec_1 rdkit

remseven commented 3 years ago

I am facing the same issue when running Mordred from the command line, sometimes even with a single molecule. It seems to occur mostly with large molecules.

plkx commented 3 years ago

I've also confirmed this problem.

I am trying to isolate the source by backtracking through compound sets and various python updates. I have previously run much larger molecules without the error.

It may have to wait a few days before this gets prioritized for me.

Regards,

PLKX

plkx commented 3 years ago

A question for the posters who have experienced this problem:

What modifications to mordred, if any, are there in your system?

For example, does your environment have code modifications such as these (or comparable)?

https://github.com/mordred-descriptor/mordred/issues/80#issuecomment-718042620
https://github.com/mordred-descriptor/mordred/issues/80#issuecomment-725330942

I see that DinosaurInSpace is using networkx=2.4. I am using 2.5 with requisite code modifications in DetourMatrix.py which allow successful completion of mordred self-tests.

The mordred self-tests do not test any molecules with a number of atoms equal to or greater than those for which I have encountered the numpy overflow warning. Since the detour matrix is an n×n matrix, where n is the number of heavy (non-hydrogen) atoms, this seems the most logical starting point for tracking down the problem.

However, if the problem occurs in the absence of the DetourMatrix.py modifications, I may look elsewhere.
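The size hypothesis above can be checked directly if each molecule's heavy-atom count is recorded alongside whether it triggered the warning. The following is a minimal sketch using only the standard library. `warning_rate_by_size` and the `(n_heavy, did_warn)` record layout are hypothetical, chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def warning_rate_by_size(
    records: Iterable[Tuple[int, bool]],
    bin_width: int = 10,
) -> Dict[int, float]:
    """Group (n_heavy, did_warn) records into heavy-atom-count bins and
    return the fraction of molecules in each bin that emitted a warning.
    Keys are the lower edge of each bin."""
    totals: Dict[int, int] = defaultdict(int)
    warned: Dict[int, int] = defaultdict(int)
    for n_heavy, did_warn in records:
        start = (n_heavy // bin_width) * bin_width
        totals[start] += 1
        if did_warn:
            warned[start] += 1
    return {start: warned[start] / totals[start] for start in sorted(totals)}
```

The heavy-atom count could come from RDKit's `mol.GetNumHeavyAtoms()`. If the warning rate rises sharply above some bin, that would support an n×n matrix descriptor as the source. A flat rate would point elsewhere.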

Thanks,

PLKX

batmanscode commented 3 years ago

I'm having this overflow reduce problem as well.


Env: google colab

Code:

! wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.2-Linux-x86_64.sh
! chmod +x Miniconda3-py37_4.8.2-Linux-x86_64.sh
! bash ./Miniconda3-py37_4.8.2-Linux-x86_64.sh -b -f -p /usr/local
! conda install -c rdkit rdkit -y
import sys
sys.path.append('/usr/local/lib/python3.7/site-packages/')
# used rdkit to calculate lipinski descriptors in the same notebook before installing and using mordred

! pip install -q 'mordred[full]'

from rdkit import Chem
from mordred import Calculator, descriptors

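# calc was not defined in the original post; constructed here as in the first comment of this issue
calc = Calculator(descriptors)
# data holds the SMILES column from canon_smiles.txt (the loading step is not shown in the post)
mols = [Chem.MolFromSmiles(smi) for smi in data['canonical_smiles']]
df = calc.pandas(mols)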

data: canon_smiles.txt

remseven commented 3 years ago

@plkx, sorry for not answering earlier. In my case I modified my environment to use networkx=2.1.0, based on what I had read in previous issues. This was about a year ago, when I installed Mordred for the first time. At the time it seemed to fix the issues I faced (probably in the self-tests).

Last week, I got this overflow problem:

~/anaconda/lib/python3.6/site-packages/numpy/core/fromnumeric.py:87: RuntimeWarning: overflow encountered in reduce
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)

Previously I also had these:

~/anaconda/lib/python3.6/site-packages/mordred/_matrix_attributes.py:251: RuntimeWarning: invalid value encountered in double_scalars
  s += (eig.vec[i, eig.max] * eig.vec[j, eig.max]) ** -0.5
~/anaconda/lib/python3.6/site-packages/mordred/_matrix_attributes.py:251: RuntimeWarning: divide by zero encountered in double_scalars
  s += (eig.vec[i, eig.max] * eig.vec[j, eig.max]) ** -0.5
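Both of these warnings come from the same line, which appears to raise a product of two eigenvector components to the power -0.5 (reading the quoted expression as `(a * b) ** -0.5` is an assumption reconstructed from the two tracebacks). Under IEEE-754 floating-point rules, a zero product gives infinity with a "divide by zero" warning, and a negative product gives NaN with an "invalid value" warning. The pure-Python sketch below mimics that classification. `inverse_sqrt_term` is a hypothetical helper, not mordred code.

```python
import math
from typing import Tuple


def inverse_sqrt_term(a: float, b: float) -> Tuple[float, str]:
    """Mimic (a * b) ** -0.5 for floating-point scalars and report which
    NumPy warning, if any, that operation would be expected to emit."""
    product = a * b
    if product == 0.0:
        # 0 ** -0.5 is 1 / sqrt(0): infinity, flagged as divide by zero.
        return math.inf, "divide by zero"
    if product < 0.0:
        # The square root of a negative number has no real value: NaN.
        return math.nan, "invalid value"
    return product ** -0.5, "ok"
```

So the two warnings would correspond to eigenvector entries that are exactly zero, or that have opposite signs, for some pair of atoms. Neither case needs a large molecule, which may explain why these warnings show up separately from the overflow.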

I hope this helps. Let me know if you want me to run a few tests. Sadly, I can't share the structures I am studying. I have run the list from @batmanscode and I do get some overflow warnings too (see the selected excerpts attached).

canon_smiles_calculated.txt