Open DinosaurInSpace opened 4 years ago
I am facing the same issue when running Mordred from the command line, sometimes even with a single molecule. It seems to occur mostly with large molecules.
I've confirmed this problem, also.
I am trying to isolate the source by backtracking through compound sets and various python updates. I have previously run much larger molecules without the error.
It may have to wait a few days before this gets prioritized for me.
Regards,
PLKX
Question for either poster who has experienced this problem:
What modifications to mordred, if any, are there in your system?
For example, does your environment have code modifications such as these (or comparable): https://github.com/mordred-descriptor/mordred/issues/80#issuecomment-718042620 https://github.com/mordred-descriptor/mordred/issues/80#issuecomment-725330942 ?
I see that DinosaurInSpace is using networkx=2.4. I am using 2.5 with requisite code modifications in DetourMatrix.py which allow successful completion of mordred self-tests.
The mordred self-tests do not test any molecules with a number of atoms equal to or greater than those for which I have encountered the numpy overflow warning. Since the detour matrix is an n×n matrix, where n is the number of heavy (non-hydrogen) atoms, this seems the most logical starting point for tracking down this problem.
However, if the problem occurs in the absence of the DetourMatrix.py modifications, I may look elsewhere.
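For readers unfamiliar with the detour matrix mentioned above: entry (i, j) is the length of the longest simple path between atoms i and j. The following is a minimal brute-force sketch of that definition in plain Python. It is not mordred's implementation (which relies on networkx), and the `detour_matrix` function and the `ring` adjacency dict are hypothetical names used only for illustration. It does show why the work grows quickly with the number of heavy atoms.

```python
def detour_matrix(adjacency):
    """Return the n x n detour matrix for a graph given as
    {vertex_index: [neighbor_indices]}. Entry [i][j] is the number of
    edges on the longest simple path between i and j (0 on the diagonal)."""
    n = len(adjacency)
    matrix = [[0] * n for _ in range(n)]

    def extend(start, current, visited, length):
        # Keep the longest path length found so far from start to current.
        if length > matrix[start][current]:
            matrix[start][current] = length
        for neighbor in adjacency[current]:
            if neighbor not in visited:
                visited.add(neighbor)
                extend(start, neighbor, visited, length + 1)
                visited.remove(neighbor)

    # Exhaustive depth-first search from every vertex (exponential in general).
    for start in range(n):
        extend(start, start, {start}, 0)
    return matrix


# Hypothetical example: a four-membered ring of heavy atoms 0-1-2-3-0.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
for row in detour_matrix(ring):
    print(row)
```

In the ring, adjacent atoms get a detour distance of 3 (the long way around) and opposite atoms get 2.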
Thanks,
PLKX
I'm having this overflow reduce problem as well.
Env: google colab
Code:
! wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.2-Linux-x86_64.sh
! chmod +x Miniconda3-py37_4.8.2-Linux-x86_64.sh
! bash ./Miniconda3-py37_4.8.2-Linux-x86_64.sh -b -f -p /usr/local
! conda install -c rdkit rdkit -y
import sys
sys.path.append('/usr/local/lib/python3.7/site-packages/')
# used rdkit to calculate lipinski descriptors in the same notebook before installing and using mordred
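! pip install -q 'mordred[full]'
from rdkit import Chem
from mordred import Calculator, descriptors
# `data` is assumed to be a DataFrame with a 'canonical_smiles' column (see canon_smiles.txt below)
# the original snippet did not define `calc`; the arguments actually used are not shown
calc = Calculator(descriptors)
mols = [Chem.MolFromSmiles(smi) for smi in data['canonical_smiles']]
df = calc.pandas(mols)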
data: canon_smiles.txt
@plkx, sorry for not answering earlier... In my case I modified my environment to use networkx=2.1.0, based on what I had read in previous issues. This was about a year ago, when I installed Mordred for the first time. At the time it seemed to fix the issues I faced (probably in the self-test).
Last week, I got this overflow problem:
~/anaconda/lib/python3.6/site-packages/numpy/core/fromnumeric.py:87: RuntimeWarning: overflow encountered in reduce
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
Previously I also had these:
~/anaconda/lib/python3.6/site-packages/mordred/_matrix_attributes.py:251: RuntimeWarning: invalid value encountered in double_scalars
  s += (eig.vec[i, eig.max] * eig.vec[j, eig.max]) ** -0.5
~/anaconda/lib/python3.6/site-packages/mordred/_matrix_attributes.py:251: RuntimeWarning: divide by zero encountered in double_scalars
  s += (eig.vec[i, eig.max] * eig.vec[j, eig.max]) ** -0.5
I hope this helps. Let me know if you want me to run a few tests. Sadly, I can't share the structures I am studying. I have run the list from @batmanscode and I do get some overflow warnings too (see the selected excerpts attached).
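The two warnings from _matrix_attributes.py quoted above both come from raising a product of two eigenvector components to the power -0.5. With NumPy float64 scalars, a product of exactly zero gives inf with a "divide by zero" warning, and a negative product gives nan with an "invalid value" warning. A minimal sketch of that case split in plain Python, where `inverse_sqrt_like_numpy` is a hypothetical helper and not part of mordred or NumPy:

```python
import math


def inverse_sqrt_like_numpy(product):
    """Mimic float64 `product ** -0.5` and name the warning NumPy would emit.

    Returns (value, warning_text), where warning_text is None if no warning
    would be expected.
    """
    if product == 0.0:
        # 1 / sqrt(0) -> inf, reported as "divide by zero"
        return math.inf, "divide by zero encountered"
    if product < 0.0:
        # square root of a negative real -> nan, reported as "invalid value"
        return math.nan, "invalid value encountered"
    return product ** -0.5, None


for value in (4.0, 0.0, -1.0):
    print(value, inverse_sqrt_like_numpy(value))
```

So the "divide by zero" case suggests an eigenvector component that is exactly zero, and the "invalid value" case suggests two components with opposite signs.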
I am calculating the Mordred descriptors for a subset of 10k or so compounds from HMDB. I get the following warning:
" RuntimeWarning: overflow encountered in reduce
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs) "
Interestingly, the number of warnings increases over time as I run through the set. I am not sure whether this is an issue with the system, or perhaps a bias in how the HMDB subset is ordered.
I am happy to send over the HMDB IDs and structures as a pickle if you are interested.
Thanks so much! Super happy with everything else so far...
-- I am running very simple code for this operation per your tutorial:
calc = Calculator(descriptors)
df = calc.pandas('mol')
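One way to find out which molecules trigger the warning, rather than seeing it scroll by during a batch run, might be to process the inputs one at a time and record any RuntimeWarning per item using the standard `warnings` module. This is only a sketch: `find_warning_triggers` and `fake_compute` are hypothetical names, `fake_compute` is a stand-in for the real per-molecule descriptor calculation, and it assumes the calculation runs in the same process (warnings raised in worker processes would not be captured this way).

```python
import warnings


def find_warning_triggers(items, compute):
    """Call compute(item) for each item and return a list of
    (index, item, [warning messages]) for items that raised a RuntimeWarning."""
    triggers = []
    for index, item in enumerate(items):
        with warnings.catch_warnings(record=True) as caught:
            # "always" prevents Python from suppressing repeated warnings.
            warnings.simplefilter("always")
            compute(item)
        messages = [
            str(w.message) for w in caught
            if issubclass(w.category, RuntimeWarning)
        ]
        if messages:
            triggers.append((index, item, messages))
    return triggers


def fake_compute(smiles):
    # Stand-in for the real calculation: warn on "large" inputs only.
    if len(smiles) > 10:
        warnings.warn("overflow encountered in reduce", RuntimeWarning)
    return len(smiles)


print(find_warning_triggers(["CCO", "C" * 12], fake_compute))
```

The indices returned could then be matched back to compound IDs to check whether the warnings cluster around particular structures or sizes.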
--
channels:
--
OS/distribution
Mac OSX 10.15 Catalina
conda or pip
conda
python version
Python 3.6.8 :: Anaconda, Inc.
library version
rdkit 2019.03.4.0 py36h65625ec_1 rdkit