mordred-descriptor / mordred

a molecular descriptor calculator
http://mordred-descriptor.github.io/documentation/master/
BSD 3-Clause "New" or "Revised" License
355 stars 95 forks source link

Error in self tests #80

Open breisfeld opened 4 years ago

breisfeld commented 4 years ago

Hi,

Running

$ python -m mordred.tests

yields an error:

======================================================================
FAIL: mordred.tests.test_mordred_main.test_sdf_3D
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/XXXXX/local/python/anaconda3/envs/mordred-env/lib/python3.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/XXXXX/mordred/mordred/tests/test_mordred_main.py", line 176, in test_sdf_3D
    eq_(stderr, "")
AssertionError: "[ERROR] Benzene: module 'networkx' has no attribute 'biconnected_component_subgraphs' (SpAbs_Dt/SpAbs/mordred._matrix_attributes.Eigen(mordred.DetourMatrix.DetourMatrixCache())/mordred.DetourMatrix.DetourMatrixCache())\n" != ''

I believe the same error will result when computing some of the 3D descriptors.

It seems as if the mordred code is using functionality that has been deprecated in networkx.

environment

OS/distribution

Mac OS X

conda or pip

conda

python version

Python 3.7.4

library version

_libgcc_mutex 0.1 main
blas 1.0 mkl
bzip2 1.0.8 h7b6447c_0
ca-certificates 2019.10.16 0
cairo 1.14.12 h8948797_3
certifi 2019.9.11 py37_0
decorator 4.4.0 py37_1
fontconfig 2.13.0 h9420a91_0
freetype 2.9.1 h8a8886c_1
glib 2.56.2 hd408876_0
icu 58.2 h9c2bf20_1
intel-openmp 2019.4 243
jpeg 9b h024ee3a_2
libboost 1.67.0 h46d08c1_4
libedit 3.1.20181209 hc058e9b_0
libffi 3.2.1 hd88cf55_4
libgcc-ng 9.1.0 hdf63c60_0
libgfortran-ng 7.3.0 hdf63c60_0
libpng 1.6.37 hbc83047_0
libstdcxx-ng 9.1.0 hdf63c60_0
libtiff 4.0.10 h2733197_2
libuuid 1.0.3 h1bed415_2
libxcb 1.13 h1bed415_1
libxml2 2.9.9 hea5a465_1
mkl 2019.4 243
mkl-service 2.3.0 py37he904b0f_0
mkl_fft 1.0.14 py37ha843d7b_0
mkl_random 1.1.0 py37hd6b4f25_0
mordred 1.2.0 pypi_0 pypi ncurses 6.1 he6710b0_1
networkx 2.4 py_0
nose 1.3.7 py37_2
numpy 1.17.2 py37haad9e8e_0
numpy-base 1.17.2 py37hde5b4d6_0
olefile 0.46 py37_0
openssl 1.1.1d h7b6447c_3
pandas 0.25.2 py37he6710b0_0
pcre 8.43 he6710b0_0
pillow 6.2.0 py37h34e0f95_0
pip 19.3.1 py37_0
pixman 0.38.0 h7b6447c_0
py-boost 1.67.0 py37h04863e7_4
python 3.7.4 h265db76_1
python-dateutil 2.8.0 py37_0
pytz 2019.3 py_0
rdkit 2019.09.1.0 py37hc20afe1_1 rdkit readline 7.0 h7b6447c_5
setuptools 41.4.0 py37_0
six 1.12.0 py37_0
sqlite 3.30.1 h7b6447c_0
tk 8.6.8 hbc83047_0
tqdm 4.36.1 py_0
wheel 0.33.6 py37_0
xz 5.2.4 h14c3975_4
yaml 0.1.7 had09818_2
zlib 1.2.11 h7b6447c_3
zstd 1.3.7 h0b5b093_0

ardejani commented 4 years ago

I encountered the same issue. Downgrading networkx to 2.3 "solves" this test error but there are still some Python3 compatibility issues like these:

See the documentation here: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated yield ok_, np.isnan(f.ix[2, 0]) ./Users/maz/miniconda2/envs/python37/lib/python3.7/site-packages/mordred/tests/test_pandas.py:27: FutureWarning: .ix is deprecated. Please use .loc for label based indexing or .iloc for positional indexing

See the documentation here: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated yield eq_, f.ix[2, 1], 4 .[10:32:57] Warning: molecule is tagged as 3D, but all Z coords are zero [10:32:57] Warning: molecule is tagged as 3D, but all Z coords are zero [10:32:57] Warning: molecule is tagged as 3D, but all Z coords are zero [10:32:57] Warning: molecule is tagged as 3D, but all Z coords are zero

porteusconf commented 4 years ago

Good advice to downgrade networkx. Also, Issue #80 was closed with a commit to pinning networkx to 2.1, but maybe 2.3 is ok. At least for me 2.4 had 65 errors but 2.3 only 1 error (transcript below). So maybe $80 should/could pin 2.3, not 2.1? And #80 might consider that i had to do conda install nose pyyaml as well. Anyways I'm down to just 1 error now. (I'm using linux mint 18.1 = ubuntu).

### after conda install networkx=2.4 tests failed  with 65 errors, 5 failures (below...)
(base) barb@barb-Aspire-V5-122P /home/opt/miniconda3 $ conda install networkx=2.4
(base) barb@barb-Aspire-V5-122P /home/opt/miniconda3 $ python -m mordred.tests
Ran 69636 tests in 438.249s
FAILED (errors=65, failures=5)
(base) barb@barb-Aspire-V5-122P /home/opt/miniconda3 $ conda install networkx=2.3
###
 $ conda install networkx=2.3
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##
  environment location: /home/opt/miniconda3
  added / updated specs:   - networkx=2.3
The following packages will be DOWNGRADED:
  networkx                                         2.4-py_0 --> 2.3-py_0
Proceed ([y]/n)? y
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
(base) barb@barb-Aspire-V5-122P /home/opt/miniconda3 $ python -m mordred.tests
.....................................
.......................................
======================================================================
ERROR: Failure: AttributeError ('MordredDataFrame' object has no attribute 'ix')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/opt/miniconda3/lib/python3.7/site-packages/nose/failure.py", line 39, in runTest
    raise self.exc_val.with_traceback(self.tb)
  File "/home/opt/miniconda3/lib/python3.7/site-packages/nose/loader.py", line 251, in generate
    for test in g():
  File "/home/opt/miniconda3/lib/python3.7/site-packages/mordred/tests/test_pandas.py", line 22, in test_fill_missing
    yield eq_, f.ix[0, 0], 1
  File "/home/opt/miniconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 5274, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'MordredDataFrame' object has no attribute 'ix'
----------------------------------------------------------------------
Ran 69636 tests in 376.932s
FAILED (errors=1)

After conda install networkx=2.3 tests failed with only 1 error. I had the same error after I downgraded to 2.1 and even 2.0...

The following packages will be SUPERSEDED by a higher-priority channel:

  networkx              pkgs/main/noarch::networkx-2.3-py_0 --> pkgs/main/linux-64::networkx-2.1-py37_0

The real fix, instead of these work-arounds, will come by fixing /mordred/tests/test_pandas.py to use newer networkx, but I see no issue on that... Haven't looked thru forks yet. Will update here if I find anything,

breisfeld commented 4 years ago

Actually, I 'solved' my original problem by adding to mordred a separate compatibility module that contains a version of biconnected_component_subgraphs from an old version of networkx.

plkx commented 3 years ago

I'll add myself as another case with this issue. I've gotten by with the networkx downgrade approach, thanks to this thread. I am wondering about "adding to mordred a separate compatibility module that contains a version of . . . . What exactly does that involve, and does it persist through networkx upgrades?

breisfeld commented 3 years ago

Unfortunately, I don't have the code handy at the moment.

Essentially, I downloaded a version of networkx that still had the biconnected_component_subgraphs function. I then copied the function to a separate module, ensuring the required networkx imports were included. I then added this module to my version of mordred and made sure that I included my module in mordred's imports.

If the developers haven't disabled the API required by biconnected_component_subgraphs, this method should persist through networkx upgrades. However, upgrades of mordred might be an issue.

plkx commented 3 years ago

Thanks - that makes sense (once I've heard the answer).

I decided to stick with networkx 2.3, and edited t_pandas.py in \mordred\tests.

Through testing, I found that pandas through 0.25 generated no errors (since ix was still functional), through version 2.3 of networkx.

The pandas error is caused by data selection using .ix, which was deprecated in pandas 1.0. In the few relevant lines where f.ix is used, a single value is selected. My Python-novice perception suggests that replacing f.ix with f.iat should suffice, which apparently is accurate.

After that simple find-replace (having backed up the original file, of course), all tests complete without error (and a bit faster, it seems) using the latest release of pandas (1.1.3; mordred 1.2.0; networkx 2.3; python 3.7.9).

Seems like such extremely simple fix for the problem, which makes me wonder why it hasn't been addressed already. Maybe because it only affects the tests. If .ix was used elsewhere, error messages should be generated and descriptor generation should fail (which it does not).

I am thankful for this message thread - I'm not sure if I would have persisted long enough to resolve the issues before I abandoned mordred.

Best Regards,

plkx

breisfeld commented 3 years ago

Yes, I made the same changes with regard to pandas indexing. It might be worth creating a pull request with these changes (+ the networkx workaround) so that mordred can work with current versions of pandas and networkx.

plkx commented 3 years ago

Agreed.

I have spent a couple of hours perusing the networkx-related code, which resides in mordred's DetourMatrix.py, and in networkx's biconnected.py.

The function biconnected_component_subgraphs used by mordred is still in nx 2.3 (with the deprecation warning). Mordred iterates over a list of subgraphs it obtains as bcc from the biconnected_components function (in a separate definition).

biconnected_component_subgraph's recommended replacement seems to use the G.subgraph function to iterate directly over the list returned from the biconnected_components(G) function.

The deprecation warning says to use "(G.subgraph(c) for c in biconnected_components(G)) in place of biconnected_component_subgraphs(G).

It doesn't seem dramatically challenging, but it is a bit more involved than a simple find and replace excercise.

breisfeld commented 3 years ago

It would probably be best to adapt mordred for the current conventions and API in networkx and pandas. It appears that you have delved into the API changes and necessary conversions. I took the easy way out and just created a compatibility layer. My concern in all of this is that the project seems moribund. Is there any active development? Will pull requests even be considered? Should the project be forked?

plkx commented 3 years ago

Yes - I don't want to see it fade away, either, because I've found it to be a valuable resource in an area with not a huge pool of resources (especially free), and few that are much better than mediocre. It does seem to have withered away, though.

I've tried pretty much every free descriptor calculation software I could obtain, and several commercial packages. I prefer Mordred over any of the free software I am aware of, and it's ~equal to my main commercial option.

Having self-test errors and the like right after installation probably puts most people off, so has limited Mordred's adoption. That does mean keeping relatively current with required packages. Even though Mordred can be made fully functional, as this thread has shown, I doubt many people will go to the trouble we have. Having benefited from such trouble shooting & resolution resources many times over the years, I try to give back when it seems meaningful.

One of these days, with all of the free time I'm hoping to find any second, I would consider a genuine Mordred fork. With little (actually none) code documentation in Mordred that I've seen, it's a lot more challenging for me, having spent very little time on genuine coding for almost 20 years.

For now, I will be picking my battles. I may experiment with your approach, and move the deprecated function into DetourMatrix.py, and upgrade to NetworkX 2.5. If that frees other packages being held back from upgrades due to dependencies, it may improve performance.

I've run some sets of over 1000 molecules, and when the average molecular weight of the set is more than a few hundred, the computation increases to an hour or more, on my laptop.

mkrompiec commented 3 years ago

Instead of forcing an old version of networkx, change line 121 in DetourMatrix.py from: for bcc in networkx.biconnected_component_subgraphs(self.G, False): to: for bcc in (self.G.subgraph(c) for c in networkx.biconnected_components(self.G)): The pandas issue is fixed by replacing all instances of "ix" in test/test_pandas.py by "iloc". With these two changes, all tests complete successfully.

breisfeld commented 3 years ago

Thank you for the remedy, @mkrompiec

It would be great to have this and the pandas fixes adopted into the mordred distribution.

RMusil commented 3 years ago

Thanks, plks for helping me fixing this mordred error. I think that in your otherwise very useful message is a small error: the name of the file is not "t_pandas.py in \mordred\tests", but rather "test_pandas.py". For some users this correction might help. Thank you!