kieferk / pymssa

Python implementation of Multivariate Singular Spectrum Analysis (MSSA)
MIT License
152 stars, 48 forks

ERROR import pymssa #7

Open fspaolo opened 5 years ago

fspaolo commented 5 years ago

Installing through python setup.py install went just fine. The very first attempt to import the module, however, gave me the following error:

$ python -c 'import pymssa'

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/paolofer/anaconda2/lib/python2.7/site-packages/pymssa-0.1.0-py2.7.egg/pymssa/__init__.py", line 1, in <module>
    from .mssa import MSSA
  File "/Users/paolofer/anaconda2/lib/python2.7/site-packages/pymssa-0.1.0-py2.7.egg/pymssa/mssa.py", line 229
    U = left_singular_vectors @ T
                              ^
SyntaxError: invalid syntax

EricKenjiLee commented 5 years ago

This is because the @ operator for matrix multiplication was only introduced in Python 3.5 (PEP 465). You're probably not using a recent enough version of Python.
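Because `@` is new syntax rather than a new function, Python 2.7 rejects the whole module when it is compiled, before any of its lines run; that is why the error surfaces as a SyntaxError at import time. A minimal standard-library sketch (the helper name `supports_matmul_operator` is hypothetical) asks the running interpreter to compile an expression that uses the operator:

```python
import sys


def supports_matmul_operator():
    """Return True if this interpreter can parse the PEP 465 `@` operator."""
    try:
        # compile() only parses; the names a and b are never evaluated.
        compile("a @ b", "<matmul-check>", "eval")
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    print(sys.version_info[:2], supports_matmul_operator())
```

On Python 2.7 this prints False; on 3.5 and later it prints True.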

fspaolo commented 5 years ago

No, I need to stick with 2.7, which is the stable choice given the amount of legacy code I have. So I'm afraid this package is not an option then.

EricKenjiLee commented 5 years ago

The package isn't very large, so I'd suggest forking it and changing any @'s to np.matmul(). I'm not sure there's a safe way to have multiple versions of Python play nicely together.
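For example, line 229 of mssa.py from the traceback could be rewritten as below. The arrays are made-up stand-ins for the real `left_singular_vectors` and `T` (their shapes are assumptions); `np.matmul` has been part of NumPy since 1.10 and parses on Python 2.7 as well as 3.x.

```python
import numpy as np

# Hypothetical stand-ins for the arrays used on line 229 of pymssa/mssa.py.
rng = np.random.RandomState(0)
left_singular_vectors = rng.rand(6, 4)
T = rng.rand(4, 3)

# Python 3.5+ only:  U = left_singular_vectors @ T
# Portable spelling that also parses on Python 2.7:
U = np.matmul(left_singular_vectors, T)

# For 2-D arrays this is the same as an ordinary matrix product.
assert np.allclose(U, left_singular_vectors.dot(T))
print(U.shape)  # (6, 3)
```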

fspaolo commented 5 years ago

I can try replacing the @'s and see if that's the only issue. Running multiple versions of Python would mean modifying working code to fit Py3 standards. I'll post an update. Thanks!

fspaolo commented 5 years ago

Yes, there are other Py2 vs. Py3 issues. When I replace the @'s, other errors come up:

ImportError                               Traceback (most recent call last)
<ipython-input-2-73632525add2> in <module>()
----> 1 import pymssa

/Users/paolofer/anaconda2/lib/python2.7/site-packages/pymssa-0.1.0-py2.7.egg/pymssa/__init__.py in <module>()
----> 1 from .mssa import MSSA

/Users/paolofer/anaconda2/lib/python2.7/site-packages/pymssa-0.1.0-py2.7.egg/pymssa/mssa.py in <module>()
      7 from scipy.linalg import hankel
      8
----> 9 from functools import partial, lru_cache, reduce
     10 from tqdm.autonotebook import tqdm
     11

ImportError: cannot import name lru_cache

I can fix that as well by using the backport (for Py2):

from backports.functools_lru_cache import lru_cache

But there will probably be other dependency errors showing up...

fspaolo commented 5 years ago

I managed to replace the Py2 dependencies and run the example Notebook (you have to provide the wine.csv file yourself; even after finding it online, I had to manually modify the header).

Tomorrow I'll test it on a real use case: forecasting applied to a data cube with (t, x, y) = (1452, 1836, 104).

Hopefully it can handle it?
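For a sense of scale, the raw cube is already large before any MSSA step runs. A back-of-the-envelope estimate, assuming the values are stored as 64-bit floats (an assumption; float32 would halve it):

```python
# Dimensions quoted above: (t, x, y) = (1452, 1836, 104).
shape = (1452, 1836, 104)
bytes_per_value = 8  # assumption: float64

n_values = 1
for dim in shape:
    n_values *= dim

gib = n_values * bytes_per_value / 2**30
print(n_values, round(gib, 2))  # 277250688 2.07
```

That is roughly 2 GiB for the input alone; the trajectory matrix and any decomposition come on top of it.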

fspaolo commented 5 years ago

BTW, a simple

try:
    a @ b
except:
    np.matmul(a, b)

try:
    from functools import lru_cache
except:
    from backports.functools_lru_cache import lru_cache

would ensure backward compatibility.

fspaolo commented 5 years ago

This package can't handle such data sets...

Limiting n_components=4 and window_size=25 (out of >300 obs) gives me:

('Trajectory matrix shape:', (1572000, 282))
Decomposing trajectory covariance matrix with SVD
Killed: 9

Reducing the data set to ~1/4 still gives me:

('Trajectory matrix shape:', (393000, 282))
Decomposing trajectory covariance matrix with SVD
Killed: 9

These are likely memory-related issues. I also noticed that you use np.dot() to build the matrix that goes into the SVD, which is not very efficient.

This is a typical use case for EOF analysis; it should be doable with MSSA.
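Some arithmetic supports the memory diagnosis; `Killed: 9` means the process received signal 9 (SIGKILL), which is consistent with the operating system terminating it for using too much memory. The sketch below assumes float64 storage and considers both ways a "trajectory covariance matrix" could be formed from the trajectory matrix X. Which one pymssa actually builds is not stated in this thread, so treat the X Xᵀ figure as a worst case rather than a fact about the package.

```python
def gib(n_values, bytes_per_value=8):
    """Size in GiB of n_values numbers (float64 is an assumption)."""
    return n_values * bytes_per_value / 2**30


# Trajectory matrix shapes printed in the logs above.
for rows, cols in [(1572000, 282), (393000, 282)]:
    trajectory = gib(rows * cols)   # X itself
    gram_small = gib(cols * cols)   # X.T X: cols x cols
    gram_large = gib(rows * rows)   # X X.T: rows x rows
    print("{} x {}: X = {:.2f} GiB, X.T X = {:.2f} MiB, X X.T = {:.2f} TiB".format(
        rows, cols, trajectory, gram_small * 1024, gram_large / 1024))
```

For the full data set this gives about 3.3 GiB for X, about 0.6 MiB for XᵀX, and about 18 TiB for XXᵀ; the quarter-size set still needs about 1.1 TiB for XXᵀ.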
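In EOF analysis, the usual way around a tall matrix is to decompose the small Gram matrix instead. The sketch below is not pymssa's code; the function name `svd_via_small_gram` and the random test matrix are hypothetical. It relies on the fact that the eigenvalues of XᵀX are the squared singular values of X and that the left singular vectors follow as U = X V / s, so nothing of size rows × rows is ever formed.

```python
import numpy as np


def svd_via_small_gram(X, k):
    """Top-k singular triplets of a tall matrix X via the cols x cols Gram matrix.

    Only X.T X (cols x cols) is decomposed, so extra memory beyond X
    scales with cols**2 rather than rows**2.
    """
    gram = X.T.dot(X)                        # cols x cols, symmetric PSD
    eigvals, V = np.linalg.eigh(gram)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]    # indices of the k largest
    eigvals, V = eigvals[order], V[:, order]
    s = np.sqrt(np.clip(eigvals, 0.0, None))  # singular values of X
    U = X.dot(V) / s                         # left singular vectors
    return U, s, V.T


rng = np.random.RandomState(0)
X = rng.standard_normal((2000, 30))
U, s, Vt = svd_via_small_gram(X, k=4)

s_ref = np.linalg.svd(X, compute_uv=False)[:4]
print(np.allclose(s, s_ref))  # True
```

Forming XᵀX squares the condition number, so very small singular values lose precision; for the few leading components typically kept in MSSA this is usually acceptable.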

EricKenjiLee commented 5 years ago

I’d like to first point out this isn’t my package :)

So try/except is useful, but it depends on your approach to coding. Sometimes it’s better to have things “fail loudly”, so in this case I don’t mind that the author hasn’t written this in; Python 2 is no longer supported, and code should generally be ported to Python 3 where possible. Also, if you do write try/except blocks, it’s better to have a specific exception in mind, like “except ValueError:”, so you catch only one well-defined failure mode.
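Applied to the snippet above, that advice might look roughly like the sketch below. `backports.functools_lru_cache` is the PyPI backport fspaolo mentions and is only imported on Python 2; the `fib` function and the small arrays are hypothetical, just to exercise both pieces. Note that the `@` half of the earlier snippet cannot work as a try/except at all: on Python 2 the `@` is rejected when the file is compiled, before the try block ever runs, so the portable option is to call `np.matmul` unconditionally.

```python
import numpy as np

try:
    from functools import lru_cache  # standard library on Python 3.2+
except ImportError:  # catch only the failure we expect
    # Python 2: requires the backports.functools_lru_cache package.
    from backports.functools_lru_cache import lru_cache


@lru_cache(maxsize=None)
def fib(n):
    """Toy function to show the decorator works on either import branch."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


a = np.arange(6.0).reshape(2, 3)
b = np.ones((3, 2))
# No try/except needed: np.matmul parses and runs on Python 2.7 and 3.x.
c = np.matmul(a, b)
print(fib(30), c.tolist())  # 832040 [[3.0, 3.0], [12.0, 12.0]]
```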

EricKenjiLee commented 5 years ago

It also looks like you’re not going to be able to do computations on a matrix that large, since you’re probably running out of memory. Consider sparse matrix operations if possible; I also know there are other packages for out-of-core (larger-than-memory) operations. I think PyTables is made for large amounts of data.
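One out-of-core pattern that needs only NumPy (offered as an alternative to PyTables, not as what pymssa does) is to keep the trajectory matrix on disk as a memory-mapped `.npy` file and accumulate the small XᵀX Gram matrix one block of rows at a time. The helper name `gram_in_chunks`, the file name, the block size, and the small random matrix standing in for the real data are all assumptions.

```python
import os
import tempfile

import numpy as np


def gram_in_chunks(path, block_rows=10000):
    """Accumulate X.T X for a (rows, cols) .npy file, one row block at a time."""
    X = np.load(path, mmap_mode="r")  # memory-mapped, read-only
    cols = X.shape[1]
    gram = np.zeros((cols, cols))
    for start in range(0, X.shape[0], block_rows):
        # Only this block of rows is pulled from disk and multiplied.
        block = np.asarray(X[start:start + block_rows], dtype=np.float64)
        gram += block.T.dot(block)
    return gram


# Hypothetical small stand-in for the real trajectory matrix.
rng = np.random.RandomState(1)
X_small = rng.standard_normal((25000, 12))
path = os.path.join(tempfile.mkdtemp(), "trajectory.npy")
np.save(path, X_small)

G = gram_in_chunks(path, block_rows=4096)
print(np.allclose(G, X_small.T.dot(X_small)))  # True
```

The resulting cols × cols matrix is small enough to decompose in memory, even when X itself is not.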