univie-datamining-team3 / assignment2

Analysis of mobility data
MIT License

DTW: ValueError:Improper number of dimensions to norm. #32

Closed Lumik7 closed 6 years ago

Lumik7 commented 6 years ago

@rmitsch I get an exception when running dynamic time warping implemented in #21.

When running python make_dataset.py --download False --preprocessing True with the call in make_dataset.py set to:

        dfs = Preprocessor.preprocess(tokens,
                                      filename="preprocessed_data.dat",
                                      distance_metric="dtw",
                                      use_individual_columns=False)

I get the following error message for threads 1-6:

2018-01-03 14:40:48,096 - __main__ - INFO - start preprocessing data:
Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Users\Lukas\.conda\envs\Tensorflow\lib\threading.py", line 914, in _bootstrap_inner
    self.run()
  File "C:\Users\Lukas\Dropbox\WS2017_18\Gruppenarbeiten\DM\2.GroupExercise\assignment2\src\data\DTWThread.py", line 59, in run
    dist=self.norm
  File "C:\Users\Lukas\.conda\envs\Tensorflow\lib\site-packages\fastdtw\fastdtw.py", line 53, in fastdtw
    return __fastdtw(x, y, radius, dist)
  File "C:\Users\Lukas\.conda\envs\Tensorflow\lib\site-packages\fastdtw\fastdtw.py", line 73, in __fastdtw
    __fastdtw(x_shrinked, y_shrinked, radius=radius, dist=dist)
  File "C:\Users\Lukas\.conda\envs\Tensorflow\lib\site-packages\fastdtw\fastdtw.py", line 73, in __fastdtw
    __fastdtw(x_shrinked, y_shrinked, radius=radius, dist=dist)
  File "C:\Users\Lukas\.conda\envs\Tensorflow\lib\site-packages\fastdtw\fastdtw.py", line 73, in __fastdtw
    __fastdtw(x_shrinked, y_shrinked, radius=radius, dist=dist)
  File "C:\Users\Lukas\.conda\envs\Tensorflow\lib\site-packages\fastdtw\fastdtw.py", line 73, in __fastdtw
    __fastdtw(x_shrinked, y_shrinked, radius=radius, dist=dist)
  File "C:\Users\Lukas\.conda\envs\Tensorflow\lib\site-packages\fastdtw\fastdtw.py", line 73, in __fastdtw
    __fastdtw(x_shrinked, y_shrinked, radius=radius, dist=dist)
  File "C:\Users\Lukas\.conda\envs\Tensorflow\lib\site-packages\fastdtw\fastdtw.py", line 73, in __fastdtw
    __fastdtw(x_shrinked, y_shrinked, radius=radius, dist=dist)
  File "C:\Users\Lukas\.conda\envs\Tensorflow\lib\site-packages\fastdtw\fastdtw.py", line 73, in __fastdtw
    __fastdtw(x_shrinked, y_shrinked, radius=radius, dist=dist)
  File "C:\Users\Lukas\.conda\envs\Tensorflow\lib\site-packages\fastdtw\fastdtw.py", line 73, in __fastdtw
    __fastdtw(x_shrinked, y_shrinked, radius=radius, dist=dist)
  File "C:\Users\Lukas\.conda\envs\Tensorflow\lib\site-packages\fastdtw\fastdtw.py", line 68, in __fastdtw
    return dtw(x, y, dist=dist)
  File "C:\Users\Lukas\.conda\envs\Tensorflow\lib\site-packages\fastdtw\fastdtw.py", line 130, in dtw
    return __dtw(x, y, None, dist)
  File "C:\Users\Lukas\.conda\envs\Tensorflow\lib\site-packages\fastdtw\fastdtw.py", line 141, in __dtw
    dt = dist(x[i-1], y[j-1])
  File "C:\Users\Lukas\.conda\envs\Tensorflow\lib\site-packages\fastdtw\fastdtw.py", line 61, in <lambda>
    return lambda a, b: np.linalg.norm(a - b, p)
  File "C:\Users\Lukas\.conda\envs\Tensorflow\lib\site-packages\numpy\linalg\linalg.py", line 2257, in norm
    raise ValueError("Improper number of dimensions to norm.")
ValueError: Improper number of dimensions to norm.

Each thread aborts with this ValueError: Improper number of dimensions to norm.

Lumik7 commented 6 years ago

@MoBran encountered the same issue

Lumik7 commented 6 years ago

Update: I tried running it with a single thread, but I get the same error as before. My versions of the required packages are the following:

python 3.5.4
Cython 0.27.3
fastdtw 0.3.2
numpy 1.13.3 (via pip)
numpy 1.13.1 (via conda)
psutil 5.4.1

I'm running Windows 10. I tried installing it both with and without Cython.

Update: I just noticed that I have two numpy versions installed in my environment: numpy 1.13.1 was installed via conda as a dependency for pandas, sklearn, etc., and numpy 1.13.3 was installed via pip when installing fastdtw.

Update 2: Using only numpy 1.13.1, I still get the same error.

Lumik7 commented 6 years ago

By adding some print statements to the numpy.linalg.norm function I found that the value handed to the function is a scalar (0-dimensional) ndarray. It therefore has no axis to reduce over, which raises ValueError("Improper number of dimensions to norm.") in numpy.linalg.norm.
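For reference, the failure mode can be reproduced without fastdtw. The traceback above shows fastdtw's distance is `lambda a, b: np.linalg.norm(a - b, p)`; when the series elements are plain scalars, `a - b` is 0-dimensional, and numpy's norm with an explicit ord rejects 0-d input (the values below are made up for illustration):

```python
import numpy as np

# Two scalar series elements, as when DTW walks a 1-D time series.
a, b = np.float64(3.0), np.float64(1.0)

try:
    # Explicit ord (as in fastdtw's dist lambda) on a 0-d operand fails.
    np.linalg.norm(a - b, 2)
except ValueError as e:
    print(e)  # Improper number of dimensions to norm.
```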

rmitsch commented 6 years ago

Tested against the latest version on master with all currently available data; still can't reproduce. Next step (will do today): compare versions of installed modules.

Lumik7 commented 6 years ago

I created a fresh conda environment with python 3.5.4 and installed all requirements again, but it still fails. These are all packages:

alabaster==0.7.10
asn1crypto==0.22.0
awscli==1.14.18
Babel==2.5.1
backports.functools-lru-cache==1.4
botocore==1.8.22
certifi==2017.11.5
cffi==1.11.2
chardet==3.0.4
click==6.7
colorama==0.3.9
coverage==4.4.2
cryptography==2.1.4
cycler==0.10.0
Cython==0.27.3
decorator==4.1.2
docutils==0.14
fastdtw==0.3.2
flake8==3.5.0
future==0.16.0
gmplot==1.2.0
idna==2.6
imagesize==0.7.1
Jinja2==2.10
jmespath==0.9.3
lxml==4.1.1
MarkupSafe==1.0
matplotlib==2.1.1
mccabe==0.6.1
numpy==1.13.1
pandas==0.22.0
patsy==0.4.1
plotly==2.2.3
psutil==5.4.2
pyasn1==0.4.2
pycodestyle==2.3.1
pycparser==2.18
pyflakes==1.6.0
Pygments==2.2.0
pyOpenSSL==17.2.0
pyparsing==2.2.0
PySocks==1.6.7
python-dateutil==2.6.1
python-dotenv==0.7.1
pyts==0.5
pytz==2017.3
PyYAML==3.12
requests==2.18.4
rsa==3.4.2
s3transfer==0.1.12
scikit-learn==0.19.0
scipy==0.19.1
seaborn==0.8.1
six==1.11.0
sklearn==0.0
snowballstemmer==1.2.1
sobol-seq==0.1.2
Sphinx==1.6.5
sphinxcontrib-websupport==1.0.1
statsmodels==0.8.0
tornado==4.5.2
urllib3==1.22
win-inet-pton==1.0.1
wincertstore==0.2

If the bug is caused by the requirements, we may have to use distutils to make sure we have the same setup, or use Docker.

rmitsch commented 6 years ago

@Lumik7 @MoBran I ran the latest version on master within a virtual environment configured with exactly those dependencies (pip install -r ...). Still couldn't reproduce the bug. Could you try to run it with this configuration on your machines?

alabaster==0.7.10
awscli==1.14.2
Babel==2.5.1
bayesian-optimization==0.6.0
bleach==1.5.0
botocore==1.8.6
certifi==2017.11.5
chardet==3.0.4
click==6.7
colorama==0.3.7
colorlover==0.2.1
coranking==0.1.1
coverage==4.4.2
cycler==0.10.0
Cython==0.27.3
decorator==4.1.2
docutils==0.14
entrypoints==0.2.3
enum34==1.1.6
fastdtw==0.3.2
flake8==3.5.0
future==0.16.0
gmplot==1.2.0
hdbscan==0.8.11
html5lib==0.9999999
idna==2.6
imagesize==0.7.1
ipykernel==4.6.1
ipython==6.2.1
ipython-genutils==0.2.0
ipywidgets==7.0.3
jedi==0.11.0
Jinja2==2.9.6
jmespath==0.9.3
jsonschema==2.6.0
jupyter-client==5.1.0
jupyter-console==5.2.0
jupyter-core==4.3.0
lxml==4.1.1
Markdown==2.6.9
MarkupSafe==1.0
matplotlib==2.1.0
mccabe==0.6.1
mistune==0.7.4
nbconvert==5.3.1
nbformat==4.4.0
nose==1.3.7
notebook==5.2.0
numpy==1.13.3
pandas==0.21.0
pandocfilters==1.4.2
parso==0.1.0
pexpect==4.2.1
pickleshare==0.7.4
pkg-resources==0.0.0
plotly==2.2.3
prompt-toolkit==1.0.15
protobuf==3.5.0.post1
psutil==5.4.2
ptyprocess==0.5.2
py4j==0.10.4
pyasn1==0.4.2
pycodestyle==2.3.1
pyflakes==1.6.0
Pygments==2.2.0
pyparsing==2.2.0
pyspark==2.2.0
python-dateutil==2.6.1
python-dotenv==0.7.1
pyts==0.5
pytz==2017.3
PyYAML==3.12
pyzmq==16.0.2
qtconsole==4.3.1
requests==2.18.4
rsa==3.4.2
s3transfer==0.1.12
scikit-learn==0.19.0
scipy==0.19.1
seaborn==0.8.1
simplegeneric==0.8.1
six==1.11.0
sklearn==0.0
snowballstemmer==1.2.1
sobol-seq==0.1.2
Sphinx==1.6.5
sphinxcontrib-websupport==1.0.1
tensorflow==1.4.0
tensorflow-tensorboard==0.4.0rc3
terminado==0.6
testpath==0.3.1
tornado==4.5.2
traitlets==4.3.2
urllib3==1.22
wcwidth==0.1.7
webencodings==0.5.1
Werkzeug==0.12.2
widgetsnbextension==3.0.6
rmitsch commented 6 years ago

Alternative in case of persisting problems: pip install dtw
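In case neither library works out, plain DTW is also small enough to write by hand. This is a generic textbook sketch (not the project's implementation, and without fastdtw's radius-based approximation), using an absolute-difference cost on 1-D series:

```python
from math import inf

def dtw_distance(x, y):
    """Classic O(n*m) dynamic-programming DTW with absolute-difference cost."""
    n, m = len(x), len(y)
    # D[i][j] = cost of the best alignment of x[:i] and y[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]

print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```

Because the cost works on scalars directly, this variant sidesteps the norm-on-0-d problem entirely, at the price of quadratic runtime.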

rmitsch commented 6 years ago

@Lumik7 @MoBran Please check the latest version in master and try to reproduce the error. I added a reshape of the vectors before passing them on to DTW that hopefully mitigates the problem.
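The idea behind such a reshape can be sketched as follows (variable names and values are illustrative, not taken from the repository): a 1-D series has 0-d elements, which np.linalg.norm with an explicit ord rejects; reshaping to (n, 1) turns every element into a length-1 vector that norm accepts.

```python
import numpy as np

series_a = np.array([1.0, 2.0, 4.0, 3.0])
series_b = np.array([1.0, 3.0, 4.0])

# Reshape each 1-D series to a column of length-1 vectors.
a = series_a.reshape(-1, 1)
b = series_b.reshape(-1, 1)

# Element-wise distances now work, as in fastdtw's dist lambda.
d = np.linalg.norm(a[0] - b[1], 2)
print(d)  # 2.0
```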

Lumik7 commented 6 years ago

I'm running it right now; how long does it take for all tokens?

rmitsch commented 6 years ago

On my laptop around 20 to 30 minutes (with multithreading), if I recall correctly. I'll check now.

Lumik7 commented 6 years ago

Ok, it just finished, but it went way too fast: the preprocessing steps alone normally take 10 minutes. I got a very small distance matrix as a result:

[screenshot of the small distance matrix]

rmitsch commented 6 years ago

I noticed I forgot to change tokens = [os.environ.get(alias) for alias in ["KEY_RAPHAEL"]] #, "KEY_MORITZ", "KEY_LUKAS"]] back to tokens = [os.environ.get(alias) for alias in ["KEY_RAPHAEL", "KEY_MORITZ", "KEY_LUKAS"]] after testing, before I pushed to master. Maybe that's still the case in the version you're running?

EDIT: Sorry, found it. Forgot to remove a code snippet for testing that only kept the first trip. I'll remove that and push again.

rmitsch commented 6 years ago

Version calculating all trips is on master now. Please check again.

Lumik7 commented 6 years ago

Yeah, I changed that before running it.

I also noticed that you put [:1] here:

        # 1. Get travel data per token, remove dataframes without annotations.
        dfs = Preprocessor.replace_none_values_with_empty_dataframes(
            # Drop dataframes w/o annotations.
            Preprocessor._remove_dataframes_without_annotation(
                # Get travel data per token.
                Preprocessor.get_data_per_token(token)
            )
        )[:1]

I'm now running it with the previous version.

rmitsch commented 6 years ago

Yeah, the [:1] and the redefinition of tokens are removed. It should process the entire dataset now. Sorry for not thinking of that.

Lumik7 commented 6 years ago

Alright, but it ran through with the small set, so I guess that's a good sign :)

rmitsch commented 6 years ago

I guess so. It's weird that my versions of the installed modules accepted the "malformed" vectors and yours didn't. But whatever, it seems to be solved now. I'll close this issue; please reopen it if anything related comes up. I'll work a bit on the documentation next.