iglesias / tapkee_benchmarks

Benchmarks and comparisons for the Tapkee library
http://iglesias.github.io/tapkee_benchmarks

Neighborhood Graph is unconnected #1

Closed: jejjohnson closed this issue 9 years ago

jejjohnson commented 9 years ago

Overview

Hello!

So I was trying to use your dimension reduction package on hyperspectral imagery, which you cite as an application in your benchmarks. I cloned your repository and ran your benchmark script without any problems. However, when I downloaded the library to experiment with dimension reduction on hyperspectral imagery myself, I got the same warning no matter which similar hyperspectral data set I used: "[warning] The neighborhood graph is not connected." As a result, my output consists of nan values. This happens both with the tapkee CLI and with the Shogun toolbox.
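For context, the warning refers to the k-nearest-neighbor graph that neighborhood-based methods such as Isomap build over the data points. Below is a minimal sketch of how one might check connectivity outside Tapkee. It assumes Euclidean distances and a symmetrized k-NN graph (an edge whenever either point is among the other's k nearest neighbors). Tapkee's own neighbor search and connectivity check may use a different convention, and the function name is hypothetical.

```python
from collections import deque

import numpy as np


def knn_component_sizes(X, k):
    """Sizes of the connected components of a symmetrized k-NN graph (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances; the diagonal is set to inf so a
    # point is never chosen as its own neighbor.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)
    nbrs = np.argsort(d2, axis=1)[:, :k]

    # Undirected adjacency: i -- j if j is among i's k nearest neighbors (or vice versa).
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in nbrs[i]:
            adj[i].add(int(j))
            adj[int(j)].add(i)

    # Breadth-first search from each unvisited point labels one component.
    seen = [False] * n
    sizes = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    queue.append(v)
        sizes.append(size)
    return sizes


# Two well-separated 1-D clusters: with k = 3 every neighbor stays inside its
# own cluster, so the graph splits into two components.
X = [[0], [1], [2], [3], [4], [100], [101], [102], [103], [104]]
print(knn_component_sizes(X, 3))  # [5, 5]
```

More than one entry in the returned list means the graph is disconnected; very small components would point to isolated (for example, outlying) points.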

Specific Issue

So let's say I run this command in the CLI:

./tapkee -i iaviris.dat -o avirisdimred.dat -m isomap -td 20 -k 25 --benchmark

The method runs to completion, but the warning "[warning] The neighborhood graph is not connected." appears no matter which basic parameters I change (e.g. the number of nearest neighbors, the target dimension, or the eigensolver method). On closer inspection, the output file is simply nan in every column.
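A plausible explanation for the all-nan output, assuming Tapkee's Isomap follows the textbook algorithm (geodesic distances as shortest paths over the neighborhood graph, then classical MDS): points in different components have infinite geodesic distance, and the double-centering step of classical MDS then evaluates inf - inf, which is nan. The toy sketch below, in plain Python with hypothetical helper names, reproduces that arithmetic on a four-point graph with two components. It is not Tapkee's code.

```python
import math


def shortest_paths(n, edges):
    """All-pairs shortest paths (Floyd-Warshall); unreachable pairs stay inf."""
    D = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i, j, w in edges:
        D[i][j] = D[j][i] = min(D[i][j], w)
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if D[i][m] + D[m][j] < D[i][j]:
                    D[i][j] = D[i][m] + D[m][j]
    return D


def double_center(D):
    """Classical MDS Gram matrix B = -1/2 * J D^2 J, written with row/column means."""
    n = len(D)
    S = [[d * d for d in row] for row in D]
    row_mean = [sum(row) / n for row in S]
    col_mean = [sum(S[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_mean) / n
    return [[-0.5 * (S[i][j] - row_mean[i] - col_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]


# Points {0, 1} and {2, 3} form two components with no edge between them.
D = shortest_paths(4, [(0, 1, 1.0), (2, 3, 1.0)])
B = double_center(D)
print(D[0][2])                                       # inf
print(all(math.isnan(b) for row in B for b in row))  # True
```

Because every row and column mean is already infinite, even the entries between connected points become nan, which would be consistent with an output file that is nan in every column.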

So I tried other dimension reduction techniques (e.g. PCA, Laplacian eigenmaps, neighborhood preserving projection, locally linear embedding), and the eigendecomposition always failed with the error "Some error occured: eigendecomposition failed." After varying k, the target dimension and the method, I found that the only methods that produced actual values in the output file were the stochastic ones, i.e. t-distributed stochastic neighbor embedding (t-SNE) and stochastic proximity embedding, plus MDS. Sometimes the solution was nonsense: for instance, my input data lies between 0 and 1 and I expected the output to as well, but these algorithms did not always produce that. I assume that is to be expected with stochastic methods. MDS also outputs some nan values, though not exclusively.
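The thread does not establish why the eigendecomposition fails. As an assumption rather than a diagnosis, two common culprits are worth ruling out. The first is non-finite values in the input. The second is exact duplicate rows (identical pixel spectra), which produce zero neighbor distances and can make the local systems that methods like LLE solve ill-conditioned. A minimal NumPy check, with a hypothetical function name, might look like this:

```python
import numpy as np


def data_sanity_report(X):
    """Counts of input problems that commonly upset neighbor-based embeddings (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    finite_rows = np.all(np.isfinite(X), axis=1)
    clean = X[finite_rows]
    if clean.shape[0] == 0:
        n_duplicates, constant_bands = 0, []
    else:
        # Identical rows give zero distances between distinct points.
        n_duplicates = clean.shape[0] - np.unique(clean, axis=0).shape[0]
        # Bands whose value never changes carry no information and would divide
        # by zero if the data were later standardized per band.
        constant_bands = np.flatnonzero(clean.max(axis=0) == clean.min(axis=0)).tolist()
    return {
        "rows_with_nan_or_inf": int(np.count_nonzero(~finite_rows)),
        "duplicate_rows": int(n_duplicates),
        "constant_bands": constant_bands,
    }


X = [[0.0, 0.1, 0.5],
     [0.0, 0.1, 0.5],   # exact duplicate of the first pixel
     [0.0, 0.2, 0.6],
     [0.0, 0.3, np.nan]]
print(data_sanity_report(X))
# {'rows_with_nan_or_inf': 1, 'duplicate_rows': 1, 'constant_bands': [0]}
```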

I also tried my own hyperspectral data set, which is much bigger: a 145x145 image with 200 dimensions (spectral bands). I flattened the image so that the data matrix was 21025x200, one row per pixel. I still got the same errors; processing just took longer.
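For reference, the flattening described above corresponds to a row-major reshape in which each pixel becomes one row of band values. The NumPy sketch below uses random numbers as a stand-in for the actual 145x145x200 cube; the real loading code depends on the file format, which the thread does not specify.

```python
import numpy as np

rows, cols, bands = 145, 145, 200
# Placeholder data; replace with the real hyperspectral cube of shape (rows, cols, bands).
cube = np.random.default_rng(0).random((rows, cols, bands))

# One row per pixel, one column per band: pixel (r, c) lands in row r * cols + c.
X = cube.reshape(rows * cols, bands)
print(X.shape)  # (21025, 200)

# An embedding computed from X (here just the first 3 columns as a placeholder)
# can be mapped back onto the image grid with the inverse reshape.
embedding = X[:, :3]
embedding_image = embedding.reshape(rows, cols, 3)
```

If the cube is stored band-first, i.e. with shape (bands, rows, cols), it would need `cube.transpose(1, 2, 0)` before the reshape so that each row still holds one pixel's spectrum.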

I get the same issue when I pass the same data sets to the Shogun toolbox from Python, except that there no result is produced at all: the algorithm stops altogether.

Question

So, what did you do with your graph to get actual results? I looked into your script and didn't find any extra commands that would allow you to overcome this error. Am I missing something in your script for running Tapkee on the AVIRIS data set? Which commands or parameters should I change to get a sensible solution with your data set and package? Is there a preprocessing step that you (or I) could apply to avoid this issue?
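One parameter that directly controls connectivity is the number of neighbors k. As a hedged sketch (not something taken from the benchmark script), the smallest k at which the symmetrized k-NN graph becomes connected can be found in one pass with a union-find structure, adding each point's k-th nearest neighbor as k grows. It builds a dense distance matrix, so it is meant for a subsample: for all 21025 pixels that matrix alone would take about 3.5 GB in float64. The function name is hypothetical.

```python
import numpy as np


def smallest_connected_k(X, k_max):
    """Smallest k <= k_max for which the symmetrized k-NN graph is connected, else None."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)       # never pick a point as its own neighbor
    order = np.argsort(d2, axis=1)     # column k-1 holds each point's k-th nearest neighbor

    parent = list(range(n))

    def find(a):
        # Path halving keeps the union-find trees shallow.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    components = n
    if components <= 1:
        return 0
    for k in range(1, min(k_max, n - 1) + 1):
        # Growing k from k-1 to k adds exactly one new edge per point.
        for i in range(n):
            ri, rj = find(i), find(int(order[i, k - 1]))
            if ri != rj:
                parent[ri] = rj
                components -= 1
        if components == 1:
            return k
    return None


# Two clusters of five points: inside a cluster only four neighbors exist,
# so the fifth neighbor is the first one that bridges the gap.
X = [[0], [1], [2], [3], [4], [100], [101], [102], [103], [104]]
print(smallest_connected_k(X, 10))  # 5
```

If the required k is much larger than a method needs locally, that would suggest outlying or isolated pixels in the data rather than a problem with the method's other parameters.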

Thank you for your time!

jejjohnson commented 9 years ago

Refer to this link for more information regarding the issue as well as the authors' comments on it.