Closed aminaaslam closed 7 years ago
Can you share your data? I will debug it. BTW, did you test this with 1.3.0? Have you tried 1.2.3? I made big changes to the matrix computation and want to check whether those changes caused this. Thanks!
Hi Hai, I am using Smile version 1.2.3. Do you think I should try 1.3.0?
I will test it with 1.3.0 (latest version) anyway.
OK, I will test it with the latest version and let you know. Until then I will keep the issue open.
Thanks
I guess that this is the same data as in ticket 174. Can you first check whether you have duplicated samples in your data? Thanks!
Hi Hai, you guessed right. There may be duplicate samples in the data. Are the duplicate data instances causing the problem? Does this mean I need to remove duplicate instances from the data, or does version 1.3.0 work around this problem?
Duplicated samples will make the distance matrix singular, which certainly causes the issue in ticket 174. You should remove the duplicated samples. There is no workaround for a singular matrix.
This ticket might also be caused by the duplicated samples, but I am not sure. Thanks!
Let me remove the duplicated samples and see what happens. I will report back. Thanks
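A minimal sketch of how exact duplicate rows could be filtered out of a `double[][]` matrix before it is passed to LLE or IsoMap. The class and method names (`DuplicateRowFilter`, `removeDuplicateRows`) are hypothetical helpers for illustration and are not part of Smile. Rows are compared element by element with `Double.equals` semantics, so this only catches exact duplicates, not samples that are merely very close.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of the Smile API.
final class DuplicateRowFilter {

    /**
     * Returns the rows of data with exact duplicates removed.
     * The first occurrence of each row is kept and the original order is preserved.
     */
    static double[][] removeDuplicateRows(double[][] data) {
        Set<List<Double>> seen = new HashSet<>();
        List<double[]> unique = new ArrayList<>();
        for (double[] row : data) {
            // Box the row into a List so it gets value-based equals/hashCode.
            List<Double> key = new ArrayList<>(row.length);
            for (double v : row) {
                key.add(v);
            }
            // Set.add returns false if an equal row was already seen.
            if (seen.add(key)) {
                unique.add(row);
            }
        }
        return unique.toArray(new double[0][]);
    }

    public static void main(String[] args) {
        double[][] data = {{1.0, 2.0}, {3.0, 4.0}, {1.0, 2.0}};
        double[][] unique = removeDuplicateRows(data);
        System.out.println(data.length + " rows in, " + unique.length + " rows out");
    }
}
```

The filtered matrix would then be used in place of the original data when constructing the learner.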
Cards data. none: works fine
Standardize: Gives this error
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 402
at smile.manifold.LLE.
Normalize: Gives this error
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 402
at smile.manifold.LLE.
"method" : "isomap.learner", "parameters" : { "d" : 2, "k" : 5, "normMethod" : "normalize", }
"method" : "isomap.learner", "parameters" : { "d" : 2, "k" : 5, "normMethod" : "standardize", }
"method" : "isomap.learner", "parameters" : { "d" : 2, "k" : 5, "normMethod" : "none", }
Does "d" mean the dimensionality of the input data? If so, k = 5 is probably too big.
Also, normalization and standardization may not be good ideas for manifold learning. They are mostly for classification.
Yes, d means the dimensions of the data. I can only have two values for it, 2 or 3. So what would be a good range for k for these dimensions?
In general k should be less than d. The purpose of manifold learning is to find the intrinsic dimensions, which should be smaller.
Hi Hai, referring to your earlier comment that k should be less than d: how do I then run manifold learning on the MNIST dataset and get these results? http://scikit-learn.org/stable/auto_examples/manifold/plot_lle_digits.html#sphx-glr-auto-examples-manifold-plot-lle-digits-py Here in the experiment, k = 30 and the number of dimensions = 2?
The dimension of MNIST is 28 X 28 = 784. You are confused with the t-SNE plot.
This is one of the examples in the link:
n_neighbors = 30
print("Computing Isomap embedding")
t0 = time()
X_iso = manifold.Isomap(n_neighbors, n_components=2).fit_transform(X)
print("Done.")
plot_embedding(X_iso,
"Isomap projection of the digits (time %.2fs)" %
(time() - t0))
Sorry, there were miscommunications. I was asking if d is the input dimension in your settings. You said yes. In our API, d is the output dimension.
I am sorry for the miscommunication. So this means I can set k greater than the output dimension of the data. But when I do that, it gives me this error. Is it because of duplicate data samples? When I set k < d (the output dimension), the error disappears. Can you please explain what is going on?
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 402
at smile.manifold.LLE.<init>(LLE.java:209)
at com.smile.dimensionality.reduction.LLELearner.learn(LLELearner.java:67)
at com.smile.dimensionality.reduction.ManifoldLearningFunction.execute(ManifoldLearningFunction.java:85)
Duplicates are more likely the issue.
Hi Hai, I made sure there are no duplicates in my data, but with these parameters it gives me this exception: d = 2 (dimension of the output data), k = 3
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See for further details.
Exception in thread "main" java.lang.RuntimeException: Matrix is singular.
at smile.math.matrix.LUDecomposition.solve(LUDecomposition.java:254)
at smile.manifold.LLE.
Can you share the data? I will debug it. Sometimes if two samples are too close, the distance matrix might be singular or near singular, which will cause the problem. Thanks!
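One way to check for samples that are "too close" is to compute the smallest pairwise Euclidean distance in the data. The sketch below is a brute-force O(n²) scan, so it is only practical for modest sample counts. The class and method names are hypothetical and not part of Smile, and how small a distance counts as "too close" depends on the scale of the data, so no threshold is assumed here.

```java
// Hypothetical helper for illustration; not part of the Smile API.
final class NearDuplicateCheck {

    /**
     * Returns the smallest Euclidean distance between any two rows,
     * or positive infinity if there are fewer than two rows.
     * Assumes all rows have the same length.
     */
    static double minPairwiseDistance(double[][] data) {
        double min = Double.POSITIVE_INFINITY;
        for (int i = 0; i < data.length; i++) {
            for (int j = i + 1; j < data.length; j++) {
                double sum = 0.0;
                for (int c = 0; c < data[i].length; c++) {
                    double diff = data[i][c] - data[j][c];
                    sum += diff * diff;
                }
                min = Math.min(min, Math.sqrt(sum));
            }
        }
        return min;
    }

    public static void main(String[] args) {
        double[][] data = {{0.0, 0.0}, {3.0, 4.0}, {0.0, 1e-9}};
        System.out.println("Smallest pairwise distance: " + minPairwiseDistance(data));
    }
}
```

A result of exactly 0 means exact duplicates remain; a value that is tiny relative to the typical distances in the data suggests near-duplicates of the kind described above.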
Here is the data that I am using and the data description file.
The .json file is the data descriptor; the other file is the data.
Thanks,
Thanks! Do you have the code snippet too?
For parsing the data?
For parsing and also the call to LLE. Thanks!
Hai, I am using the univocity parser for parsing the data:
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;
I don't know how to share the code with you because it is so interdependent that I would have to share the entire project, and that would be a total waste of your time. This is the best I could do:
double data[][];
IsoMap isomap = new IsoMap(data, d, k, true);
this.coordinates = isomap.getCoordinates();
this.graph = isomap.getNearestNeighborGraph();
Thanks! What's your k and d? You already filtered the duplicates in the attached files, right?
k is the number of nearest neighbors and d is the dimension of the output data. Yes, there are no duplicates in the data. Hai, one more thing: I actually ran IsoMap on a data set with duplicate samples and it worked fine. It is just that there were only 100 records in it. Could the size of the data combined with a larger k be the cause of this exception?
Large k is not recommended in general. I know the meaning of k and d :) I was asking their values in your settings that cause the problem.
Sorry, these are the values that I used: "d" : 2, "k" : 3,
Can you serialize the parsed data (the data matrix) into a plain CSV file or a Java object file? I am afraid that I will load your data incorrectly, since it is pretty complicated. BTW, IsoMap/LLE uses Euclidean distance, which does not seem appropriate for your data, which is a mix of numeric, nominal, and string data.
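A minimal sketch of dumping an already-parsed `double[][]` matrix to a plain CSV file using only the Java standard library. The class name, method names, and the output file name `matrix.csv` are hypothetical. No header row is written, and values are formatted with `Double.toString`, which should round-trip through `Double.parseDouble` without loss.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of the Smile API.
final class MatrixCsvWriter {

    /** Formats the matrix as CSV text: one row per line, comma-separated, no header. */
    static String toCsv(double[][] data) {
        StringBuilder sb = new StringBuilder();
        for (double[] row : data) {
            for (int c = 0; c < row.length; c++) {
                if (c > 0) {
                    sb.append(',');
                }
                sb.append(row[c]); // same text as Double.toString(row[c])
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    /** Writes the CSV text to the given file, replacing any existing content. */
    static void write(double[][] data, Path file) throws IOException {
        Files.write(file, toCsv(data).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        double[][] data = {{1.0, 2.5}, {3.0, 4.0}};
        write(data, Paths.get("matrix.csv")); // hypothetical output file name
    }
}
```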
You had better first convert nominal values to a one-hot encoding.
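As a rough illustration of one-hot encoding, the sketch below turns a single nominal column into indicator columns, one per distinct value, in order of first appearance. The class and method names are hypothetical and not part of Smile; a real pipeline would also need to fix the category order ahead of time so that training and new data are encoded consistently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the Smile API.
final class OneHotEncoder {

    /**
     * Encodes one nominal column. Row i of the result has a 1.0 in the column
     * assigned to values[i] and 0.0 elsewhere. Columns are assigned in order
     * of first appearance.
     */
    static double[][] encode(String[] values) {
        // Map each distinct category to the next free column index.
        Map<String, Integer> index = new LinkedHashMap<>();
        for (String v : values) {
            index.putIfAbsent(v, index.size());
        }
        double[][] encoded = new double[values.length][index.size()];
        for (int i = 0; i < values.length; i++) {
            encoded[i][index.get(values[i])] = 1.0;
        }
        return encoded;
    }

    public static void main(String[] args) {
        double[][] encoded = encode(new String[] {"red", "green", "red"});
        for (double[] row : encoded) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

The resulting indicator columns would be appended to the numeric features in place of the original nominal column.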
Also, don't include the operation id and timestamp in the features. If you have to use the timestamp, better convert it to things like day of month, day of week, etc. Feature engineering is very important in machine learning.
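A small sketch of deriving calendar features from a timestamp with `java.time`. It assumes the timestamp is stored as epoch milliseconds and should be interpreted in UTC; neither is stated in this thread, so both would need adjusting to the actual data. The class and method names are hypothetical, and the hour of day is added only as an example of the "etc." above.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Hypothetical helper for illustration; not part of the Smile API.
final class TimestampFeatures {

    /**
     * Returns {dayOfMonth, dayOfWeek, hourOfDay} for an epoch-millisecond
     * timestamp interpreted in UTC (an assumption about the data).
     * dayOfWeek follows ISO-8601: 1 = Monday through 7 = Sunday.
     */
    static double[] fromEpochMillis(long epochMillis) {
        ZonedDateTime t = Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC);
        return new double[] {
            t.getDayOfMonth(),
            t.getDayOfWeek().getValue(),
            t.getHour()
        };
    }

    public static void main(String[] args) {
        double[] features = fromEpochMillis(0L); // 1970-01-01T00:00Z, a Thursday
        System.out.println(java.util.Arrays.toString(features));
    }
}
```

These derived columns would replace the raw timestamp in the feature matrix.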
Here is the parsed data with no string or nominal features (I have done one-hot encoding). Let me know if this is what you wanted. Please find attached the .csv file; this is the data that goes into the IsoMap learner. outdata.csv.gz.zip
I am not using any string features, and I am converting the nominal features. Please let me know if this makes sense.
Thanks! Sounds good. I will try it tonight.
This is a .gz file. You should be able to open it. Thanks!
Hi Hai, did you get a chance to look at the file that I sent you? Thanks!
Got OOM error on a small machine last night. Will try it on a bigger machine.
Hi Hai,
I have good news: I have a smaller data set with 3000 records, and IsoMap works on it with k as large as 50. I am not able to test the data set that I attached here because I run into OOM even on my biggest machine, even with k=3.
However, LLE does not work even with k=3 on my smaller data set and throws the exception that I mentioned in ticket 174. Here is the exception:
Exception in thread "main" java.lang.RuntimeException: Matrix is singular.
at smile.math.matrix.LUDecomposition.solve(LUDecomposition.java:254)
at smile.manifold.LLE.
I see the exception in IsoMap, which happens during the eigen value decomposition. I will try to figure out what's wrong. It may take some time.
Thanks that will be very helpful. Amina
Hi Hai, did you get a chance to look at what is going on with the eigenvalue decomposition? Please let me know when you get a chance to look at this issue. Thanks, Amina
It is likely a numerical stability issue. That is also why you have fewer problems with smaller data. In the long term, we should use something like BLAS and LAPACK, which are more numerically stable and also faster. I am looking into how to do it, but it will take a lot of time.
Hi Hai, I am running into this issue while running IsoMap:
[main] INFO smile.manifold.IsoMap - IsoMap: 2 connected components, largest one has 986 samples.
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 10
at smile.math.matrix.EigenValueDecomposition.tql2(EigenValueDecomposition.java:1404)
at smile.math.matrix.EigenValueDecomposition.decompose(EigenValueDecomposition.java:629)
at smile.math.matrix.EigenValueDecomposition.decompose(EigenValueDecomposition.java:422)
at smile.math.Math.eigen(Math.java:4316)
at smile.manifold.IsoMap.<init>(IsoMap.java:179)
at com.smile.dimensionality.reduction.IsoMapLearner.learn(IsoMapLearner.java:80)
at com.smile.dimensionality.reduction.ManifoldLearningFunction.execute(ManifoldLearningFunction.java:85)
at com.common.ModelingEngine.main(ModelingEngine.java:81)