lferry007 / LargeVis

Apache License 2.0

Lonely points #2

Open shanishalgi opened 8 years ago

shanishalgi commented 8 years ago

Hi, I'm trying to use LargeVis to visualize my doc2vec features of 20NG (7532 test documents, 100 features each). I'm using all the default parameters, and I get the following result: [image: largevis20ng] I was surprised by the lonely points in the data, as their corresponding documents were not noticeably different from the others in their category. I tried running the algorithm again after removing these documents from the dataset, but got a similar pattern: a few (~6-9) lonely points representing seemingly normal documents. I previously visualized this data using various t-SNE methods, and none of them showed such a pattern. Is there a simple explanation, or something I am overlooking?
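One way to pin down exactly which documents end up as lonely points is to flag layout points whose distance to their nearest neighbor is far larger than typical. This is a hypothetical diagnostic, not part of LargeVis: `find_lonely_points` and its `factor` threshold are illustrative assumptions, and the brute-force search is O(n^2), which is slow but workable for ~7,500 points.

```python
import math
import statistics


def nearest_neighbor_distances(points):
    """Distance from each 2-D point to its closest other point (brute force, O(n^2))."""
    dists = []
    for i, (xi, yi) in enumerate(points):
        best = math.inf
        for j, (xj, yj) in enumerate(points):
            if i != j:
                best = min(best, math.hypot(xi - xj, yi - yj))
        dists.append(best)
    return dists


def find_lonely_points(points, factor=5.0):
    """Return indices whose nearest-neighbor distance exceeds `factor` times the median.

    `factor` is a hypothetical threshold for illustration, not a LargeVis setting.
    """
    if len(points) < 2:
        return []
    dists = nearest_neighbor_distances(points)
    cutoff = factor * statistics.median(dists)
    return [i for i, d in enumerate(dists) if d > cutoff]
```

The returned indices line up with the rows of the input, so they can be mapped back to the corresponding documents and labels. Note that two outliers sitting next to each other would not be flagged, since each is the other's nearest neighbor.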

Also, plot.py only works for me if I change vec[1], vec[2] to vec[0], vec[1] on line 29.
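For reference, here is a minimal sketch of reading a 2-D layout so that the coordinates come from `vec[0]` and `vec[1]`. It assumes the layout file starts with a header line giving the number of points and the dimension, followed by one line of whitespace-separated coordinates per point; `read_layout` is a hypothetical helper, not the function in plot.py.

```python
def read_layout(lines):
    """Parse an assumed layout format: a header 'N dim', then N lines of coordinates.

    Each data line holds only the coordinates, so x and y are vec[0] and vec[1].
    """
    it = iter(lines)
    n, dim = (int(tok) for tok in next(it).split())
    if dim < 2:
        raise ValueError(f"expected at least 2 dimensions, got {dim}")
    coords = []
    for line in it:
        vec = line.split()
        if not vec:
            continue  # tolerate blank lines, e.g. a trailing newline
        coords.append((float(vec[0]), float(vec[1])))
    if len(coords) != n:
        raise ValueError(f"header says {n} points, found {len(coords)}")
    return coords
```

An open file can be passed directly, e.g. `with open("layout.txt") as f: coords = read_layout(f)` (the file name is hypothetical). With only two values per line, indexing `vec[1], vec[2]` would raise an `IndexError`.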

Thanks in advance, Shani

lferry007 commented 8 years ago

Hi,

I have fixed the bug in plot.py. Please try it!

Your visualization indeed looks odd. Could you please share with us the feature vectors you were using? Thanks!

shanishalgi commented 8 years ago

Attached are my feature vector file and the corresponding labels: ng20test_features.txt, ng20test_labels.txt

lferry007 commented 8 years ago (Sep 14, 2016)

Hi,

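Using the feature vectors you shared and the default parameter settings, we got the following visualization: [image: ng] https://cloud.githubusercontent.com/assets/15796471/18515282/63320bb2-7ac7-11e6-8335-8d4a86925c11.png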

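Since your dataset is relatively small, we tried decreasing the parameter "-neigh" to 50 and the parameter "-perp" to 20, and got a visualization like this: [image: ng_neigh50_perp20] https://cloud.githubusercontent.com/assets/15796471/18515328/924aae5e-7ac7-11e6-999f-549cfd2f75ce.png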

It looks better. (We did not try any other settings.)
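To see why "-perp" is lowered together with "-neigh": in t-SNE, the perplexity sets each point's Gaussian bandwidth so that its neighbor distribution has an effective size of about `perp` neighbors, which can never exceed the K neighbors available. LargeVis uses perplexity for the same purpose when weighting its K-NN graph edges. The sketch below shows the standard t-SNE-style binary search for one point's neighbor distances; it is an illustration under that assumption, not LargeVis's actual implementation, and `calibrate_row` is a hypothetical name.

```python
import math


def calibrate_row(dists, perplexity, tol=1e-5, max_iter=200):
    """Return weights p_j proportional to exp(-beta * d_j^2) over one point's
    K neighbor distances, with beta found by binary search so that
    exp(entropy) equals the requested perplexity (t-SNE-style calibration)."""
    if not 1 < perplexity < len(dists):
        raise ValueError("need 1 < perplexity < number of neighbors")
    target = math.log(perplexity)  # entropy in nats; perplexity = exp(H)
    d2 = [d * d for d in dists]
    shift = min(d2)  # subtracting the minimum avoids exp() underflow
    lo, hi, beta = 0.0, math.inf, 1.0
    for _ in range(max_iter):
        w = [math.exp(-beta * (x - shift)) for x in d2]
        total = sum(w)
        p = [wi / total for wi in w]
        entropy = -sum(pi * math.log(pi) for pi in p if pi > 0)
        if abs(entropy - target) < tol:
            break
        if entropy > target:
            # Too flat: narrow the Gaussian by increasing beta.
            lo = beta
            beta = beta * 2 if hi == math.inf else (beta + hi) / 2
        else:
            # Too peaked: widen the Gaussian by decreasing beta.
            hi = beta
            beta = (beta + lo) / 2
    return p
```

Calling `calibrate_row(dists, 20)` on a list of 50 neighbor distances mirrors the "-neigh 50 -perp 20" setting above; asking for a perplexity of 50 or more with only 50 neighbors is rejected, because the effective neighbor count cannot exceed K.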

shanishalgi commented 8 years ago

Hi, has the code been changed since I uploaded my data? I used the default settings and got a very different result.

tangjianpku commented 8 years ago

We've updated the code; please give it a try. If the problem persists, it may be caused by your system configuration.

Thanks, Jian https://sites.google.com/site/pkujiantang/home