lferry007 / LargeVis

Apache License 2.0

Speed Improvements #30

Open blutooth opened 7 years ago

blutooth commented 7 years ago

Hey guys, thanks for the great implementation; it has really helped me visualise the data I'm working on. However, I can see two ways in which it could be improved. (1) NMSLib can build K-NN graphs about 10 times as quickly as Annoy, and allows for about 10 times as many queries per second. (2) When computing the objective of the model, you could use a GPU library like PyTorch and compute in batches. This might speed up the calculations considerably if large batches are possible.

I'd be willing to work on this as a project, if you guys are up for it. I'm not sure about the optimisation tricks you've used for (2), training the objective, so I would be less likely to try implementing that part by myself.

Thanks, Max

buaawht commented 6 years ago

Hi, there is an issue that puzzles me. In LargeVis, -fea specifies whether the input file contains high-dimensional feature vectors (1) or a network (0); the default is 1. I have a file of feature vectors, and when I run LargeVis I get a set of 2-D vectors. I want to know whether the order of the generated 2-D vectors matches the order of the original feature vectors. I tried reading the source code, but I could not find the answer. Thank you.

blutooth commented 6 years ago

Yes, the order is preserved.
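
For anyone who wants to pair rows programmatically, here is a minimal Python sketch that relies on that ordering. The function name `pair_rows` is hypothetical, and the output layout is an assumption: a header line with the number of points and the output dimension, followed by one row of coordinates per point. If your output file has no header, pass `has_header=False`.

```python
from typing import List, Sequence, Tuple


def pair_rows(features: Sequence[Sequence[float]],
              output_text: str,
              has_header: bool = True) -> List[Tuple[Sequence[float], List[float]]]:
    """Pair the i-th input feature vector with the i-th output coordinate row."""
    lines = [ln for ln in output_text.splitlines() if ln.strip()]
    if has_header:
        # Assumed header layout: "<number of points> <output dimension>"
        n_points, out_dim = (int(tok) for tok in lines[0].split())
        lines = lines[1:]
    coords = [[float(tok) for tok in ln.split()] for ln in lines]
    if has_header:
        if len(coords) != n_points or any(len(c) != out_dim for c in coords):
            raise ValueError("output rows do not match the header")
    if len(coords) != len(features):
        raise ValueError("row counts differ; positional pairing would be wrong")
    # Positional pairing: row i of the input goes with row i of the output.
    return list(zip(features, coords))
```

The row-count check is there because positional pairing silently gives wrong results if either file has been filtered or reordered.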

bigheiniu commented 6 years ago

Have you guys ever implemented this on the Spark platform? I would like to implement LargeVis in a distributed system to handle large amounts of data.

tangjianpku commented 6 years ago

No, but it would be interesting to see it.

Jian


bigheiniu commented 6 years ago

@blutooth In your second suggestion, do you mean using batch gradient updates on a GPU instead of the SGD in the current code to speed things up? Could you give some hints or reading materials on the second suggestion? Thanks
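
To make the distinction in the question concrete, here is a rough Python sketch of a batch update: gradients are accumulated over a whole batch of edges and applied once, instead of updating after every sampled edge. This is not the repository's code. It assumes the objective as I understand it from the LargeVis paper, with edge probability 1 / (1 + a * d^2) and negative samples weighted by gamma; the function name `batch_step` and the defaults `gamma=7.0`, `a=1.0`, `eps=1e-8` are illustrative choices. It is written with plain lists for clarity; on a GPU the per-edge loops would become tensor operations over the batch.

```python
import math
from typing import List, Sequence, Tuple

Point = List[float]
Edge = Tuple[int, int]


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def batch_step(Y: List[Point], pos: Sequence[Edge], neg: Sequence[Edge],
               lr: float = 0.1, gamma: float = 7.0, a: float = 1.0,
               eps: float = 1e-8) -> float:
    """One batch gradient-ascent step on the assumed objective.

    Returns the objective value measured before the update; Y is modified in place.
    """
    dim = len(Y[0])
    grad = [[0.0] * dim for _ in Y]
    objective = 0.0

    # Positive (observed) edges: maximise log p, with p = 1 / (1 + a * d^2).
    for i, j in pos:
        d2 = sq_dist(Y[i], Y[j])
        objective += -math.log(1.0 + a * d2)
        coef = -2.0 * a / (1.0 + a * d2)
        for t in range(dim):
            g = coef * (Y[i][t] - Y[j][t])
            grad[i][t] += g
            grad[j][t] -= g

    # Negative (sampled) pairs: maximise gamma * log(1 - p).
    for i, k in neg:
        d2 = sq_dist(Y[i], Y[k]) + eps  # eps avoids division by zero
        objective += gamma * math.log(a * d2 / (1.0 + a * d2))
        coef = 2.0 * gamma / (d2 * (1.0 + a * d2))
        for t in range(dim):
            g = coef * (Y[i][t] - Y[k][t])
            grad[i][t] += g
            grad[k][t] -= g

    # Single update for the whole batch (per-edge SGD would update inside the loops).
    for i in range(len(Y)):
        for t in range(dim):
            Y[i][t] += lr * grad[i][t]
    return objective
```

The trade-off is that one large batch gives fewer but better-averaged updates per pass, so the learning rate usually needs retuning, and the repulsive term can produce very large gradients for nearly coincident points, so a practical version would likely need gradient clipping.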