Extend openTSNE to specific purposes

pavlin-policar / openTSNE

Extensible, parallel implementations of t-SNE

https://opentsne.rtfd.io

BSD 3-Clause "New" or "Revised" License


Extend openTSNE to specific purposes #232

Closed Yangxiaojun1230 closed 1 year ago

Yangxiaojun1230 commented 1 year ago

I'm glad you've fixed it! I'll go ahead and close this issue for the time being then.

Do you mean you want to create an embedding from a subset of the points first, then add the remaining points to that embedding? If so, you can do that using the .transform functionality. But if you go this route, the new points you add with .transform won't take interactions among themselves into account, and that often isn't what you'd want.

Or do you mean you want to fix the final position of only some of the points and have all the other points be optimized w.r.t. that? That is an interesting use case, and I've often thought about implementing that, but this isn't possible in the current version of openTSNE.

Originally posted by @pavlin-policar in https://github.com/pavlin-policar/openTSNE/issues/230#issuecomment-1413881697

Yangxiaojun1230 commented 1 year ago

Hi Pavlin, yes, I want to extend openTSNE to the second scenario you mentioned above. My rough idea is to pass in a list of point IDs and fix the corresponding coordinates in the embedding space to given values. But I am not sure whether this is feasible, since it would affect the scale of the whole embedding space.

pavlin-policar commented 1 year ago

The most straightforward way to achieve this would probably be to mask out the gradients for these points, effectively fixing them in place. Most likely, you'd just need to zero out the update for the given rows here.
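The masking idea can be illustrated with a minimal NumPy sketch. This is not openTSNE's optimizer: `masked_gradient_step`, `fixed_idx`, and the random stand-in gradient are hypothetical names and data introduced only for illustration, and the step is plain gradient descent without the momentum or gains a real t-SNE optimizer would use.

```python
import numpy as np


def masked_gradient_step(embedding, gradient, fixed_idx, learning_rate=1.0):
    """One plain gradient-descent step that leaves the rows in fixed_idx untouched.

    Hypothetical helper for illustration; not part of openTSNE.
    """
    # Multiplying creates a new array, so the caller's gradient is not modified.
    update = learning_rate * gradient
    # Zero the update (not only the gradient) for the pinned rows.
    update[fixed_idx] = 0.0
    return embedding - update


rng = np.random.default_rng(0)
embedding = rng.normal(scale=1e-4, size=(6, 2))  # toy 2D embedding
gradient = rng.normal(size=(6, 2))               # stand-in for a t-SNE gradient
fixed_idx = np.array([0, 3])                     # rows to keep in place

new_embedding = masked_gradient_step(embedding, gradient, fixed_idx, learning_rate=0.1)
```

If the optimizer carries momentum, zeroing the final update rather than only the gradient matters, because any previously accumulated velocity would otherwise still move the pinned rows.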

But you are correct that you'd now be changing the scale of the embedding, which can be problematic. For instance, if you use an initialization that isn't rescaled to something tiny (hence our initialization.rescale function), the optimization doesn't work. The span (x_max - x_min) tends to increase during optimization. You'd likely need to tinker with the optimization parameters to get this to work properly.
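To make the scale issue concrete, here is a rough NumPy sketch of rescaling an initialization to a tiny standard deviation and measuring its span. `rescale_to_std` and `span` are hypothetical helpers written for this example; they are meant to mirror the idea behind initialization.rescale, not to reproduce the library's code, and the target value of 1e-4 is an assumption.

```python
import numpy as np


def rescale_to_std(x, target_std=1e-4):
    """Return a copy of x scaled so its first column has standard deviation target_std.

    Hypothetical helper; the target_std value is an assumption for illustration.
    """
    x = np.array(x, dtype=float, copy=True)
    x /= np.std(x[:, 0]) / target_std
    return x


def span(x):
    """Per-dimension span (x_max - x_min) of an embedding."""
    return x.max(axis=0) - x.min(axis=0)


rng = np.random.default_rng(42)
init = rng.normal(scale=5.0, size=(1000, 2))  # an initialization at a large scale
small_init = rescale_to_std(init)             # the same layout, shrunk to a tiny scale

print("span before rescaling:", span(init))
print("span after rescaling: ", span(small_init))
```

If some points are pinned at coordinates much larger than this tiny initial scale, the free points start out at a very different scale from the fixed ones, which is one place where the parameter tinkering may be needed.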

I'm curious what your use case is here. The use case I was thinking of previously was to allow user intervention and let the user steer the embedding, but I've never really been convinced of the practical uses of that.

Yangxiaojun1230 commented 1 year ago

The use-case I was thinking of previously was to allow user intervention and allow the user to steer the embedding, but I've never really been convinced of the practical uses of that

Hi Pavlin, thanks for your advice. I will try it on my case and see the results. Briefly, my case is layout planning: for instance, we have millions of nodes (machines) that need to be placed in a specified area, and each node has connections to other nodes as well as its own weight. We encode these nodes into a high-dimensional space with a neural network, and then use a dimensionality reduction method to map them onto a 2D plane. For certain basic reasons, some machines must be placed at specific positions.

pavlin-policar commented 1 year ago

I see. Yes, like I said, the masking approach would probably be the easiest way to do this, but you'll likely need to tinker around with the optimization parameters. Let me know how it goes!

I'll close this for the time being, since there is nothing actionable I can really do here, but if you need any more help or have any questions later on, please feel free to ask them here.