pavlin-policar / openTSNE

Extensible, parallel implementations of t-SNE
https://opentsne.rtfd.io
BSD 3-Clause "New" or "Revised" License
1.44k stars, 158 forks

Shuffle data before processing? #167

Closed. califyn closed this issue 3 years ago

califyn commented 3 years ago
Expected behaviour

t-SNE output does not depend on order of inputs.

Actual behaviour

If the data is concatenated by class (e.g. type1 type1 type1 type2 type2, as opposed to type2 type1 type1 type2 type1), the t-SNE plot misleadingly separates the data points in each class. Shuffling the data beforehand causes the data points in each class to be evenly mixed, with no separation in the result.

I think it would be helpful to at least include an option to shuffle data before processing it, because it was not clear that input order would affect the output.
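For illustration, shuffling before embedding can be sketched as below. This is not an openTSNE feature: `shuffle_rows` and `restore_order` are hypothetical helper names, the toy data is made up, and the sketch uses only the Python standard library. It keeps the permutation so that results can be mapped back to the original order and compared against an unshuffled run.

```python
import random


def shuffle_rows(rows, labels, seed=0):
    """Shuffle rows and labels with one shared permutation.

    Returns the shuffled rows, the shuffled labels, and the permutation,
    where perm[i] is the original index of the item now at position i.
    (Hypothetical helper, not part of any library.)
    """
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    perm = list(range(len(rows)))
    random.Random(seed).shuffle(perm)  # seeded, so the shuffle is reproducible
    return [rows[i] for i in perm], [labels[i] for i in perm], perm


def restore_order(shuffled, perm):
    """Undo shuffle_rows: put each item back at its original index."""
    restored = [None] * len(perm)
    for new_pos, old_pos in enumerate(perm):
        restored[old_pos] = shuffled[new_pos]
    return restored


# Toy data concatenated by class: type1 type1 type1 type2 type2
rows = [[0.0, 0.1], [0.2, 0.1], [0.1, 0.3], [5.0, 5.1], [5.2, 4.9]]
labels = ["type1", "type1", "type1", "type2", "type2"]

shuffled_rows, shuffled_labels, perm = shuffle_rows(rows, labels, seed=42)

# An embedding would be computed on shuffled_rows at this point.
# Whatever it returns (one output row per input row) can be passed
# through restore_order to line up with the original input order.
assert restore_order(shuffled_rows, perm) == rows
```

The embedding step is deliberately left out. Any t-SNE implementation could be run on `shuffled_rows`, and `restore_order` would then map its output rows back to the input order for a side-by-side comparison with the unshuffled run.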

dkobak commented 3 years ago

Hmm. Can you show the plots? Or ideally post your data and code? Because what you describe shouldn't be happening.

califyn commented 3 years ago

I think there was an issue in my code. After fixing it, I no longer saw any difference between shuffled and unshuffled input. Sorry about that.

The library is very useful, by the way. Thanks.