mdlincoln opened 6 years ago
Your mileage may vary, but I find that classical multidimensional scaling (MDS) often looks better than PCA for plotting paths through word vector space, as long as you're willing to give up linearity. It should optimize something close to the objective you describe here.
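A minimal sketch of classical (Torgerson) MDS, assuming the path points and their neighbors are stacked as rows of one NumPy array and that cosine distance is the dissimilarity of interest. Both are assumptions, and the function name `classical_mds` and its `metric` parameter are hypothetical. One thing worth knowing: with plain Euclidean distances, classical MDS reproduces the PCA scores up to sign, so any visual difference from PCA comes from choosing a non-Euclidean dissimilarity such as cosine distance.

```python
import numpy as np

def classical_mds(X, k=2, metric="cosine"):
    """Classical (Torgerson) MDS of the rows of X into k dimensions.

    metric: "cosine" (assumed default for word vectors) or "euclidean".
    """
    X = np.asarray(X, dtype=float)
    if metric == "cosine":
        # Cosine distance = 1 - cosine similarity between row vectors.
        unit = X / np.linalg.norm(X, axis=1, keepdims=True)
        D = 1.0 - unit @ unit.T
    else:
        # Pairwise Euclidean distances between rows.
        diff = X[:, None, :] - X[None, :, :]
        D = np.sqrt((diff ** 2).sum(axis=-1))
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered squared dissimilarities
    evals, evecs = np.linalg.eigh(B)          # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1][:k]       # indices of the k largest
    # Non-Euclidean dissimilarities can yield negative eigenvalues; clip them to 0.
    scale = np.sqrt(np.clip(evals[order], 0.0, None))
    return evecs[:, order] * scale            # (n, k) coordinates
```

For example, `coords = classical_mds(np.vstack([path_points, neighbor_vectors]))` would give 2-D coordinates for a hypothetical array of ideal path points and the selected neighbor vectors together, so both can be drawn on one plot.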
The rotation that maximizes variance along PC1 and PC2 is not necessarily the one that minimizes the residuals between the ideal points and the points that actually exist. A more effective visualization might tailor the transformation of the full dataset to show how closely the selected neighbors fit the ideal points.
Do something clever with least squares? Probably.
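One possible reading of the least-squares idea, sketched under assumptions: choose a 2-D target position for each ideal point, fit a linear map from word-vector space to the plane by (ridge-regularized) least squares so that the ideal points land on those targets, then apply the same map to the whole dataset, including the selected neighbors. The names `fit_layout` and `project`, the choice of targets, and the `ridge` penalty are all hypothetical, not something proposed above.

```python
import numpy as np

def fit_layout(ideal, targets, ridge=1e-3):
    """Fit W (d x 2) and b (2,) so that ideal @ W + b is close to targets.

    ideal:   (m, d) ideal path points in word-vector space
    targets: (m, 2) hypothetical 2-D positions chosen for those points
    ridge:   small L2 penalty, needed because m is usually much smaller than d
    """
    ideal = np.asarray(ideal, dtype=float)
    targets = np.asarray(targets, dtype=float)
    mu_x, mu_t = ideal.mean(axis=0), targets.mean(axis=0)
    Xc, Tc = ideal - mu_x, targets - mu_t
    m = Xc.shape[0]
    # Dual form of ridge regression: W = Xc^T (Xc Xc^T + ridge*I)^-1 Tc.
    # It solves an m x m system, which is cheaper than d x d when m << d.
    W = Xc.T @ np.linalg.solve(Xc @ Xc.T + ridge * np.eye(m), Tc)
    b = mu_t - mu_x @ W
    return W, b

def project(points, W, b):
    """Apply the fitted map to any points, e.g. the neighbors or the full vocabulary."""
    return np.asarray(points, dtype=float) @ W + b
```

The design choice is that the plot is pinned to the ideal path rather than to the variance of the full dataset, so each neighbor's offset from its ideal point is read against that path. One caveat: with only m ideal points the map is underdetermined, and the ridge solution sends directions orthogonal to the centered ideal points to zero, so any deviation of a neighbor purely in those directions disappears from the plot.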