touretzkyds / oldWordEmbeddingDemo

Word embeddings online demo, rewrite
https://jxu.github.io/WordEmbeddingDemo/

Add original plotly graphs and vector magnified view #2

Closed jxu closed 3 years ago

touretzkyds commented 3 years ago

I wasn't a big fan of the layout of the vector display on the right half of the screen. Rob was using a built-in "heat map" feature for convenience. He didn't have the skills to build a better display himself. But I think we can do better.

There should be a few pixels of white space between the rows.

And there should be tick marks every 10 units, not every 20, to help people find vectors by number.

Also, if we click on a vector component we should have a way to highlight the entire vertical stack at that position.

jxu commented 3 years ago

I'll see if more complicated functionality is possible with plotly. I plan to build the more custom demos with d3.js or p5.js.

touretzkyds commented 3 years ago

You're currently displaying the residual dimension along the z axis. Please move this to the y axis as in the original demo, and use the 1-y convention so that the foundational words (man/woman/etc.) have residuals near 0 and "refrigerator" has large residual.

touretzkyds commented 3 years ago

Another problem with the display is that it's being clipped incorrectly, leading to important parts being cut off when the user rotates the 3D plot. Compare the behavior of your demo with the original version at https://www.cs.cmu.edu/~dst/Word2VecDemo and you'll see the difference.

jxu commented 3 years ago

> You're currently displaying the residual dimension along the z axis. Please move this to the y axis as in the original demo, and use the 1-y convention so that the foundational words (man/woman/etc.) have residuals near 0 and "refrigerator" has large residual.

Ok, I adjusted axes to match the demo including the residual. The points aren't exactly the same due to my use of a different dataset, but fairly close. I will see how I can fix the clipping.

Screenshot from 2021-06-18 02-24-37

Original: Screenshot from 2021-06-18 02-23-55

jxu commented 3 years ago

> Another problem with the display is that it's being clipped incorrectly, leading to important parts being cut off when the user rotates the 3D plot. Compare the behavior of your demo with the original version at https://www.cs.cmu.edu/~dst/Word2VecDemo and you'll see the difference.

Which parts are being clipped? Do you mean axes tick labels?

touretzkyds commented 3 years ago

The axis labels, the tick labels, and a good chunk of the grid lines are being clipped.

I thought some of the word points were being clipped as well, but now I think that was just due to zooming in too far.

The original demo's 3D display was pretty optimal. You should adopt the same settings for your display. When your demo opens, we should have the man/woman/king/queen/etc. words in the foreground of the display, and "refrigerator" in the background, i.e., orient your axes in the initial view to match the original demo. (And the residual coordinate should be a low value for the people and high for the refrigerator.)

jxu commented 3 years ago

My browser shows the same initial view for both plots. From https://www.cs.cmu.edu/~dst/Word2VecDemo/

Screenshot from 2021-06-18 13-55-29

Do you want the initial view to be like this?

Screenshot from 2021-06-18 13-58-30

Also, on my end not much is being clipped. Which browser, zoom level, and screen resolution are you using? Can you add a screenshot? The original demo's plotly layout is the same, except that its xyz ranges are manually calculated, but I don't see much difference.

touretzkyds commented 3 years ago

The second image above is pretty close to what I want the initial pose to be, except I would rotate it a little more so that the gender split of words is easier to see and the genderless/ageless (residual) axis is more foreshortened. The idea is that when someone visits the demo page, they should see exactly what we want a beginner to see, without having to touch any controls. And for beginners the most important things are the age and gender splits for the initial words, plus the fact that "refrigerator" is far away from all of them.

I reloaded the original demo and its initial pose is not what I wanted. We must have lost that fix somewhere along the way; there were so many versions and things were disorganized. So let's fix it in your version.

The other differences are that the original version uses 0.1 tick spacing while you are using 0.2, and the original version is zoomed out a little more so that none of the grid is clipped. Here is what I'm seeing when I initially load both pages. I'm running the latest Chrome on a Linux box.

word2vec-original

word2vec-jxu

jxu commented 3 years ago

Ok, it's pretty difficult to show the point splits for all three axes just because there's no sense of depth without rotating. Here is the new default view with 0.1 tick marks and zoomed out

Screenshot from 2021-06-18 15-24-55
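For reference, a minimal sketch of how the 0.1 tick spacing and a zoomed-out default view could be expressed as a plotly layout fragment. The helper name `makeSceneLayout` and the camera numbers are illustrative assumptions, not the values used in the demo; `dtick` and `scene.camera.eye` are standard plotly layout attributes.

```javascript
// Hypothetical helper: builds a layout fragment for a plotly 3D scene.
// tickSpacing sets the distance between ticks on all three axes;
// eye is the camera position (larger coordinates = more zoomed out).
function makeSceneLayout(tickSpacing, eye) {
  const axis = { tick0: 0, dtick: tickSpacing };
  return {
    scene: {
      xaxis: { ...axis },
      yaxis: { ...axis },
      zaxis: { ...axis },
      camera: { eye: { ...eye } },
    },
  };
}

// Placeholder camera position, chosen only for illustration.
const layout = makeSceneLayout(0.1, { x: 1.8, y: 1.8, z: 1.0 });
```

The resulting object would be passed as the layout argument when the 3D plot is created.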

touretzkyds commented 3 years ago

This is better. Thanks. But why are the points so large? They should be smaller. Also I would tilt the view upward a little bit to increase the foreshortening of the residual axis. Here's a screenshot.

word2vec-pose

The one thing we don't show effectively in this screenshot is the depth of the refrigerator node. I'm wondering if we should introduce some constant slight jitter or smooth rotation into the 3D plot instead of a static pose, as this will help people see the 3D structure without having to rotate the view themselves. But let's worry about that after we get the rest of the demo finished.

Are you working on display of the feature vectors now?
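One way the "slight smooth rotation" idea could be sketched: sway the camera back and forth around the vertical axis by a small angle. The function names and all numbers below are assumptions for illustration; the thread does not settle on an approach.

```javascript
// Camera position on a circle of the given radius around the vertical (z) axis.
function orbitEye(radius, height, angleRad) {
  return {
    x: radius * Math.cos(angleRad),
    y: radius * Math.sin(angleRad),
    z: height,
  };
}

// Angle that oscillates smoothly around `center` by +/- `amplitude` radians,
// completing one full sway every `periodMs` milliseconds.
function swayAngle(tMs, center, amplitude, periodMs) {
  return center + amplitude * Math.sin((2 * Math.PI * tMs) / periodMs);
}

// In the browser, each animation frame could compute
//   const eye = orbitEye(2.5, 1.0, swayAngle(performance.now(), 0.8, 0.15, 6000));
// and apply it to the plot's scene camera (for example through a plotly
// relayout call). That wiring is assumed and not shown here.
```

Keeping the amplitude small preserves the intended initial pose while still giving a sense of depth.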

jxu commented 3 years ago

I adjusted the chart sizing. Currently I am working on getting a custom vector display, either by modifying plotly's heatmap or by making a lower level display with d3.

jxu commented 3 years ago

There are 300 vector dimensions, so they will be quite thin. The solution may be to display only a few vector dimensions at the start and tell users to scroll/pan the plot over. To highlight each component, I am thinking of manually drawing a bounding box on top of the heatmap. Not sure how nice this will look though. I'll try to get the display up and running first.
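A minimal sketch of the bounding-box idea, assuming the heatmap uses plotly's default integer cell coordinates (cell centers at 0, 1, 2, ..., so each cell spans half a unit on either side). The helper name `columnHighlightShape` is hypothetical; the returned object follows the format of plotly's `layout.shapes` entries.

```javascript
// Builds a rectangle outlining one full column (all rows) of the heatmap.
// Assumes cells are centered on integer coordinates, which is plotly's
// default when no explicit x/y arrays are given.
function columnHighlightShape(col, numRows) {
  return {
    type: "rect",
    xref: "x",
    yref: "y",
    x0: col - 0.5,
    x1: col + 0.5,
    y0: -0.5,
    y1: numRows - 0.5,
    line: { color: "black", width: 1 },
    fillcolor: "rgba(0,0,0,0)", // transparent, so the cell colors stay visible
  };
}

// Example: outline column 42 across 6 word rows.
const highlight = columnHighlightShape(42, 6);
```

On a click, the shape would replace the previous one in the layout's `shapes` array so that only one column is outlined at a time.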

touretzkyds commented 3 years ago

When I added a new word "oven" it appeared in the 3D plot but in a much smaller font than the other words. This is a bug. Also, the new word should have been made the "active" word and the circle should be red instead of blue.

The word vector display needs to show word labels along the left side.

I think the way to make the display work better is, whenever the user's mouse pointer is over the vector display, we pop up a magnified view of the 11 columns surrounding the mouse pointer (5 to the left, 5 to the right, and the 1 the pointer is over). This magnified view would be the same height as the regular display but would be expanded horizontally, so each column might cover 15 pixels instead of the current 2 pixels. You could also space out the columns by putting 2 pixels of white space between each one and maybe highlight the center column. And this whole thing would shift left or right as you slide the mouse around.
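The column selection for such a magnified view could be sketched as follows. The function name `magnifyWindow` is hypothetical, and the edge behavior (sliding the window inward so it always shows 11 columns) is an assumption; the comment above does not say what should happen near the first and last columns.

```javascript
// Returns the column indices to show in the magnified view:
// halfWidth columns on each side of the hovered column, plus the column itself.
// Near the edges the window is shifted inward so its size stays constant.
function magnifyWindow(hoverCol, numCols, halfWidth = 5) {
  const size = Math.min(2 * halfWidth + 1, numCols);
  let start = hoverCol - halfWidth;
  start = Math.max(0, Math.min(start, numCols - size));
  return Array.from({ length: size }, (_, i) => start + i);
}

// Hovering column 150 of 300 selects columns 145 through 155.
const centered = magnifyWindow(150, 300);
```

With this clamping, the hovered column is centered everywhere except within 5 columns of either edge, so the highlight position would need to follow `hoverCol` rather than always sitting in the middle.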

jxu commented 3 years ago

I am still getting around to the selected words. I'm not sure if the auto-updating magnified view is doable within plotly; I guess it would have to be done outside plotly by writing some js that constantly tracks mouse position (this would also interfere with any other plotly controls like zoom if we use those). Alternatively we could just ask users to pan manually or have a slider that controls the view.

touretzkyds commented 3 years ago

Okay. The magnification thing was just an idea. If it's not easily implementable we can do without it.

I definitely don't want a slider though. I want them to see the whole vector. The current display, even though the vectors are very thin, still seems workable.

Here's another way to make the display more accessible. To the right of the 6x300 element matrix we have a bit of blank space and then a 6x1 matrix. Whichever column of the 6x300 matrix the mouse is currently over, we copy that column into the 6x1 matrix, and we write the column number below it. This will make it really easy to compare vector values for all 6 words. When the mouse is not over any column of the 6x300 matrix, we blank out the 6x1 matrix.

touretzkyds commented 3 years ago

Let me modify the above suggestion slightly. The 6x1 vector should go on the LEFT, not the right, so that on narrow screens where the rightmost bits of the display may get truncated, it will still be visible.

Also we probably shouldn't blank out the vector when the mouse pointer moves outside the 6x300 matrix because this will make it hard for users to take screenshots.
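A small sketch of the hovered-column idea, including the point about not blanking the display when the pointer leaves. The names `extractColumn` and `makeColumnTracker` are hypothetical, and the tiny matrix stands in for the real 6x300 data.

```javascript
// Copies one column out of a row-major matrix (one row per word).
function extractColumn(matrix, col) {
  return matrix.map((row) => row[col]);
}

// Remembers the most recently hovered column so that the side display
// keeps its contents after the pointer leaves the matrix.
function makeColumnTracker(matrix) {
  let last = null;
  return {
    hover(col) {
      last = { col, values: extractColumn(matrix, col) };
      return last;
    },
    leave() {
      return last; // unchanged: nothing is blanked out
    },
  };
}

// Stand-in data: 3 words x 4 dimensions.
const demoMatrix = [
  [0.1, 0.2, 0.3, 0.4],
  [0.5, 0.6, 0.7, 0.8],
  [0.9, 1.0, 1.1, 1.2],
];
const tracker = makeColumnTracker(demoMatrix);
const shown = tracker.hover(2); // { col: 2, values: [0.3, 0.7, 1.1] }
```

The `col` field is what would be written below the side vector as the column number.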

jxu commented 3 years ago

The CSS should scale according to the window's starting width. I am testing on a 1920x1080 monitor, but once I find out how to fix up the CSS to not place my text all over the place, the displays should work on something like a 720p display (I don't think it is possible to find laptops running 1024x768 any more, but it still should be usable then)

touretzkyds commented 3 years ago

I did a "git pull" and your version of the demo appears to be broken. The 3D plot is no longer visible. Did you break it?

jxu commented 3 years ago

I wrote a little about it in #8. Basically, the new word vector file is too big for GitHub, so I didn't include it. I could compress it and upload it anyway, then have the js code decompress it like it should already do for content-encoding, but I decided against it at the time. So here are the relevant files I've computed for now (it won't let me upload them here): https://github.com/jxu/Word2VecDemo/releases/tag/ft-words50k

touretzkyds commented 3 years ago

I installed the two files, and the code is failing:

word2vec.js:244 TypeError: Cannot read property 'sub' of undefined
    at word2vec.js:75
    at Array.map (<anonymous>)
    at createFeature (word2vec.js:75)
    at main (word2vec.js:203)

What is causing this failure? You can see for yourself at https://www.cs.cmu.edu/~dst/test/Word2VecDemo

A zipped version of the 50k words file is only 35 MB. But you should be able to make things even smaller by replacing the ASCII representations of the vector elements with 4-byte floats. That should take only around 20.5 MB.
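As a rough illustration of the 4-byte float suggestion, vectors can be packed into a single `Float32Array`, whose raw size before any compression is words × dimensions × 4 bytes. The function names below are hypothetical, and the file layout (vectors stored back to back, words kept in a separate list) is an assumption.

```javascript
// Packs an array of equal-length numeric vectors into one Float32Array.
// Values are rounded to 32-bit precision in the process.
function packVectors(vectors) {
  const dims = vectors[0].length;
  const packed = new Float32Array(vectors.length * dims);
  vectors.forEach((v, i) => packed.set(v, i * dims));
  return packed;
}

// Returns a view (not a copy) of the i-th vector in the packed buffer.
function unpackVector(packed, dims, i) {
  return packed.subarray(i * dims, (i + 1) * dims);
}

// Stand-in data: 2 words x 2 dimensions, 16 bytes when packed.
const packedDemo = packVectors([
  [0.5, -1.25],
  [2, 0.125],
]);
```

The underlying `packedDemo.buffer` is what would be written to disk and later loaded back with `new Float32Array(arrayBuffer)`.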

jxu commented 3 years ago

Ok, I see there is a bug. Since I kept the most common casing of each word, the example words appear in the word vectors as "Prince" and "Princess", which are more common than "prince" and "princess". I can either change all the words to lowercase or change the casing of the example words.

I know the compressed word vectors can fit on GitHub. Do you think I should include them again in the source repo if it makes things more convenient?

touretzkyds commented 3 years ago

All words in this demo should be lower case. We don't want to start confusing people with capitalization issues.
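A minimal sketch of lowercasing the vocabulary, assuming entries arrive as `[word, vector]` pairs ordered from most to least frequent, so that when two casings collide the more frequent one's vector is kept. Both the function name `lowercaseVocab` and that ordering are assumptions.

```javascript
// Builds a Map from lowercase word to vector.
// If several casings map to the same key, the first one seen wins.
function lowercaseVocab(entries) {
  const byWord = new Map();
  for (const [word, vector] of entries) {
    const key = word.toLowerCase();
    if (!byWord.has(key)) {
      byWord.set(key, vector);
    }
  }
  return byWord;
}

// Stand-in data: "Prince" is listed before "prince", so its vector is kept.
const vocab = lowercaseVocab([
  ["Prince", [1, 2]],
  ["king", [3, 4]],
  ["prince", [5, 6]],
]);
```

With a map like this, a lookup for "prince" no longer returns `undefined`, which is the kind of missing entry that produces the `Cannot read property 'sub' of undefined` error above.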

I think the words should be in the source repo to make it as easy as possible for people to clone the demo. Most users are just going to visit the demo on my web page; they're not going to know or care that the GitHub source contains a huge data file. But for those who want to modify the demo, we should make it possible to clone it in one step.

jxu commented 3 years ago

Ok, I am in the process of changing to use the compressed vectors file.

Here are the lowercase words for the time being https://github.com/jxu/Word2VecDemo/releases

jxu commented 3 years ago

I have been experimenting with how to get the magnified view on hover to work. Previously I tried to do this through something like a plotly subplot, however this makes things confusing when updating both plots. So my current plan is to have a third plot which mirrors the hovered spot on the heatmap.

It is still WIP while I work out the UI to be cleaner.

touretzkyds commented 3 years ago

Okay. Don't spend too much time on this. We need to get started on the vector arithmetic.

jxu commented 3 years ago

The magnify plot has been added to the left of the original vector view.

touretzkyds commented 3 years ago

The new magnified view is very nice, but there is still some work to be done.

touretzkyds commented 3 years ago