Closed vincerubinetti closed 2 years ago
Name | Link |
---|---|
Latest commit | 86a1f57b3358be4afd7b3f9d1d1819b75a709be0 |
Latest deploy log | https://app.netlify.com/sites/word-lapse/deploys/62793e0f99b24100095b091f |
Deploy Preview | https://deploy-preview-49--word-lapse.netlify.app/ |
Ah I see what you mean. I +1 the slider idea for when we get to that point, as trying to show 20 years' worth all at once will be very messy. I don't know the maximum number of years that can be shown, but it's something to keep in mind as you progress on this visualization.
Ah I forgot that there could be that many years. For this PR I've just been using that sample json you gave me, not the real API endpoint yet. I'll switch that back in now.
I can certainly do the slider idea, but in my mind that idea was kind of just for fun, so you could see an animation. I don't know if that would help the readability. It would cut the number of clusters in the above screenshot down, but there would still be overlap in the words.
This is a tough situation as there's just so much info to show at once. I'll keep working at it.
Just updated it to use (approximate, based on string char length) rectangles for collision instead of circles. It looks a little better. But then there is a new problem: these tokens get very long and wide, so when they expand, they explode way off to the left and right of the plot. Also, the d3-bboxCollide library I'm using has a lot of problems and isn't maintained. I could make my own bbox collision -- I get how I would do it -- but it's quite a bit of extra work.
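To make the rectangle idea concrete, here is a minimal sketch of estimating a label's box from its character count and testing two boxes for overlap. All names, plus the `charWidth` and `lineHeight` defaults, are assumptions for illustration. This is not the d3-bboxCollide API and not the code in this PR.

```typescript
// Hypothetical label shape; field names are assumptions for illustration.
interface Label {
  text: string;
  x: number; // center x of the label
  y: number; // center y of the label
}

interface Box {
  left: number;
  right: number;
  top: number;
  bottom: number;
}

// Approximate a label's bounding box from its character count.
// charWidth and lineHeight are assumed values, not measured text metrics.
function approximateBox(label: Label, charWidth = 7, lineHeight = 14): Box {
  const halfWidth = (label.text.length * charWidth) / 2;
  const halfHeight = lineHeight / 2;
  return {
    left: label.x - halfWidth,
    right: label.x + halfWidth,
    top: label.y - halfHeight,
    bottom: label.y + halfHeight,
  };
}

// Two axis-aligned boxes overlap only when they overlap on both axes.
function boxesOverlap(a: Box, b: Box): boolean {
  return (
    a.left < b.right &&
    b.left < a.right &&
    a.top < b.bottom &&
    b.top < a.bottom
  );
}
```

A hand-rolled collision force would presumably run a check like `boxesOverlap` on label pairs each tick and push overlapping ones apart. Because long tokens produce wide boxes, most of that push ends up horizontal, which matches the sideways "explosion" described above.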
Here are some other ideas:
Is there any way we can determine just a few of the neighbors to show at full size, and make the other ones much smaller and fainter? Maybe we can make `species_` (tagged?) neighbors smaller as they're less likely to be meaningful to the user? Maybe we can only show the top ~10 or so for each year, sorted by closest to the year dot (searched word), then expand those on hover?
It's getting there. Well, this information isn't included, but each word has a similarity score, where higher numbers mean more similar. We could drop to the top 10 highest-scoring words and then adjust the font size so the visuals look a lot better. How does that sound as a solution?
It might be better if the backend could return the score along with the token/year/x/y/etc so I can filter it on the frontend, to allow for more flexibility. 10 might be too much or too little. Or in the future I might be able to determine the number dynamically somehow based on the crowdedness of the graph.
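As a rough sketch of what frontend filtering could look like once a score is available: the `Neighbor` shape below follows the fields named in this comment (token/year/x/y/score), but the exact API response shape and the function name are assumptions.

```typescript
// Hypothetical neighbor record; the real API response shape may differ.
interface Neighbor {
  token: string;
  year: number;
  x: number;
  y: number;
  score: number; // higher means more similar to the searched word
}

// Keep only the `limit` highest-scoring neighbors within each year.
function topNeighborsPerYear(neighbors: Neighbor[], limit: number): Neighbor[] {
  // Group neighbors by year.
  const byYear: Record<number, Neighbor[]> = {};
  for (const neighbor of neighbors) {
    (byYear[neighbor.year] = byYear[neighbor.year] || []).push(neighbor);
  }

  const result: Neighbor[] = [];
  for (const year of Object.keys(byYear)) {
    // Sort a copy in descending score order, then take the first `limit`.
    const ranked = byYear[Number(year)]
      .slice()
      .sort((a, b) => b.score - a.score);
    result.push(...ranked.slice(0, limit));
  }
  return result;
}
```

Keeping `limit` as a parameter leaves room to tune it, or later to derive it from how crowded the plot is.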
Tagging @falquaddoomi to see if you'd like to make this change.
Ok - this is super wild as a visualization, but without being placed within the broader scale of words it looks like things are bouncing all over. Is it possible to have the umap coordinates used be the total bounding box of the umap space of all words/years?
I'm not sure I understand what you mean. Are you talking about the words expanding and going outside the bounds of the SVG and also overlapping other clusters of words, like this:
I could certainly update the bounding box of the whole SVG when things expand, but that might be kind of jarring. The over-expansion was actually less of a problem when I was treating the labels as circles instead of rectangles, but then there was more overlap in the words.
> Is it possible to have the umap coordinates used be the total bounding box of the umap space of all words/years?
This is already what is being done, basically. When you're not hovering, everything is in its proper coordinates, and the dimensions of the space match the min/max x/y of all the years and words. The expand-on-hover effect is just to push them apart so you can read them better.
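For reference, a minimal sketch of deriving the plot's dimensions from the min/max x/y of whatever points are received. The `Point` and `Extent` types and the function name are assumptions for illustration, not the PR's actual code.

```typescript
// Hypothetical point shape; only x/y matter for the extent.
interface Point {
  x: number;
  y: number;
}

interface Extent {
  minX: number;
  maxX: number;
  minY: number;
  maxY: number;
}

// Compute the bounding extent of a set of points.
// Returns null when there are no points, since no extent exists.
function computeExtent(points: Point[]): Extent | null {
  if (points.length === 0) return null;
  let minX = Infinity;
  let maxX = -Infinity;
  let minY = Infinity;
  let maxY = -Infinity;
  for (const { x, y } of points) {
    if (x < minX) minX = x;
    if (x > maxX) maxX = x;
    if (y < minY) minY = y;
    if (y > maxY) maxY = y;
  }
  return { minX, maxX, minY, maxY };
}
```

The extent depends entirely on which points are passed in. If only one word's neighbors are supplied, the plot spans just that local neighborhood rather than the whole embedding space.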
We're currently restricted to a zoomed-in view (the space over which this word is observed) of a much larger umap space (the space over which any word is observed). Without some of that broader context, the word is changing dramatically, while it might only be making small changes in the broader space. What we're missing is the thing in preprint similarity search where other preprints fall to provide context for this one.
Oh - wait - I might have misread. If this is true:
> When you're not hovering, everything is in its proper coordinates, and the dimensions of the space match the min/max x/y of all the years and words
Then something is wrong with the underlying data.
Ok - are the dimensions of the space coming from the immediate neighbors of this word (essentially its close in neighborhood), or is it from all words?
So the coordinates are generated from all neighbors to the query word. For example, pandemic gives 25 neighbors in each year and a UMAP model is trained on only those words (including pandemic). If I'm reading correctly, using all words at once would make these erratic changes appear a whole lot smaller.
> Oh - wait - I might have misread. If this is true:
>
> > When you're not hovering, everything is in its proper coordinates, and the dimensions of the space match the min/max x/y of all the years and words
>
> Then something is wrong with the underlying data.
Here when I said "all years and words" I just meant all of the x/y coordinates I'm receiving from the backend, which I guess is just the local neighborhood in this case.
Casey, it sounds like you mean having the size of the space (i.e. the range) be determined from all words in the model? Wouldn't that just make the screenshots in my above posts appear as small blips in a sea of white space, unless you mean to also include some or all of the complete set of words in the model?
> using all words at once would make these erratic changes appear a whole lot smaller.
What are the "erratic changes"? If we're talking about the exploding that happens on hover, let's ignore that for this conversation; that has nothing to do with the data and is just a visual effect, so to speak. Or do you mean how, in the "pandemic" screenshot above, the trajectory arrow path is kind of "tangled"?
I'm very confused by all of this. Perhaps a quick zoom meeting is in order?
I mean the range being determined by all words in the model. I think they would appear as small blips in a sea of white space, but it would be a realistic representation of the amount that things change from year to year. This is not about the animation. Right now, it gives the perception that there's no consistency in what a word means from year to year.
Take a look at the new "trajectory" viz that we talked about:
This will need the backend to return the neighbor results for each year sorted by strength/score, because I can only show the top 5-10 (and even with just that, the figure is still quite busy).
I also refactored and polished some other stuff up in the most recent commit.
Hey @vincerubinetti, is there a particular example where the words aren't returned in order of decreasing score? I ask because it seems from my spot-checking that the method we're using to query for neighbors to the target word, KeyedVectors.most_similar(), already seems to return the results in order of decreasing similarity to the target. That said, there's nothing in the documentation that says it does return it in that order, and I can easily sort the neighbors per year by similarity manually, so if there's a query for which it's not returning the results in that order I'll implement it, otherwise it seems like it's already done.
(That, or I'm misunderstanding the ask entirely, so please correct me if that's the case.)
Ah I didn't realize it was already doing that. Maybe you could add a note in the Swagger docs or something.
If that's the case, maybe @danich1 would like it if the score was returned with the word and I can show it in the tooltip?
Yeah, it'd be nice to incorporate that information.
> I ask because it seems from my spot-checking that the method we're using to query for neighbors to the target word, KeyedVectors.most_similar(), already seems to return the results in order of decreasing similarity to the target.
I can +1 the already-sorted return values. There isn't anything in the documentation that guarantees it's sorted, but every time I use the function it returns results in sorted order.
Ignore my previous comment, I didn't realize the score was already being returned. When did that get put in? I'll have that info show in the tooltip.
@vincerubinetti, @danich1: currently the tooltip shows 'Tagged/Not tagged' for entries that have a tag or don't, but do you think there might be utility in showing what the tag is in the tooltip if it's present? For me, knowing the ontology term is useful, but I don't know who the audience is for the site.
> currently the tooltip shows 'Tagged/Not tagged' for entries that have a tag or don't, but do you think there might be utility in showing what the tag is in the tooltip if it's present?
This was mentioned in #39. The idea is to provide which tag the term is referring to and to have it link to a webpage that displays more information about the tag.
I was planning to add that in the next PR.
~This is still a WIP, but @danich1 take a look at the hover behavior. Right now, for collision, the labels are treated as circles, which is why you'll still see some overlap, but I'm working on doing it via rectangles, so it should look better soon.~
~I tried enlarging the nearby labels when the mouse passes over (and fading everything else), but it was still a bit hard to read because of the overlap. I also tried just doing that for the single label that is hovered, and that makes it more readable, but then you can only sort of see one label at a time.~
~So all that is to say, I think this is probably the best solution I could come up with.~
`wrapLines` func to generic utility
`wrapLines` util func