okayzed closed this 7 years ago
Wow, this is fantastic!!! After a quick read through the code, I am very impressed, and I will try to do a longer code review on Monday (I have another deadline I am swamped with at the moment).
I actually really like the table view. It is simple, but quite nice.
In terms of expanding the usefulness, the most immediate thing that comes to my mind is easy support for loading/using different datasets. I could be wrong, but it seems like the current code can only load from the default location "embeddings/eng-all_sgns". This seems like the biggest way to make it more useful. There would need to be a consistent naming format etc for the embedding files, but I think researchers would love to be able to visualize changes in their datasets with this tool.
A much more minor point on the visualizations: For the cloud view, would it be possible to toggle on/off the drift for the background words? E.g., it might be useful to only show the background words for the most recent context, and show the target word shifting in that space, if that makes sense. I'll need to look at the code in some more detail to see if this is really reasonable though.
Thanks again! This is awesome.
will work on it more next week, also have some stuff this weekend.
re: embeddings: that is fine - the main problem is how much RAM they take up (4GB or so for one set), so it is hard to load multiple corpuses at once. a simple solution is that the webserver can start up and ask which embeddings to use on first load (and generate the candidate list by scanning the embeddings/ dir)
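a minimal sketch of that directory scan, in Python. the function name `list_embedding_sets` is hypothetical, and it assumes each embedding set lives in its own subdirectory of embeddings/ (as with embeddings/eng-all_sgns); it is not taken from the histwords code:

```python
import os


def list_embedding_sets(root="embeddings"):
    """Return the names of candidate embedding sets found under root.

    Assumption: each embedding set is a subdirectory of root
    (e.g. embeddings/eng-all_sgns). Nothing is loaded into memory here;
    this only builds the list of choices to offer on first load.
    """
    if not os.path.isdir(root):
        return []
    return sorted(
        name for name in os.listdir(root)
        if os.path.isdir(os.path.join(root, name))
    )
```

the webserver could present this list on first load and then load only the chosen set, so that a single set is held in RAM at a time.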
re: cloud view: that's possible. it would be easiest to accomplish by just turning off all words that are not from the most recent decade (but the invisible words would still have influenced the tSNE layout). to do so, we'll add some simple toggle controls to the frontend
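the visibility rule can be sketched in Python as below. the point format (dicts with `word`, `decade`, `x`, `y`) and the function name `mark_visibility` are assumptions for illustration only; the real toggle would live in the frontend:

```python
def mark_visibility(points, target_word):
    """Flag which cloud-view points should be drawn.

    points: list of dicts with hypothetical keys "word", "decade", "x", "y",
    where x/y come from a tSNE layout already computed over ALL decades.
    The target word stays visible in every decade; background words are
    shown only for the most recent decade. Coordinates are left untouched,
    so hidden words still influenced the layout.
    """
    if not points:
        return []
    latest = max(p["decade"] for p in points)
    return [
        dict(p, visible=(p["word"] == target_word or p["decade"] == latest))
        for p in points
    ]
```

because only a flag is added, switching the toggle back on just means drawing every point again, with no need to recompute the layout.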
Okay, so I have tinkered around more, and it is a substantial amount of code, but so far I haven't found anything problematic, and the interactive visualization is working great for me.
A couple minor things:
`python scripts/closest_over_time_with_anns.py awful` throws an error because the directory `vis/web/output` doesn't exist. This directory could be created if necessary using the `ioutils.make_dir` command from the main branch. Also, the README should be updated to run the command as `python scripts/closest_over_time_with_anns.py awful`, or the output path should be changed to `scripts/output/...` instead of `viz/scripts/output`.
`pythonw` instead of `python`: maybe worth mentioning this in the README.
I looked at the `get_time_sims` code in some detail and it looks great!
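For the missing output directory mentioned above, one possible guard is sketched here. It is a stand-in written with the standard library, not the `ioutils.make_dir` helper itself (whose signature is not shown in this thread), and the name `ensure_dir` is hypothetical:

```python
import errno
import os


def ensure_dir(path):
    """Create path (and any missing parents) if it does not already exist."""
    try:
        os.makedirs(path)
    except OSError as err:
        # Ignore "already exists" only when it really is a directory;
        # re-raise anything else (permissions, a file in the way, ...).
        if err.errno != errno.EEXIST or not os.path.isdir(path):
            raise
    return path
```

Calling something like this on the output directory just before the script writes its results would avoid the error on a fresh checkout.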
After playing with it for some time, I am really starting to like the table view (especially compared to the cloud view). Not sure if you feel the same way, but it is a lot easier to work with, and I really like the automatic shading.
Overall, I am very impressed, and nothing has broken yet in my tinkering/playing around. I would be very happy to merge this into master sooner rather than later, but I'll defer to you because I know you have other things you want to potentially add before merging.
Oh, and in reply to your response to my comments:
thanks for the patience - this next set of diffs addresses the feedback from earlier (blank config.py file, selectable embeddings, control for cloudview to toggle visibility of words, etc.) and adds a new view, "timeline view" (screenshots here), for browsing through the embeddings. from here, we are ready to merge (after cleaning up any last things you want us to take care of).
stepping back, i'd like to highlight what each view can tell us. below, i use 'query term' for the search word and 'neighbor word' for the nearest neighbor words that are returned by the API.
No worries at all! This is really great work, and well worth waiting a few days for the new timeline view etc :)
I've played around with it some more, and it works amazingly well! And the new capabilities are fantastic. Again, I gave the code a skim, but I haven't run into any issues or problems.
I would be very happy to merge this into the repo if you think that it's ready. Let me know.
I can also make you an official collaborator on the main repo. You put a lot of work into this visualization code, so I think it is fair and reasonable to make you an official collaborator. (You can also tinker with and maintain the code more easily then).
Does that all sound reasonable to you?
thanks! that sounds reasonable. there's actually two of us, @juniferd and myself, but i'll take contributor rights (and be responsible for on-going maintenance of the viz).
the code is ready to be merged, but there is still some stuff to be added (eventually) around controls for selecting which decades to search and how many nearest neighbors to show.
Awesome! The merge is done and you should have contributor rights. Thanks again @okayzed and @juniferd!
this merge request adds a webserver that supports interactive visualizations. at the moment, it is somewhat bare, but it allows for query exploration. there is a README.md file in viz/ and the majority of the interfacing with existing histwords code is in viz/common.py.
the one main pip requirement to run this code is pylru; the rest of the requirements.txt file comes from the existing requirements for the histwords repo that we've run into.
some screenshots are here: http://imgur.com/a/XrhFj
i'm happy to get code review (but i know it takes time) and i'm more interested in getting feedback / ideas on how to expand on the usefulness of the program. i'm also not in any rush to get it merged, so don't worry about that.