Generating vega-spec bar graphs

ffarhour commented 8 years ago

This is my work so far (mostly done in the past week). @jowens @yzhwang @huanzhang12 I wanted to get some feedback on the quality of my code, and any suggestions on how to improve.

The main file to look at is json2vega.py which takes in json outputs in a specific folder and spits out a graph based on your preferences.

Right now I haven't finished the commandline arguments, but have hardcoded a test in the main function.

Summary of what the code does and how it is structured:

I have set up a base class, which contains the main methods I would use for file IO, plus some other functionalities that would overlap across child classes
The child classes which inherit all the methods of the base class will be reponsible for the specific details of each type of graph that needs to be outputted. So there is a child class for every type of graph to be made. At the moment I only have 1 child class for creating bar graphs.
each child class takes in a set of arguments which include
- engine_name: the name of the engine used (e.g. gunrcok),
- algorithm_name (e.g. BFS) ,
- conditions_dict: a dictionary containing the conditions to narrow down the jsons used, for instance telling it to only look at jsons in the file that have data about BFS ,
- axes_vars: a dictionary defining the variables used for the x and y axes.
In order to generate the vega-spec jsons, the program takes in a config file, one specific for each type of graph. The config file which is a json with vega-lite specs, details how the graph should look visually! @jowens suggested I add a default in case a config file isn't provided, and am planning to do that.
I have used the python library _argparse _ to process input commandline arguments. It is very easy to implement new arguments.
as an extra touch I created a class for coloring the commandline outputs with ANSI colors!! :D

Thanks in advance for your feedback!

jowens commented 8 years ago

You do well to write documentation for what you've implemented. You would do even better if you marked up that documentation in the canonical documentation format. :) docstrings and pydoc are what you want to look at. Just use Google's style guide:

https://google.github.io/styleguide/pyguide.html#Comments

jowens commented 8 years ago

I think I'd like to see three things before I can fully evaluate this:

readme.md that says how to use this script
Sample python script that calls functions in json2vega.py that produces a graph. readme.md presumably would reference this script as its example.
Another (standalone) script that builds all (or at least most of) the graphs that my previous crummy script did. The main goal here is to make that script as succinct as possible, at the expense of more complication in json2vega.py.

Everything I see looks good, but I will actually be able to evaluate it if I look at how I as a programmer will call it to make a graph, and that's not clear in this checkin.

ffarhour commented 8 years ago

Cool, thanks :) I will get on those right away

ffarhour commented 8 years ago

@jowens I've completed most of the things you asked:

Used standard commenting and docstrings, as described in Google's documentation
added readme.md in scripts folder
separated main implementation and the usage script. json2vega.py now contains only classes. test_json2vega.py tests it by creating a sample bar graph.

I have not yet had the time to finish the complete implementation of all plot types, but once I do I will create a standalone script that builds all the graphs your code made.

jowens commented 8 years ago

Spaces. Only spaces. No tabs. Never tabs. http://legacy.python.org/dev/peps/pep-0008/#tabs-or-spaces

ffarhour commented 8 years ago

oh, haha! had no idea. All fixed now

jowens commented 8 years ago

OK. Tried this this morning. FWIW, I had to set an environment variable to run test_json2vega.py so that it could find vega-lite: NODE_PATH="/Users/jowens/Documents/working/vega-lite" python test_json2vega.py. (I'm just writing it here for posterity.)

When I run this, I don't know what it does. It should probably say what it does, e.g. Creating output/_g_BFS_0.json. Instead it prints out a bunch of stuff that is basically comments. Don't print comments. Don't print any of that unless you run it with -h.

OK, so then I have a .json file. How do I view it? Might be useful to also write a .html file along with it, or at least make that a command-line option. (But I pasted it into the vega editor so I could see it.)

The graph looks nice. However, weirdly, it has a lot of duplicate entries in the vega source. That's probably not good. Perhaps have the script take the latest-date value, rather than all values?

Notes on json2vega.py:

Make this as short as possible. It is likely that your users will copy this file every time they want to make a graph. So you don't want stuff like bcolors or InputArgs in this file. Instead offload those to another file, util.py or something like that, and include it. It seems like all you want in this file is everything after # Create required arguments ....
And for that chunk of code -- right after # Create required arguments ... -- comment liberally. For this, as an example, make it clear what's going on.
Why do we want a bar1 and a bar variable separately? Will the reader always call read_json() then parse_jsons() then write_json() in succession exactly in that way? If so, maybe wrap that all into a function. What do you think?
For the arguments to these functions (e.g., VegaGraphBar), python lets you use named arguments. It is not clear when I look at the VegaGraphBar call what those arguments mean. But when you say VegaGraphBar(output_path=args.o, input_path=args.i, config_path="config_files/", ...), it's a lot clearer.
Shouldn't config_path be config_dir and have no slash? config_dir='config_files'?
Not immediately clear from looking at the code, so I might be mistaken here, but with the axes, I think there's two things we want to do. One is to set the field that is used for x and y. The other is to set the actual text that is shown on the axis label (e.g., x is 'Dataset', y is 'MTEPS'). By default the second should be set to the first, but it needs to be possible to set them separately. (This is particularly important when the desired axis label has spaces in the name, since that's not going to be in the field names in the JSON.)

This is great overall. I hope these comments are helpful. I'm gonna merge it, and you can either keep filing merge requests (and I will review them) or just push directly into the repo, your call. @yzhwang @sgpyc I'd like your eyes on this too, please.

ffarhour commented 8 years ago

@jowens thanks for the detailed comments and suggestions. I will spend the weekend working on these.

yzhwang commented 8 years ago

Sorry for the delay @ffarhour , I will try this today and provide my comments.

yzhwang commented 8 years ago

@ffarhour I don't have anything to add to John's comments. I think the most important thing is the flexibility to use latest-date value, or specify values to use. Also, I like John's heatmap for DOBFS a lot, does the current version support that too? (If not it's OK, not a huge priority)

BTW, when I run test_json2vega.py on luigi, I got an error: Traceback (most recent call last): File "test_json2vega.py", line 85, in main(sys.argv) File "test_json2vega.py", line 80, in main bar = bar1.parse_jsons() File "/home/yzhwang/Projects/io/scripts/json2vega.py", line 159, in parse_jsons bar["data"] = {"values":df_restricted.to_dict(orient='records')} TypeError: to_dict() got an unexpected keyword argument 'orient'

Is this because I missed some dependencies? It would also be nice to specify all the dependencies it needs.

Anyway, thank you for doing this @ffarhour . Good job! Looking forward to seeing the final version of this and trying it out on some of our performance characterization works!

jowens commented 8 years ago

@yzhwang check your python-pandas version for the to_dict call. I've got 0.18 on my machine, which appears to work.

yzhwang commented 8 years ago

Thanks @jowens ! After I upgraded pandas on luigi, it could run. But it took me longer than I expect to finally get the result since I had to go through a lot of efforts to install node.js, d3 and vega-lite. @ffarhour I think you should imply these as dependencies for running gunrock.io. Also, as @jowens said, the output is a json file, which as a user who never tried vega-lite before as me, I don't know how to visualize it. So probably make the output as a vector graph in pdf or svg format?

jowens commented 8 years ago

Agreed on listing the dependencies.

I think the best thing the script should do to visualize the output is also to output an html file that wraps the JSON file (as my old script did). @yzhwang in the meantime you can paste the JSON into the vega-lite editor (that's what I did).

ffarhour commented 8 years ago

Hi all. Sorry for the late reply, I think I'm coming down with a cold. Anyways, thanks for the comments. I will work on these and report back soon. @yzhwang at the moment the code doesn't support the heatmap, but I'll integrate it once I fix everything else. @jowens @yzhwang what's the best way to list dependencies? I know that I can use setuptools' "install_requires", but I'm not sure if that will be helpful for non-python packages such as nodejs. Also, do you think it would be worth having a function to automatically generate the png files using vega-lite?

jowens commented 8 years ago

Just put the list of dependences in the readme (and maybe in the usage() call in the script).

I think generating a PDF is much more useful than a PNG, so I would be in favor of PDF output.

ffarhour commented 8 years ago

Thanks :) Of course, PDF would be much better, will look into that.

ffarhour commented 8 years ago

I've fixed most of the things you asked. I am planning to focus on adding more plot types next. Changes made:

changed to spaces, not tabs
no comments printed. verbose argument can now be passed to print what is happening, e.g. file...created.
Made json2vega small by getting rid of bcolors and having it in a separate python file
run function implemented to do the read, parse, write automatically, without the need for user interference
Named arguments used
can now set axes labels manually by passing an argument.
dependency list added to readme

Once I have more implemented more plot types I will submit another pull request

jowens commented 8 years ago

Great! Looking forward to reading it. These are all good improvements.

gunrock / io

Generating vega-spec bar graphs #1