vega / altair

Declarative statistical visualization library for Python
https://altair-viz.github.io/
BSD 3-Clause "New" or "Revised" License

Error in layered graphs #180

Closed Hisham-Hussein closed 6 years ago

Hisham-Hussein commented 8 years ago

Hello, and thanks for the beautiful package.

I followed the instructions here: layered charts

I even copied and pasted all the code as is, but it produces this error: "ValueError: No data provided"

Although when I try to plot each individual chart it is plotted without any errors, but when using the (+) operator to combine them and then plot the layered chart it produces the aforementioned error.

Thank you

jakevdp commented 8 years ago

Hi – thanks for the report. Can you post the exact sequence of commands you run, and also post the full traceback of the error? Thanks.

Hisham-Hussein commented 8 years ago

Yes, sure: here it is:

data1 = pd.DataFrame({'x':np.random.rand(10), 'y':np.random.rand(10)})
data2 = pd.DataFrame({'x':np.random.rand(10), 'y':np.random.rand(10)})

chart = Chart(data1).mark_line(color='#1f77b4').encode(x='x', y='y') + \
        Chart(data2).mark_point(color='#ff7f0e').encode(x='x', y='y')

chart
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
C:\Users\hisham\Anaconda2\envs\py35\lib\site-packages\IPython\core\formatters.py in __call__(self, obj)
       907             method = _safe_get_formatter_method(obj, self.print_method)
       908             if method is not None:
-->  909                 method()
       910                 return True
       911 

C:\Users\hisham\Anaconda2\envs\py35\lib\site-packages\altair\api.py in _ipython_display_(self)
       186         from IPython.display import display
       187         from vega import VegaLite
--> 188         display(VegaLite(self.to_dict()))
       189 
       190     def display(self):

C:\Users\hisham\Anaconda2\envs\py35\lib\site-packages\vega\base.py in __init__(self, spec, data)
        21         """Initialize the visualization object."""
        22         spec = utils.nested_update(copy.deepcopy(self.DEFAULTS), spec)
---> 23         self.spec = self._prepare_spec(spec, data)
        24 
        25     def _prepare_spec(self, spec, data):

C:\Users\hisham\Anaconda2\envs\py35\lib\site-packages\vega\vegalite.py in _prepare_spec(self, spec, data)
        22 
        23     def _prepare_spec(self, spec, data):
---> 24         return prepare_spec(spec, data)
        25 
        26 

C:\Users\hisham\Anaconda2\envs\py35\lib\site-packages\vega\utils.py in prepare_spec(spec, data)
        91         # Data is either passed in spec or error
        92         if 'data' not in spec:
---> 93             raise ValueError('No data provided')
        94     else:
        95         # As a last resort try to pass the data to a DataFrame and use it

ValueError: No data provided

jakevdp commented 8 years ago

Thanks... from the looks of it Altair is spitting out correctly-formatted JSON. It's the ipyvega library that is being too restrictive with the output in this case. Let me dig a bit more...
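The failure mode in the traceback can be sketched in plain Python. This is a simplified stand-in for the check visible in `vega/utils.py` above, not the actual ipyvega code, and the shape of `layered_spec` (including the `layers` key) is an assumption made for illustration: each layer carries its own data, so no top-level `data` key exists.

```python
# Simplified stand-in for the check shown in the traceback (vega/utils.py).
# This is NOT the real ipyvega implementation, only the one condition it shows.
def check_spec_has_data(spec):
    if 'data' not in spec:
        raise ValueError('No data provided')
    return spec


# Hypothetical layered spec: every layer has its own data,
# so the top level has no "data" key at all.
layered_spec = {
    'layers': [
        {'data': {'values': [{'x': 0.1, 'y': 0.2}]}, 'mark': 'line'},
        {'data': {'values': [{'x': 0.3, 'y': 0.4}]}, 'mark': 'point'},
    ]
}


def fails_check(spec):
    """Return True if the top-level data check rejects the spec."""
    try:
        check_spec_has_data(spec)
    except ValueError:
        return True
    return False


print(fails_check(layered_spec))
# Adding any top-level "data" entry is enough to satisfy this particular check.
print(fails_check(dict(layered_spec, data={'values': []})))
```

Under these assumptions the check only looks at the top level of the spec, which would explain why each chart renders on its own but the combined chart does not.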

jakevdp commented 8 years ago

I opened an issue in ipyvega: https://github.com/vega/ipyvega/issues/54 That's the relevant fix to this bug.

jakevdp commented 8 years ago

Until the fix lands in an ipyvega release, you can get around this issue by setting the data attribute of the top-level chart to an arbitrary dataframe. For this bug, it only matters that it is defined: the actual values should be ignored.

For example:

data1 = pd.DataFrame({'x':np.random.rand(10), 'y':np.random.rand(10)})
data2 = pd.DataFrame({'x':np.random.rand(10), 'y':np.random.rand(10)})

chart = Chart(data1).mark_line(color='#1f77b4').encode(x='x', y='y') + \
        Chart(data2).mark_point(color='#ff7f0e').encode(x='x', y='y')

chart.data = data1
chart


Thanks for the report!

Hisham-Hussein commented 8 years ago

Okay, thanks a lot for your quick and professional response!

By the way, I have read almost all of the documentation. Although it is clear and good, I still think it is far from enough. The point is that there is no explanation of the grammar of graphics on which Altair is built. The documentation is just a collection of examples, mostly copied from the Vega-Lite documentation.

I was really excited when I listened to the talk at the SciPy conference, and I went to the documentation and read it. But now I'm back to matplotlib and seaborn, because there are many things I can't do with Altair for lack of explanation and documentation. Not to discredit your hard work, but compared with the ggplot2 documentation there is a big difference. With ggplot2 I can easily find whatever I am looking for, no matter how subtle or sophisticated, and every piece of information is accompanied by plenty of examples that make it easier to understand.

I'm sorry if I'm being too blunt about this, but I think it's for the greater good.

Thanks a lot again for your hard work.

jakevdp commented 8 years ago

Regarding the docs - I absolutely agree. We have a long way to go there, and would appreciate any help we can get.

Part of that is that vega-lite is still young and evolving rapidly, so investment in detailed docs presently would be counter-productive. My hope is that after the planned large feature release in VL this fall, I can start chipping away at the Altair doc issue.

Hisham-Hussein commented 8 years ago

Oh, now I understand. That's a relief! I thought this version of the docs would be the final one. Thanks a lot for explaining.

By the way, I'd love to help if you'd allow me. I have experience with ggplot2, which I think is the closest visualization package to Altair, in the sense that both are implementations of the grammar of graphics.

I have extensive notes on exploratory data analysis with ggplot2. They closely follow the excellent Udacity class "Data Analysis with R", which is the best class I've seen for teaching EDA. Here is a sample of my notes covering 2 lessons of the class (6 lessons in total):

RPubs - EDA with ggplot2 - Lesson 3: Exploring 1 Variables
RPubs - EDA with ggplot2 - Lesson 4: Exploring 2 Variables

I was intending to translate these to Altair when I heard the talk at the SciPy conference, but as I told you, I couldn't. Maybe after the major feature release you mentioned I can start working on it. I have to point out that my notes are a first draft, so please forgive the sloppiness and incompleteness you will easily notice :)

So, in summary, I can contribute an extensive tutorial in an EDA context that illustrates the power of the grammar of graphics as implemented in Altair.

Thank you very much,
Hisham

jakevdp commented 8 years ago

FYI, email attachments don't come through on github comments.

It would be great to have your help on docs and tutorials!

Hisham-Hussein commented 8 years ago

Oh my bad ... sorry for that... I didn't know that before :)

Here are the links again: exploring one variable, exploring two variables

I always wanted some visualization package like ggplot2 in Python, it'd be really great if Altair could fill this gap !

Thanks a lot for your trust in me.

kanitw commented 8 years ago

It would be great to have a list of missing *important* plot types so that we can try to include them in Vega-Lite's 2.0 release :)

Hisham-Hussein commented 8 years ago

Thanks for asking:

1- One thing I noticed is that faceting is only available by row, or by column. That means that if I have a categorical variable with 20 levels and I want to split my plot by that variable, there will be either 20 rows or 20 columns, which is of course not feasible!

In ggplot2, you can simply add an argument to specify how many rows or how many columns you want the plot to be split across. Have a look at facet_wrap in ggplot2.

2- Another thing: it would be nice to have default values for each plot. I will give another example from ggplot2 (sorry for that :)): when you make a barplot you only need to specify the x-axis, and the y-axis by default takes the value of (count(x)) in each category, which is very convenient. And if you want to map the y-axis to another variable, you can easily do that.

3- It would be very nice to have the option of assigning colors by their natural names. I mean: 'blue', 'dark blue', 'red' ... etc. it's not convenient that whenever I want to assign a color, I have to look for its hex value! Of course this will be also available for more sophisticated color choice.

4- I would love to suggest more if I knew Vega-Lite well. Unfortunately I don't really know all the capabilities of the current version. I tried to read the documentation, but the thing I noticed is that there's a mismatch between the amount of information documented and the number of examples accompanying this information. Because of that, I couldn't get a good grasp of the library. Maybe it's some lack of intelligence on my part, but when I read the ggplot2 documentation, for example, I find it easy to understand anything I want to understand.

But I still believe the project is very promising and I'm looking forward to it.

Thanks a lot .... :+1:

kanitw commented 8 years ago

@Hisham-Hussein Thanks for the feedback.

Re:

One thing I noticed is that faceting is only available by row, or by column.

Agreed. We have an issue for that here: https://github.com/vega/vega-lite/issues/393, but we are prioritizing interaction features more than this one, so we might not include it in 2.0. (Maybe later in 2.x.)
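To make the wrapped-facet request concrete, here is a minimal sketch in plain Python of the layout rule it implies: panels fill a grid row by row, with a fixed number of columns. It does not use any Altair or Vega-Lite API; `wrap_positions` and `ncol` are hypothetical names chosen for illustration.

```python
def wrap_positions(levels, ncol):
    """Map each facet level to a (row, column) cell, filling rows left to right."""
    if ncol < 1:
        raise ValueError('ncol must be at least 1')
    return {level: divmod(i, ncol) for i, level in enumerate(levels)}


# A categorical variable with 20 levels, wrapped into 5 columns.
levels = ['level_{}'.format(i) for i in range(20)]
grid = wrap_positions(levels, ncol=5)

# Number of rows needed for the wrapped layout.
n_rows = max(row for row, _ in grid.values()) + 1
print(n_rows)
```

With 20 levels and 5 columns the panels fit into a 4 by 5 grid instead of a single strip of 20 rows or 20 columns.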

when you make a barplot you only need to specify the x-axis, and the y-axis by default takes the value of (count(x)) in each category

This one is not necessarily always a good default at the Vega-Lite level. Unlike ggplot2, which is designed as a standalone tool, Vega-Lite is also designed to be an intermediate representation for visualization tools (for example, for polestar). So imagine that users drag a field to x and are about to drag a field to y: they wouldn't expect to see count on y in that intermediate state. That said, this might be a sensible default at the Altair level. We might consider adding an autoAddCount config to Vega-Lite that can be turned on by default for tools like Altair. I'm not sure about that yet, but we have a similar config in CompassQL.
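For readers unfamiliar with the ggplot2 behavior being discussed, the "count by default" idea amounts to the following aggregation, sketched here in plain Python. The function name `bar_heights` and the sample values are made up for illustration; nothing here is Altair or Vega-Lite API.

```python
from collections import Counter


def bar_heights(values):
    """Return sorted (category, count) pairs: the bars a count-by-default chart would draw."""
    return sorted(Counter(values).items())


# Hypothetical categorical column: only the x field is supplied.
origins = ['USA', 'Japan', 'USA', 'Europe', 'USA', 'Japan']
heights = bar_heights(origins)
print(heights)
```

A tool that applies this default derives the y values itself when only x is given, which is convenient for a standalone plotting library but, as noted above, can be surprising in an intermediate state of an interactive tool.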

3- It would be very nice to have the option of assigning colors by their natural names. I mean: 'blue', 'dark blue', 'red' ... etc. it's not convenient that whenever I want to assign a color, I have to look for its hex value! Of course this will be also available for more sophisticated color choice.

We do support standard web color names.

(Try

{
  "description": "A scatterplot showing horsepower and miles per gallons for various cars.",
  "data": {"url": "data/cars.json"},
  "mark": "point",
  "encoding": {
    "x": {"field": "Horsepower","type": "quantitative"},
    "y": {"field": "Miles_per_Gallon","type": "quantitative"},
    "color": {"value": "goldenrod"}
  }
}

in the online editor)
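If you would rather generate that spec from Python than type it by hand, a minimal sketch using only the standard library could look like the following. The helper name `scatter_spec` is hypothetical; the fields mirror the JSON above, and the output can be pasted into the online editor.

```python
import json


def scatter_spec(color_name):
    """Build the scatterplot spec shown above, with a named color for the marks."""
    return {
        "description": "A scatterplot showing horsepower and miles per gallons for various cars.",
        "data": {"url": "data/cars.json"},
        "mark": "point",
        "encoding": {
            "x": {"field": "Horsepower", "type": "quantitative"},
            "y": {"field": "Miles_per_Gallon", "type": "quantitative"},
            "color": {"value": color_name},
        },
    }


# Serialize to JSON text, ready to paste into the editor.
spec_json = json.dumps(scatter_spec("goldenrod"), indent=2)
print(spec_json)
```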

I tried to read the documentation, but the thing I noticed is that there's a mismatch between the amount of information documented and the number of examples accompanying this information.

We'll definitely add more examples when we upgrade to Vega-Lite 2.
If you find any compelling examples, feel free to submit a PR to Vega-Lite.

Before we spam this issue with too much general VL discussion :p, if you have any questions or suggestions for Vega and Vega-Lite in the future, please feel free to post in the Vega user groups or the Vega-Lite repo.

Thanks!

Hisham-Hussein commented 8 years ago

@kanitw Okay .... Thanks a lot for your reply!

onhafoghefifo commented 7 years ago

Hi, I tried to plot a graph with part of it in another color, to highlight the anomaly region of a time series. I tried to do this using the solution @jakevdp posted above, but when I do, the whole chart gets the same color. How should I proceed?

data1 = pd.DataFrame({'x':np.random.rand(10), 'y':np.random.rand(10)})
data2 = data1.iloc[5:8]

chart = Chart(data1).mark_line(color='#1f77b4').encode(x='x', y='y') + \
        Chart(data2).mark_line(color='#ff7f0e').encode(x='x', y='y')

chart.data = data1
chart


jakevdp commented 7 years ago

@onhafoghefifo it works correctly for me in the most recent Altair release (v. 1.2). What version are you using?

onhafoghefifo commented 7 years ago

I am using v. 1.0.0, which is currently the newest version available via conda. Should I install it via pip to get the v. 1.2?

jakevdp commented 7 years ago

Yes, please update to 1.2 and be sure to re-install the nbextension as well, following the installation instructions.

onhafoghefifo commented 7 years ago

I updated Altair to v. 1.2 with conda and ran

$ jupyter nbextension install --sys-prefix --py vega
Installing /home/gca/miniconda3/lib/python3.5/site-packages/vega/static -> jupyter-vega
Up to date: /home/gca/miniconda3/share/jupyter/nbextensions/jupyter-vega/index.js.map
Up to date: /home/gca/miniconda3/share/jupyter/nbextensions/jupyter-vega/vega.js
Up to date: /home/gca/miniconda3/share/jupyter/nbextensions/jupyter-vega/vega-lite.html
Up to date: /home/gca/miniconda3/share/jupyter/nbextensions/jupyter-vega/index.js
Up to date: /home/gca/miniconda3/share/jupyter/nbextensions/jupyter-vega/vega.html
- Validating: OK

    To initialize this nbextension in the browser every time the notebook (or other app) loads:

          jupyter nbextension enable vega --py --sys-prefix

but the graph is still drawn in one color. Any ideas? I printed the Altair version used by the Jupyter notebook, and it is 1.2.
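When debugging this kind of mismatch, it can help to print the versions of both packages from inside the notebook kernel, since the nbextension files in the output above come from the separate `vega` package. The sketch below assumes Python 3.8 or later (for `importlib.metadata`, which is newer than the Python 3.5 used in this thread); `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Run this in the same kernel that renders the chart.
for name in ('altair', 'vega'):
    print(name, installed_version(name))
```

This only reports what the Python environment has installed; it does not show which copy of the nbextension JavaScript the browser has actually loaded.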

jakevdp commented 7 years ago

Did you restart the notebook? If not, I think it will still be using the old version of the jupyter extension.

onhafoghefifo commented 7 years ago

I closed the notebook (2 x ctrl-c in the terminal), installed everything, closed the terminal for conscience's sake, and started everything again. Maybe the checkpoints are doing something about it?

jakevdp commented 7 years ago

I'm really not sure what to say... it works correctly in Altair version 1.2:

[Screenshot, 2016-11-22]

How did you initially install Altair? If you installed with conda the first time and then updated using pip, it's possible that the updated nbextension was installed in a location that is over-ridden by the old version.

jakevdp commented 7 years ago

Maybe @ellisonbg would have ideas of what the issue is? How might an older version of an nbextension linger after someone installs the updated version?

onhafoghefifo commented 7 years ago

Here is the same code in my notebook. I tried running the graph-generating code twice to see if it made any difference, but to no avail.

[Screenshot of the notebook output]

I'll try to do a full reboot in a few minutes, to see if anything changes.

Also, looking at your graph, shouldn't the yellow line lie entirely on top of the blue one over the range where it exists?

jakevdp commented 7 years ago

Ah, looks like layering broke in vega 0.4.4. If you downgrade to vega 0.4.2 it should work correctly.

Again, as noted earlier in this issue, layering support is still experimental and unstable at this point. Expect it to work better when Vega-Lite 2.0 / Altair 2.0 is released.

ellisonbg commented 7 years ago

Because some of this looks like it is related to ipyvega, we should at least look at this with the new jupyterlab_vega approach in 1.3.

yiti2661 commented 7 years ago

Hello, is there any progress on this topic? I downgraded to 0.4.2 but it doesn't work either: it displays an empty plot (just the axes). I have two separate dataframes. If I run the example with the data generated by @onhafoghefifo, it doesn't highlight the blue range either. If I use the data generated by @jakevdp, it works fine. Help!

jakevdp commented 7 years ago

Layered graphs are not well-supported in the current release. We should have better support in Altair 2.0, which will be coming soon! (As soon as I finish parental leave and have time to work on it :smile:)

jakevdp commented 6 years ago

fixed in v2