vega / altair

Declarative statistical visualization library for Python
https://altair-viz.github.io/
BSD 3-Clause "New" or "Revised" License

Data source in Jupyter needs full URL #2432

Open drnw opened 3 years ago

drnw commented 3 years ago

In JupyterLab I have seattle-weather.csv, bar.vl.json and a notebook all in the same folder. The data comes from the example datasets, as does the Vega-Lite spec:

{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {"url": "seattle-weather.csv"},
  "mark": "bar",
  "encoding": {
    "x": {"timeUnit": "month", "field": "date", "type": "ordinal"},
    "y": {"aggregate": "count", "type": "quantitative"},
    "color": {"field": "weather", "type": "nominal"}
  }
}

Opening the Vega-Lite file directly creates a tab with the bar chart rendered as expected. In the notebook, however, I need to pass the full URL to Altair to create the same visualisation:

alt.Chart('http://localhost:8888/files/seattle-weather.csv').mark_bar().encode(
    x='month(date):O',
    y='count():Q',
    color='weather:N'
)

Why does alt.Chart('seattle-weather.csv') not work in the notebook? Can the docs be updated to include guidance for this use case?

joelostblom commented 3 years ago

I think it is outside the control of Altair and depends only on the frontend (in this case JupyterLab). From this comment https://github.com/altair-viz/altair/issues/2318#issuecomment-712183293:

You can use any URL visible to your frontend (i.e. JupyterLab, Jupyter Notebook, Colab, VSCode, Streamlit, etc.)

All of those frontends have different mechanisms for accessing local files (if they have a mechanism at all), so there is no way to provide general advice for how to proceed.

It's interesting that it works when you open the Vega-Lite spec directly in JupyterLab; I am not sure whether it resolves file paths differently depending on the file type (json vs ipynb). I agree that it would be nice if a file path could be specified without the full URL, but that might be an issue to raise on the JupyterLab tracker. In either case, I agree that this could be added to the docs somewhere.

drnw commented 3 years ago

Thanks @joelostblom. You say:

I agree that this could be added to the docs somewhere.

I'll endeavour to do that. Please bear with me as I am new here. I will take a little while to figure out how to submit a proposed change to the docs. Once I have figured that out I will post something in the right place.

drnw commented 3 years ago

I will take a little while to figure out how to submit a proposed change to the docs

I have the repo cloned and a change to doc/user_guide/data.rst in my dev environment. However, I have not yet figured out how to build the docs locally. I will create a pull request once I have learned how to build them and have checked my work.

drnw commented 3 years ago

That was harder than it should be. Now on to creating a pull request.

drnw commented 3 years ago

Pull request #2433 has the proposed change to the documentation.

jakevdp commented 3 years ago

I don't think the full URL is actually required. For example, I ran this in a local JupyterLab instance and it produced a correctly rendered chart:

import altair as alt
from vega_datasets import data
!curl -O {data.seattle_weather.url}

alt.Chart('files/seattle-weather.csv').mark_bar().encode(
    x='month(date):O',
    y='count():Q',
    color='weather:N'
)

(This only works if you have not used the %cd magic command to change the working directory away from the JupyterLab root. I believe there are also caveats around how kernels are started and around Jupyter configuration settings that can make this fail, because the directory in which the curl command runs can diverge from the JupyterLab root directory.)
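
To illustrate the caveat: the path in the chart URL is relative to the directory the Jupyter server was started in, not to the kernel's working directory. The following is a minimal sketch, assuming the server exposes files under a /files/ route relative to its root directory (as the URLs in this thread suggest). The helper name files_url and both example paths are hypothetical, not part of Altair or Jupyter.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def files_url(data_path, server_root):
    """Build a '/files/...' URL path for a data file under the server root.

    Hypothetical helper: assumes the Jupyter server serves files under
    the '/files/' route, relative to the directory it was started in.
    """
    # relative_to raises ValueError if the file is outside the server root,
    # in which case the frontend cannot reach it through this route at all.
    rel = PurePosixPath(data_path).relative_to(PurePosixPath(server_root))
    # Percent-encode characters such as spaces; '/' is left untouched.
    return "/files/" + quote(rel.as_posix())


# Example paths are made up for illustration.
url = files_url(
    "/home/user/project/data/seattle-weather.csv",
    "/home/user/project",
)
print(url)
```

The resulting string could then be passed to alt.Chart(...) in place of the hard-coded path, provided the assumption about the /files/ route holds for your setup.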

joelostblom commented 3 years ago

I was troubleshooting the latest comment in https://github.com/altair-viz/altair/issues/1732 and noticed that while the example with count():Q above works, I need the full URL with localhost for plotting anything that is not a count. For example, the following renders an empty chart:

import altair as alt
from vega_datasets import data
!curl -Os {data.seattle_weather.url}

alt.Chart('files/seattle-weather.csv').mark_point().encode(
    x='month(date):O',
    y='temp_max:Q',
    color='weather:N'
)

But this works as expected:

import altair as alt
from vega_datasets import data
!curl -Os {data.seattle_weather.url}

alt.Chart('http://localhost:8888/files/seattle-weather.csv').mark_point().encode(
    x='month(date):O',
    y='temp_max:Q',
    color='weather:N'
)

This is running as the first cell in a newly started notebook so no cd or similar has been executed.

joelostblom commented 3 years ago

I just discovered that it works as expected with /files instead of just files.
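
A possible explanation, sketched with standard URL resolution rather than anything confirmed from the JupyterLab or Vega source: a relative URL such as files/... is resolved against the URL of the page doing the loading, so the result depends on that page's path, while a root-relative URL such as /files/... always resolves against the server origin. The page URLs below are assumptions chosen for illustration.

```python
from urllib.parse import urljoin

# Hypothetical page URLs a JupyterLab frontend might be served from.
page_urls = [
    "http://localhost:8888/lab",
    "http://localhost:8888/lab/tree/Untitled.ipynb",
]

relative = "files/seattle-weather.csv"        # no leading slash
root_relative = "/files/seattle-weather.csv"  # leading slash

for page in page_urls:
    # The relative form replaces only the last path segment of the page URL,
    # so its target changes with the page path.
    print(page, "+", relative, "->", urljoin(page, relative))
    # The root-relative form ignores the page path entirely.
    print(page, "+", root_relative, "->", urljoin(page, root_relative))
```

Under these assumptions the root-relative form yields http://localhost:8888/files/seattle-weather.csv for both page URLs, whereas the relative form only does so for the first one.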