observablehq / runtime

The reactive dataflow runtime that powers Observable Framework and Observable notebooks
https://observablehq.com/@observablehq/how-observable-runs
ISC License

Lazy evaluation #246

Closed neelance closed 3 years ago

neelance commented 4 years ago

I hope I am not asking for a feature that already exists. I couldn't find an elegant solution for it.

I have a notebook (https://observablehq.com/@neelance/corvid-19-trends) that now uses five data sources. Initially only the first source is displayed, so it is unfortunate that all five downloads have to finish at page load before the graph appears.

Now I could manually wrap all data dependencies in some function that returns a Promise, but isn't this something that Observable could do automatically? E.g. instead of using data directly, I could use it with some keyword like lazy, so that lazy data would return a function of type () => Promise<Data>. Then I would call the function only for the data I currently want to display, and Observable would evaluate the data cell and its dependencies only when necessary. This of course assumes that the data cell was imported from another notebook, so its result is not displayed directly.
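For illustration, the manual wrapping described above might look roughly like the following plain-JavaScript sketch. The lazy helper, the stand-in loader, and the loadCount counter are all hypothetical names made up for this example; none of them is part of Observable or its runtime.

```javascript
// Hypothetical helper (not an Observable API): wraps a loader so that it
// runs at most once, and only when the returned function is first called.
function lazy(load) {
  let promise; // stays undefined until the first call
  return () => {
    if (promise === undefined) {
      // The executor runs immediately; a synchronous throw inside load()
      // becomes a rejected promise instead of escaping to the caller.
      promise = new Promise((resolve) => resolve(load()));
    }
    return promise;
  };
}

// Stand-in loader for this sketch; a real one would download and parse a file.
let loadCount = 0;
const lazyData = lazy(async () => {
  loadCount += 1;
  return [{region: "Example", total: 1}];
});
```

Nothing is loaded until lazyData() is called, and repeated calls share the same promise, so the loader runs only once.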

mootari commented 4 years ago

~Personally I'm not keen on seeing yet another custom keyword in Observable. But perhaps this could be provided through another built-in promise, similar to invalidation, but at the opposite end of a variable's lifecycle, i.e., the beginning:~

d3 = await evaluation, require('d3')

~I'm not sure though if the runtime allows differentiating between access from another variable vs access from an observer.~

Edit: Never mind, I need to work on my reading comprehension. 🤦

mbostock commented 3 years ago

Observable already performs lazy evaluation when importing and when embedding, in the sense that it only evaluates cells that are referenced statically (akin to tree shaking). You can also implement lazy evaluation based on the viewport using the visibility promise.

In this case, the runtime can’t be lazy because all the data values are referenced statically in the main notebook.

The recommended way to fix this would be for your child notebooks to export functions that load the data. For example:

// Nothing is downloaded until fetchData() is actually called.
async function fetchData() {
  const raw = await d3.csv(
    "https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv"
  );
  const data = [];
  for (const d of raw) {
    data.push(
      {
        type: "confirmed",
        region: d.state,
        ymd: d.date,
        date: parseDateYMD(d.date),
        total: parseInt(d.cases, 10) || 0,
      },
      {
        type: "death",
        region: d.state,
        ymd: d.date,
        date: parseDateYMD(d.date),
        total: parseInt(d.deaths, 10) || 0,
      },
    );
  }
  return data;
}

Then in your main notebook, you’d say:

import {fetchData as fetchData3} from "@neelance/the-new-york-times-covid-19"
data = ({
  "Johns Hopkins CSSE (Global)": fetchData1Global,
  "Our World in Data (Global)": fetchData2,
  "Johns Hopkins CSSE (US)": fetchData1US,
  "The New York Times (US)": fetchData3,
  "Robert Koch Institute (Germany)": fetchData4,
})[dataSource]()

That way, only the currently chosen data source is loaded.
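To see why this loads only one source, here is a minimal sketch of the same lookup-then-call pattern in plain JavaScript, outside Observable. The loaders, the source names, and the calls array are stand-ins invented for this example. In a notebook the loaders would be the fetchData functions imported from the child notebooks.

```javascript
// Hypothetical stand-ins for the imported fetchData functions.
// Each one records that it ran, so we can see which loaders were invoked.
const calls = [];
const loaders = {
  "Source A": async () => { calls.push("A"); return [{total: 1}]; },
  "Source B": async () => { calls.push("B"); return [{total: 2}]; },
};

// Look the loader up by name, then invoke only that one.
// Defining the other loaders costs nothing; they run only if called.
function loadSelected(dataSource) {
  const load = loaders[dataSource];
  if (load === undefined) {
    throw new Error(`Unknown data source: ${dataSource}`);
  }
  return load(); // a Promise for the selected data
}

const selected = loadSelected("Source B");
```

After this runs, calls contains only "B". The object literal references every loader, but referencing a function does not execute it.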

Apologies for the delayed response here. If you’d like further help, please head over to https://talk.observablehq.com where we’d be happy to offer further assistance.