observablehq / stdlib

The Observable standard library.
https://observablehq.com/@observablehq/standard-library
ISC License
957 stars 83 forks source link

Fix sql table only processes first yield #328

Closed mootari closed 1 year ago

mootari commented 1 year ago

When loadSource instantiates a DuckDB instance from an array data source, it caches the corresponding DuckDBClient instance using the array instance as key. When the data source yields subsequent updates, these are no longer processed, breaking the data flow between the data source and SQL cell.

This PR implements a stop-gap solution based on a suggestion by @Fil:

Note that this does not solve the following case. Given the data source

data = {
  const result = [];
  yield (result.push({v: 1}), result);
  yield (result.push({v: 2}), result);
}

the following SQL cell will only list the first row:

select * from data
mootari commented 1 year ago

Hmm. This presumes that someone will call loadDataSource again after the streaming data is done. If no one calls it, then the promise will never resolve; sourceCache has no way to “listen” to the streaming data itself and be notified when it is done.

We only delay resolving if .done exists and is explicitely set to false. Resolving can already happen on the first call (unless I messed up somewhere).

Also, even when loadDataSource is called again and the data is finally done, in practice we don’t care about those earlier promises resolving since the downstream work has already been invalidated by the new streaming data.

We only ever deal with one promise though, and the idea is to avoid unnecessary computation at the cost of delaying the display of results. Otherwise we'd create new DuckDBClient instances for an increasingly larger array (at least that's how I understand it).

mbostock commented 1 year ago

The problem is that the only thing that can trigger resolving is a subsequent call to loadDataSource, which then checks if source.done is now strictly not false. So if you call loadDataSource once when source.done is false, you get a pending promise, but it’ll never resolve unless someone calls loadDataSource again in the future when source.done is not false.

mootari commented 1 year ago

Yes, if I understood @Fil correctly the intention was to fix the case for when another DatabaseClient conforming data source is used, in which case we'd expect there to be a last yield with .done set to true.

mbostock commented 1 year ago

It would fix the problem with our current usage, but I don’t think it’s robust so I’d like to look for alternative solutions.

mootari commented 1 year ago

@mbostock I feel I'm not familiar enough with the internal data table APIs to suggest a solution. Will leave this problem to @libbey-observable and you (and hopefully learn a lot in the process 🙏🏻 ).

mootari commented 1 year ago

Closing in favor of #329.

Fil commented 1 year ago

This discussion has merit, beyond the urgent need to fix. And the pairing session was enlightening!