Open Fil opened 7 months ago
From https://github.com/observablehq/cli/pull/325#issuecomment-1842916117:
The API for chaining loaders isn’t really about FileAttachment since data loaders can be written in any language. Instead, a data loader needs to be able to fetch a file from the preview server (and we need an equivalent server during build). Maybe we set an environment variable which data loaders can read to know the address of the preview server. Ideally, we’d also need to detect and error on circular dependencies.
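For illustration, here is a rough sketch of what the environment-variable idea could look like from inside a Node data loader. The variable name `OBSERVABLE_LOADER_SERVER` and the helpers `upstreamUrl` and `readUpstream` are hypothetical; nothing like this exists in the CLI today, and a loader in any other language would do the equivalent with its own HTTP client.

```javascript
// Hypothetical variable name: an assumption for this sketch, not a CLI feature.
const SERVER_ENV_VAR = "OBSERVABLE_LOADER_SERVER";

// Build the URL of an upstream file on the preview (or build) server.
function upstreamUrl(path, env = process.env) {
  const base = env[SERVER_ENV_VAR];
  if (!base) {
    throw new Error(`${SERVER_ENV_VAR} is not set; cannot resolve ${path}`);
  }
  return new URL(path, base).href;
}

// Fetch an upstream file as text. The assumption is that the server would run
// the upstream loader on demand before responding.
async function readUpstream(path, env = process.env) {
  const response = await fetch(upstreamUrl(path, env));
  if (!response.ok) {
    throw new Error(`failed to fetch ${path}: ${response.status}`);
  }
  return response.text();
}
```

A downstream loader would then call something like `readUpstream("/data/raw.json")` and write its own result to stdout, as data loaders already do.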
I've been working on a small project to get some firsthand experience with the CLI. In it I'm downloading a zip file from a hobbyist website, extracting a couple dozen text files, and then running them through a custom parser. Parsing them takes about 30 seconds, which is a bit longer than I want to do in the markdown file. I'm iterating on the parser itself, so I'm re-running it every few minutes.
The options I see for my case are:
From this, I have two wish list items. One is chained data loaders, the other is incremental data loaders that can somehow stream their results in to the client. I don't really know how that would work, and it's probably better suited for Notebooks anyways.
I can sympathize with wanting data loaders to be in any language, but it is very jarring to go from writing my code in a JS fenced code block script and having it work easily, to working in a .json.js data loader and suddenly all of my imports break and I lose all of the nice tools I was using a moment ago. It makes me feel like to properly use Observable CLI I need to be fluent in three varieties of JS: Markdown code blocks, browser imports, and file attachments.
Perhaps once we have the server-based data loader workflow that Mike mentioned, we could then wrap that in a FileAttachment facade that makes it feel just like it does in Markdown files.
@mythmon FileAttachment supports streaming, see https://observablehq.com/@observablehq/streaming-shapefiles
There is an example of this use case in the chess bump chart example. Any change to the data transformation in the data loader requires a full download of all of the data.
This seems to point to something similar to having a dependency graph, like in the targets 📦 in R. And the dependency is not only for the data loaders but also for assets (computational cells are already covered, aren't they?)
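To make the dependency-graph idea a bit more concrete, here is a minimal sketch of detecting a circular dependency with a depth-first search. The graph shape (a `Map` from each file to the files its loader reads) and the name `findCycle` are assumptions for illustration; this is not how Framework or targets represents anything.

```javascript
// deps: Map from a file name to the array of files its loader reads
// (hypothetical shape). Returns the cycle as a path, or null if there is none.
function findCycle(deps) {
  const state = new Map(); // node -> "visiting" | "done"
  const stack = []; // current DFS path

  function visit(node) {
    if (state.get(node) === "done") return null;
    if (state.get(node) === "visiting") {
      // We came back to a node on the current path: report the loop.
      return [...stack.slice(stack.indexOf(node)), node];
    }
    state.set(node, "visiting");
    stack.push(node);
    for (const dep of deps.get(node) ?? []) {
      const cycle = visit(dep);
      if (cycle) return cycle;
    }
    stack.pop();
    state.set(node, "done");
    return null;
  }

  for (const node of deps.keys()) {
    const cycle = visit(node);
    if (cycle) return cycle;
  }
  return null;
}
```

A build could call this once over all loaders and assets and fail early with the returned path, rather than hanging when two loaders wait on each other.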
Tangentially related to #918.
> an example of the use case in the chess bump chart example. any changes to the data transformation in the data loader require a full download of all of the data.
I'm not seeing this implemented in the chess bump example. Am I missing something?
It's not implemented in the chess bump example. The example is a case where chained loaders would improve the data loader(s), if Framework gained this feature.
Gotcha. Here's my use case, for anyone interested.
I'd like Data Loader 1 to be a Python script that downloads a dataframe from s3, transforms the data with filter-y tricks and then writes out a JSON file that's ready to serve.
Then Data Loader 2 would be a Node.js script that would open that very large JSON file, build a D3 graphic in a canvas object, and then write out a PNG file that could ultimately be served by the static site.
Suppose we want to 1. download a dataset from an API and 2. analyze it. Currently a data loader must do both at the same time, and will run the download part again if we update the analysis code.
Ideally we'd want to separate this into two chained data loaders, that would still be live, i.e. if a page relies on 2 that relies on 1, editing 1 would tell the page to require the analysis again, which would trigger a new download. But editing 2 would only run the analysis again, not the download.
This would also make it easier to generate several files from a common (and slow) download.
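As a sketch of the invalidation rule described above (assuming a simple in-memory graph, with the names `loadersToRerun`, `download`, `analysis` and `page` made up for the example), editing a loader marks it and everything downstream of it as stale, while anything upstream keeps its cached output:

```javascript
// dependsOn: Map from each target to the array of targets it reads
// (hypothetical shape). Returns the set of targets that must be rebuilt
// after `edited` changes: the edited target plus all transitive dependents.
function loadersToRerun(dependsOn, edited) {
  const stale = new Set([edited]);
  let grew = true;
  while (grew) {
    grew = false;
    for (const [target, inputs] of dependsOn) {
      if (!stale.has(target) && inputs.some((input) => stale.has(input))) {
        stale.add(target);
        grew = true;
      }
    }
  }
  return stale;
}

// The two-step chain from the comment above: page <- analysis <- download.
const chain = new Map([
  ["download", []],
  ["analysis", ["download"]],
  ["page", ["analysis"]],
]);

// Editing the analysis leaves the (slow) download untouched.
const afterAnalysisEdit = loadersToRerun(chain, "analysis");
// Editing the download invalidates everything after it.
const afterDownloadEdit = loadersToRerun(chain, "download");
```

The same graph would also cover several analyses sharing one slow download: each analysis lists `download` as an input, and editing any one of them leaves the download and its siblings alone.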