qri-io / qri

you're invited to a data party!
https://qri.io
GNU General Public License v3.0

Overhaul Transform API #1905

b5 closed 2 years ago

b5 commented 2 years ago

After numerous design discussions, we've arrived at a new API for running transforms. This commit makes a number of breaking changes at once.

An example conversion from old to new:

OLD:

load("http.star", "http")
---
def download(ctx):
  csvDownloadUrl = ctx.get_config("url")
  return http.get(csvDownloadUrl).body()
---
def transform(ds, ctx):
  # ctx.download is whatever download() returned
  csv = ctx.download
  # set the dataset body
  ds.set_body(csv, parse_as='csv')

NEW:

load("http.star", "http")
ds = dataset.latest()
---
csvDownloadUrl = config.get("url")
res = http.get(csvDownloadUrl).body()
---
ds.body = res
dataset.commit(ds)

BREAKING CHANGES: We've overhauled the Starlark API to use a cell-based approach. All transform scripts will require updating.
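
For readers unfamiliar with the cell-based idea, here is a rough sketch in plain Python (Starlark is a Python dialect) of how a script might run as a sequence of cells sharing one scope, so that a value bound in one cell, like `res` in the NEW example, is visible to later cells without passing a `ctx` around. This is not qri's implementation: `split_cells`, `run_cells`, treating `---` as the cell delimiter, and the stand-in `config` dict and fake download are all assumptions made for illustration.

```python
# Hypothetical sketch, NOT qri's implementation: cells are assumed to be
# separated by lines containing only "---" and are run in order against a
# single shared namespace.

def split_cells(script):
    """Split a script into cells on lines that contain only '---'."""
    cells, current = [], []
    for line in script.splitlines():
        if line.strip() == "---":
            cells.append("\n".join(current))
            current = []
        else:
            current.append(line)
    cells.append("\n".join(current))
    return cells


def run_cells(script, namespace=None):
    """Run each cell in order; names bound in one cell are visible to later cells."""
    namespace = {} if namespace is None else namespace
    for cell in split_cells(script):
        exec(cell, namespace)
    return namespace


# Stand-in script: a plain dict replaces the config object and a string
# concatenation replaces the HTTP download (both are assumptions).
SCRIPT = """\
config = {"url": "https://example.com/data.csv"}
---
csv_download_url = config["url"]
res = "fetched:" + csv_download_url
---
body = res
"""

result = run_cells(SCRIPT)
print(result["body"])
```

The point of the sketch is only the scoping: in the OLD style each function received its inputs through `ctx`, whereas here later cells simply read top-level names that earlier cells defined.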

b5 commented 2 years ago

@dustmop, after a multi-hour rebase nightmare, I've got most of the changes we discussed reworked atop the new dataframe API. Tests in the cmd package aren't passing yet, but while I get those fixed up I'd love a review.