observablehq / stdlib

The Observable standard library.
https://observablehq.com/@observablehq/standard-library
ISC License

Could DuckDBClient load (CSV) files by URL? #379

Open ericemc3 opened 1 year ago

ericemc3 commented 1 year ago

I'd like an equivalent of gentoo = d3.csv("https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-pal.220.7&entityid=e03b43c924f226486f2f0ab6709d2381", d3.autoType)

with DuckDBClient.

Both

DuckDBClient.of({
    gentoo: "https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-pal.220.7&entityid=e03b43c924f226486f2f0ab6709d2381"
})

and

DuckDBClient.of({
    gentoo: {
      file: "https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-pal.220.7&entityid=e03b43c924f226486f2f0ab6709d2381"
    }
})

won't work.

However, this version, which mimics the shape of a FileAttachment, does work:

db = {
  const gentoo = {
    url: () => "https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-pal.220.7&entityid=e03b43c924f226486f2f0ab6709d2381",
    mimeType: 'text/csv',
    name: 'gentoo'
  }

  return DuckDBClient.of({
    gentoo: {file: gentoo}
  })
}

That works, but it's rather hard to remember.

I'd love something simple and intuitive, like:

DuckDBClient.of({
    gentoo: {
      url: "https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-pal.220.7&entityid=e03b43c924f226486f2f0ab6709d2381",
      fileType: "csv"
    }
})

where fileType could also be 'json', for instance (or 'parquet', 'arrow'...).
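
In the meantime, the FileAttachment-like workaround above can be wrapped in a small helper so the shape doesn't have to be remembered. This is only a sketch: remoteFile and MIME_TYPES are hypothetical names, not part of the standard library, and the example URL is a placeholder. Only 'csv', 'tsv' and 'json' are mapped here, because how DuckDBClient treats other types is not covered in this thread.

```javascript
// Hypothetical helper, not part of the Observable standard library.
// Maps a short file type to a MIME type for the FileAttachment-like object.
const MIME_TYPES = {
  csv: "text/csv",
  tsv: "text/tab-separated-values",
  json: "application/json"
};

// Builds an object with the same three members as the workaround above:
// url(), mimeType and name.
function remoteFile(name, url, fileType = "csv") {
  const mimeType = MIME_TYPES[fileType];
  if (mimeType === undefined) {
    throw new Error(`unsupported fileType: ${fileType}`);
  }
  return {
    name,
    mimeType,
    url: () => url
  };
}

// Usage in a notebook cell (assumes DuckDBClient is in scope, as on Observable):
// db = DuckDBClient.of({
//   gentoo: {file: remoteFile("gentoo", "https://example.com/gentoo.csv", "csv")}
// })
```

An unknown fileType throws rather than silently defaulting to CSV, so a typo shows up before the client tries to load the file.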

mbostock commented 1 year ago

That sounds reasonable to me. 👍

In theory, we could also make a HEAD request for the file to get its MIME type; if the Content-Type response header is present, the type could then be optional. That might allow this:

DuckDBClient.of({
  gentoo: "https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-pal.220.7&entityid=e03b43c924f226486f2f0ab6709d2381"
})

Or this:

DuckDBClient.of({
  gentoo: {
    url: "https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-pal.220.7&entityid=e03b43c924f226486f2f0ab6709d2381"
  }
})
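
For illustration, the HEAD-request idea could look roughly like the sketch below. It is not how DuckDBClient is implemented; remoteFileFromUrl and its fetchImpl parameter are hypothetical, and the sketch assumes the server answers HEAD requests, allows them cross-origin, and reports a meaningful Content-Type.

```javascript
// Hypothetical helper, not part of the Observable standard library.
// Returns a FileAttachment-like object for a URL. If no MIME type is given,
// it asks the server with a HEAD request and reads the Content-Type header.
// fetchImpl is injectable only so the sketch can be exercised without a network.
async function remoteFileFromUrl(name, url, mimeType, fetchImpl = fetch) {
  if (mimeType === undefined) {
    const response = await fetchImpl(url, {method: "HEAD"});
    const header = response.headers.get("content-type");
    if (!header) {
      throw new Error(`no Content-Type header for ${url}; pass a type explicitly`);
    }
    // Drop parameters such as "; charset=utf-8" and normalize case.
    mimeType = header.split(";")[0].trim().toLowerCase();
  }
  return {name, mimeType, url: () => url};
}

// Usage in a notebook cell (assumes DuckDBClient is in scope, as on Observable):
// db = DuckDBClient.of({
//   gentoo: {file: await remoteFileFromUrl("gentoo", "https://example.com/gentoo.csv")}
// })
```

Keeping the explicit type as an override matters in practice, since some servers report a generic type such as application/octet-stream or omit the header altogether.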