simonw / datasette

An open source multi-tool for exploring and publishing data
https://datasette.io
Apache License 2.0
9.06k stars 646 forks source link

Linked Data(sette) #412

Open sfkeller opened 5 years ago

sfkeller commented 5 years ago

I've a radical feature idea (possible first as an extension in order to experiment?):

I'd like to link to a remote table from a remote database, e.g. with a function "linked_datasette()". So one could do following query:

SELECT foo.id, foo.a, remote_party.b
FROM foo
JOIN linked_datasette("https://parlgov.datasettes.com/parlgov-b42a2f2") AS remote_party 
  ON foo.id=remote_party.id

This is inspired by SPARQL's SERVICE keyword for remote RDF "endpoints".

There's a foundation in the SQL Standard called SQL/MED (https://rhaas.blogspot.com/2011/01/why-sqlmed-is-cool.html ).

And here's an implementation from me in Postgres FDW to connect another Postgres "endpoint": https://pastebin.com/Fz2v64Cz .

simonw commented 5 years ago

I've been thinking about how Datasette instances could query each other for a while - it's a really interesting direction.

There are some tricky problems to solve to get this to work. There's a SQLite mechanism called "virtual table functions" which can implement things like this, but it's not supported by Python's sqlite3 module out of the box.

https://github.com/coleifer/sqlite-vtfunc is a library that enables this feature. I experimented with using that to implement a function that scrapes HTML content (with an eye to accessing data from other APIs and Datasette instances) a while ago: https://github.com/coleifer/sqlite-vtfunc/issues/6

The bigger challenge is how to get this kind of thing to behave well within a Python 3 async environment. I have some ideas here but they're going to require some very crafty engineering.

psychemedia commented 5 years ago

Does this also relate to https://github.com/simonw/datasette/issues/283 and the ability to ATTACH DATABASE?