okfn / webstore

webstore is a web-api enabled datastore backed onto sql databases especially sqlite. It supports the RESTful JSON APIs standard to nosql dbs like CouchDB, MongoDB, Riak but with the power, reliability and familiarity of SQL and RDBMS.
http://wiki.ckan.org/Webstore
39 stars 7 forks source link

Allow nested 'objects' (and lists) in webstore columns via jsonification #25

Open rufuspollock opened 13 years ago

rufuspollock commented 13 years ago

Suppose I have data like:

{
   'geolocation': {
      'long': ...
      'lat': ...
   }
}

At the moment no way to save this into the webstore. We should have webstore auto json encode the geolocation field and save as a string. Obviously need to unencode on way out so suggest doing this via setting type of field to 'json' (if that is possible) and deserializing on the way out. Suggest doing this for hash and list type.

pudo commented 13 years ago

The internals of this are clear: adapt the JSONType column type from CKAN to automatically wrap and un-wrap data into the database when the value is a list or mapping type in the source data.

Exposing this data via JSON is also trivial, but exporting it into any other type (CSV, XLS, HTML) would require some additional magic. Two options here include:

This also raises the larger question of how broad the use case for webstore is: do we focus on mostly tabular data or try to extend the system to fully support more complex document-like data? IMO there is a large use case for tabular data alone and the constraint allows us to be specific: things like distinct, re-using parts of SQL as query syntax etc. Plus its simple to understand, which is not true of a mixed tabular/document store.

rufuspollock commented 13 years ago

Definitely against option 1. I think option 2 (bullet point 2) would be better and would work particularly well in context of #27 (setting column types). Remember with csv we already have problem of guessing column types (which is pretty important).

On last question not sure i agree. Distinct can just be ignored for columns that are json (cf #27 again) and there will in any case be columns we do not want distinct run on (e.g. free text).

For me this is a very important feature for e.g. use for community dashboard (which seems reasonable). I hear your point about focusing on doing one thing well but I think if we had column types (#27) and had additional json types like hash / dict (or just json) we'd could have best of both worlds (i.e. tabular/sql functionality but extensible to nested / complex data).

goatchurchprime commented 13 years ago

Putting a compound structure into a database column destroys the ability to do anything with it. We might as well have a single column in every table called "value" which is an encoded dict of the data and lock out every ordering and filtering function from happening.

This particular example of a geolocation item wrapping lat and long values is egregious as it prevents the use of the r*tree index I have been looking forward to exercising: http://www.sqlite.org/rtree.html

frabcus commented 13 years ago

Rufus - are you no longer thinking about a relational SQL store? What query language and indexing would you want on these recursive JSON objects?

There are really two choices I can think of - 1) SQL, or an SQL like language, for which you can't have the recursion, 2) recursive documents, and some other query language like MongoDB or something new and higher level above it.

One of the principles I think that we should definitively agree or disagree on is that Webstore is about SQL. If it is not about SQL, then we're doing something much harder and very different - creating a new query language. That's an interesting project, but I'd hoped was outside the scope of a practical datastore for people to use now.