marijnh / Postmodern

A Common Lisp PostgreSQL programming interface
http://marijnhaverbeke.nl/postmodern
Other
397 stars 89 forks source link

handling jsonb #170

Open erjoalgo opened 6 years ago

erjoalgo commented 6 years ago

Is there a recommended way to handle jsonb types? Currently they appear to be handled as strings in the dao, but I'd like to transform jsonb to/from a more usable lisp type, perhaps via cl-json.

I'd like to possibly avoid having to create a separate class just to transform jsonb.

sabracrolleton commented 6 years ago

Open to suggestions. The biggest issue is dealing with the multitude of lisp json libraries that all have their own idea of the right structure to use for json. See, e.g. https://sites.google.com/site/sabraonthehill/home/json-libraries#TOC-Some-interesting-results-from-dealing-with-queries-via-postmodern-against-a-postgresql-database-having-jsonb-objects-stored-in-the-database-update-23-December-2017-

I just recently imported a 50MB json datafile only to discover that it literally had no unique identifiers. Cl-json works because it pushes json objects into an alist. Yason does not work because it pushes json objects into a hash and there was nothing to hash against.

erjoalgo commented 6 years ago

I was originally thinking more along the lines of being able to register a method for defining a custom transformation to/from the dao for a given column or column type, but I'm not sure whether that makes sense and whether it would be consistent with the DAO pattern without adding too much complexity. Perhaps what I'm really after is an additional data object representation at service layer, keeping the DAO "dumb" and simple.

I have come across that comparison page before and I'm aware of the several json libraries, I can see the challenge in picking the "right" one. I've only used drakma and cl-json and don't even have a preference among those two.

anarchodin commented 6 years ago

I just recently imported a 50MB json datafile only to discover that it literally had no unique identifiers. Cl-json works because it pushes json objects into an alist. Yason does not work because it pushes json objects into a hash and there was nothing to hash against.

JSON objects need to have distinct keys to be interoperable. PostgreSQL, for instance, disregards everything except the last key specified. (json keeps them but ignores them, jsonb throws them away). All three of these behaviours are within spec.

To get back on topic, I'd suggest that handling jsonb as a special case is not a good idea, because there is a lot of applications of PostgreSQL that make extensive use of special datatypes. Pretty much all of those would benefit from having a general PG-to-Lisp-and-back translation mechanism, which is more or less what you need to handle jsonb properly. That kind of extensibility is also the hallmark of both Postgres and Lisp, so it'd be a little strange not to strive for it in a Lisp library for PostgreSQL. Besides, isn't there a skeleton for PG-to-Lisp translation in some layer of postmodern already?

sabracrolleton commented 6 years ago

At the moment, postmodern imports json objects from pg as a string, then any of the various lisp json libraries can read from the string into whatever lisp datastructure they use for json. Going back the other way also goes through a string common denominator. When you suggest a special datatype inside postmodern for json objects, I am trying to understand the benefits. Consider the following:

I pulled out the file of json data I mentioned earlier just because it was handy, I dumped it into a postgresql table with a serial id and put the json in an indexed field named "data". Postgresql was not bothered at all by the lack of unique names. After indexing, queries worked as expected and "interoperability" is not actually necessary.

A simple boring sql query in psql:

select data from historical_events where data->> 'category2' = 'Poetry';

results in:

{"date": "-23", "lang": "en", "category1": "By topic", "category2": "Poetry", "description": "The Roman poet Horace publishes the first three books of Odes.", "granularity": "year"}

Translating that into postmodern's s-sql

(with-test-connection
 (query (:select 'data
                 :from 'historical-events
                 :where (:= (:->> 'data "category2") "Poetry"))
        :single )) 

Returns a string:

"{\"date\": \"-23\", \"lang\": \"en\", \"category1\": \"By topic\", \"category2\": \"Poetry\", \"description\": \"The Roman poet Horace publishes the first three books of Odes.\", \"granularity\": \"year\"}"

Parsing that string using each of seven different common lisp json libraries:

cl-json:decode-json-from-string returns an alist

((:DATE . "-23") (:LANG . "en") (:CATEGORY-1 . "By topic")
 (:CATEGORY-2 . "Poetry")
 (:DESCRIPTION
  . "The Roman poet Horace publishes the first three books of Odes.")
 (:GRANULARITY . "year"))

st-json:read-json returns an st-json:jso object

#S(ST-JSON:JSO
   :ALIST (("date" . "-23") ("lang" . "en") ("category1" . "By topic")
           ("category2" . "Poetry")
           ("description"
            . "The Roman poet Horace publishes the first three books of Odes.")
           ("granularity" . "year")))

yason:parse returns a hash table #<HASH-TABLE :TEST EQUAL :COUNT 6 {100909FAE3}>

json:parse returns a list

(:OBJ ("date" . "-23") ("lang" . "en") ("category1" . "By topic")
 ("category2" . "Poetry")
 ("description"
  . "The Roman poet Horace publishes the first three books of Odes.")
 ("granularity" . "year"))

json-streams returns a list

(:OBJECT ("date" . "-23") ("lang" . "en") ("category1" . "By topic")
 ("category2" . "Poetry")
 ("description"
  . "The Roman poet Horace publishes the first three books of Odes.")
 ("granularity" . "year"))

jonathan:parse throws a nil array accessed error

com.gigamonkeys.json:parse-json returns a list

("date" "-23" "lang" "en" "category1" "By topic" "category2" "Poetry"
 "description" "The Roman poet Horace publishes the first three books of Odes."
 "granularity" "year")

So that is seven lisp json libraries each with their own idea of how to represent a json object. Postmodern users could be using any of those seven libraries or their own versions. Can you expand on the benefits of having a special datatype in postmodern for json instead of using a string that all the lisp json libraries (well, almost all the libraries) can read and write?

erjoalgo commented 6 years ago

(Not sure if reply is to me or to anarchodin)

I agree it would be unfair and unnecessary for postmodern to force all its users into a particular cl<=>json implementation, and I agree that string as a common denominator is simple, general and flexible enough.

However, I do still need to convert the string representation of json into something usable, whether for manipulation in lisp or for serving objects to web clients (e.g. I don't want my JS client to receive a string where there should be a list or an object).

Maybe my query should be rephrased to: what is a recommended way of transforming jsonb fields into something usable?

One approach that came to mind based on a certain pattern is to define an additional class that is at a higher level of abstraction from the low-level DAO class, where the JSON is unpacked, and to manually define a transformation between the two.

The approach I'm currently to avoid the overhead of an additional class is to use cl-json:decode-from-string and cl-json:encode-to-string at the database layer (my lisp package consuming postmodern and exposing an application-level API to talk to the DB), looping over all the jsonb slots to apply the to/from transformation in-place. So instead of defining an additional class, I apply this mutation somewhat transparently at my database-layer lisp package. It looks something like this:

(defmacro slot-value-> (obj slots)
  (if slots
      `(slot-value-> (slot-value ,obj ',(car slots)) ,(cdr slots))
      obj))

(defun dao-json-transform (product to-json?)
  (loop for slot in
       (sb-mop:class-slots (find-class 'product))
     as slot-name = (slot-value slot 'sb-pcl::name)
     as val = (slot-value product slot-name)
     as db-type = (slot-value-> slot
                                (POSTMODERN::DIRECT-SLOT POSTMODERN::COL-TYPE))
     do
       (setf (slot-value product slot-name)
             (cond
               ((if (consp db-type)
                     (member 'JSONB db-type)
                     (eq 'JSONB db-type))
                (if to-json?
                    (unless (eq :NULL val)
                      (cl-json:decode-json-from-string val))
                    (cl-json:encode-json-to-string val)))

               ((if to-json? (eq :NULL val) (null val))
                (if to-json? nil :NULL))
               (t val))))
  product)

But my approach looks like a hack. So I was wondering, can I use this approach while making it less of a hack? Or is there a better alternative?

sabracrolleton commented 6 years ago

Was actually replying to anarchodin. To your question, are you sure you even want to use the dao in the first place? It pulls everything in the row of a single table, so no joins and probably more data in the row than the client actually wants if they are accessing through an api. Maybe PM me with how you respond to an api request ( I assume you are sending binary data) and what the table structure looks like?

anarchodin commented 6 years ago

I'm a little confused, because I was trying to say there shouldn't be a special type for JSON. In order to implement a jsonb-to-lisp mapping you'd have to build an interface that would also be useful for, say, transparent translation of timestamptz to local-time object, or from PostGIS geometries into a Lispy representation - and exposing that interface would be far more extensible than just building a jsonb type.

EDIT: Also, what's the structure of this datafile you're talking about? I assumed, because of the "unique identifiers" reference, that the entries were inside an object, but that doesn't seem to be the case. There's no need for any kind of unique identifiers anywhere else in JSON, so it's strange that the lack thereof would cause problems.

sabracrolleton commented 6 years ago

Anarchodin: Agree with you that there should not be a special type in postmodern for json. What I was using those examples for was trying to try to figure out what that interface might be. See below for some dao thoughts.

The datafile I was handed had no structure other than simply being thousands of json objects inside one global object. I was complaining (not about anything you said) because my first thought was that I would use yason and have a hash of hashes. There being no unique identifiers, I could still use yason at the individual object level, but not make a hash of hashes. I switched to cl-json and made a list of alists instead.

erjoalgo: Assuming you do want to use daos, then I think a more elegant solution could be setting an after method on initialize instance. To give a simple example, consider the following and then think about how it could be improved:

(defclass historical-events ()
  ((id :accessor id :col-type integer :initarg :id :initform nil)
   (data :accessor he-data :col-type jsonb :initarg :data :initform nil))
  (:metaclass postmodern:dao-class)
  (:table-name historical-events)
  (:keys id))

(defmethod initialize-instance :after ((event historical-events) &key)
  (setf (slot-value event 'data)
        (cl-json:decode-json-from-string
         (slot-value event 'data))))

(he-data (with-test-connection (get-dao 'historical-events-2 14)))
((:DATE . "-298") (:LANG . "en") (:CATEGORY-1 . "By place")
 (:CATEGORY-2 . "Egypt")
 (:DESCRIPTION
  . "Ptolemy gives his stepdaughter Theoxena in marriage to Agathocles, the tyrant of Syracuse (in south-eastern Sicily).")
 (:GRANULARITY . "year"))

So we pulled a record from the database in a dao. Initially postmodern has it as a string, but then the after method calls cl-json:decode-json-from-string to translate into an alist. At this point, the dao slot data with the accessor he-data has the information in the alist type that cl-json uses and you can do with it as you choose.

The date is "-298" and suppose we want that changed to "-297".

(let ((item (with-test-connection (get-dao 'historical-events-2 14))))
  (setf (cdr (assoc :date (he-data item) :test 'eq)) "-297")
  (cdr (assoc :date (he-data item) :test 'eq)))
"-297"

Now we want to update the database. We cannot just use update-dao at this point because, as everyone has pointed out, postmodern wants to update json as a string and right now the json is in the form of an alist. So just before you use update dao you would have to use cl-json to convert the json back to a string.

(defun save-historical-event-dao (item)
  (setf (he-data item)(cl-json:encode-json-to-string (he-data item)))
  (update-dao item))

Now we have a save function that does this for us automatically. To test we pull the same event out of the database and change the date to "-295".

(with-test-connection (let ((item (get-dao 'historical-events-2 14)))
                      (setf (cdr (assoc :date (he-data item) :test 'eq)) "-295")
                      (save-historical-event-dao item)))

Check to see if it really changed:

(let ((item (with-test-connection (get-dao 'historical-events-2 14))))
                    (cdr (assoc :date (he-data item) :test 'eq)))
"-295"

There are probably better ways to do this, but this might give you a start. Obviously you still need to ensure the datatypes inside the json object converted correctly.

jdz commented 5 years ago

Pretty sure something similar to what cl-postgres-plus-uuid and cl-postgres+local-timedoes can be done with json and jsonb column types.

sabracrolleton commented 5 years ago

Certainly. The user can choose their preferred lisp json library and write a reader and to-sql-string method to match whatever data structure that library uses. I agree that would be more elegant than my solution.

sabracrolleton commented 5 years ago

Thoughts on a package that does something like this (using cl-json as an example)? There would have to be separate packages for each json library. The real question is whether to to-sql-string method based on a cons is a valid approach?

(defun set-jsonb-sql-reader (&optional (table cl-postgres:*sql-readtable*))
  (cl-postgres:set-sql-reader +jsonb+ #'cl-json:decode-json-from-string :table table))

(defun alistp (alist)
  "Is a list an alist?"
  (when (listp alist)
    (and (listp alist)
         (every #'consp alist))))

(defun plistp (list)
  "Is a list a plist?"
  (when (listp list)
    (loop for x on list by #'cddr
       unless (and (keywordp (car x))
                    (cdr x))
       do (return nil)
       finally (return list))))

(defmethod cl-postgres:to-sql-string ((arg cons))
  (cond ((alistp arg)
         (cl-json:encode-json-alist-to-string arg))
        ((plistp arg)
         (cl-json:encode-json-plist-to-string arg))
        (t (cl-json:encode-json-to-string arg))))
sabracrolleton commented 3 years ago

See https://github.com/marijnh/Postmodern/pull/276 which may go some way to addressing the original issue by allowing transformational functions to be called when exporting from a dao to Postgresql or importing from Postgresql to a dao.