rotten / rethinkdb-multicorn-postgresql-fdw

Multicorn based PostgreSQL Foreign Data Wrapper for RethinkDB
MIT License
20 stars 5 forks source link

Invalid token when parsing nested objects into JSON #2

Closed segphault closed 9 years ago

segphault commented 9 years ago

I'm testing this foreign data wrapper with my Cats of Instagram project. I use the following sql to create the foreign table:

create foreign table instacat (id varchar, images json, time timestamp) server rethink options (table_name 'instacat');

The value of the images property in the RethinkDB document is a nested object that looks sort of like this:

"images": {
    "low_resolution": {"height": 306, "url": "...", "width": 306},
    "standard_resolution": {"height": 612, "url": "...", "width": 612},
    "thumbnail": {"height": 150, "url": "...", "width": 150}
}

When I perform a select (select * from instacat) I get the following error:

ERROR:  invalid input syntax for type json
DETAIL:  Token "=" is invalid.
CONTEXT:  JSON data, line 1: "low_resolution"=...

When I use the varchar type instead of json for the images property, I see text that looks like this:

"low_resolution"=>"url"=>”...", "width"=>306, "height"=>306, "thumbnail"=>"url"=>”…", "width"=>150, "height"=>150, "standard_resolution"=>"url"=>”...", "width"=>612, "height"=>612

Which is obviously not valid JSON. I think that the error I'm getting when using the JSON type occurs because it's trying to parse that content with the "=>" symbols. I'm not sure how it's getting transformed into that format, though.

rotten commented 9 years ago

Thanks! I'll look into it.

rotten commented 9 years ago

Ok, so it looks like Multicorn is translating the Python dictionary type returned by the RethinkDB library into that funky string with the "=>" in it, instead of translating it into the more traditional JSON-like representation of the dictionary type.

One solution is to go through every row being returned from RethinkDB, and check the type of each column. If it finds a column that is a dictionary type, it will do a json.dumps() on it to convert it before passing it all back to PostgreSQL.

I've asked on the Multicorn list first, however, in case there is a better way.

This actually slipped through my initial testing because I had a typo in the test cases I was running (!).

Stay tuned.

rotten commented 9 years ago

I opened Issue #86 with the Multicorn folks for this. It looks like I'll have to add a decorator that checks each column type and serializes the dictionary types with json.dumps() before returning them to Multicorn. I should be able to get that in place this week.

rotten commented 9 years ago

I just merged some quick changes that should fix this issue. Let me know if it still doesn't work right for you!