Closed ahocquard closed 4 years ago
Hi,
Thanks for opening this issue. This sounds like a good idea.
The code performs no explicit coercion of the data from the elastic search query, however as you have noticed it gets turned into varying string representations (in your example it is array->array and dictionary->hstore). When I looked up this in the multicorn I found this issue which confirms the behaviour you have experienced.
I think that the best thing I can do quickly is to ensure that the string representation of the json is always returned, which can then be cast into json or jsonb. For your specific use case it may require a view over the foreign table to apply a cast (when I've implemented this change I can test if this is required).
What do you think?
I also wonder what behaviour would be appropriate when writing to such a column.
I think that the best thing I can do quickly is to ensure that the string representation of the json is always returned, which can then be cast into json or jsonb
That would be very handy indeed.
In my case, if a jsonb
type does not work natively, I can still do:
CREATE TABLE IF NOT EXISTS access_log AS SELECT *, domain_names::jsonb as json_domain_names FROM es_access_log;
And if declaring my column as jsonb
instead of text
works automatically, great!
I also wonder what behaviour would be appropriate when writing to such a column.
A json representation would be the best no? Or I misunderstood what you mean.
With regard to writes - it's likely that the data provided via the multicorn library is always a string. That's something that I need to test. The elasticsearch instance would refuse anything that was not a completely formed json object consistent with the schema. Since my original use case for this was flat documents where columns were simple values this was not a problem. Ensuring that the write of a structured column is converted appropriately is the concern.
With regard to the casting of the output column I would remind you that there is no caching going on here. You would be paying the overhead of conversion to jsonb every time. I would recommend testing the performance of the two types carefully. A materialized view might help if you need to operate over the blob.
I should be able to work on this a bit tonight.
With regard to the casting of the output column I would remind you that there is no caching going on here. You would be paying the overhead of conversion to jsonb every time. I would recommend testing the performance of the two types carefully. A materialized view might help if you need to operate over the blob.
Yes, casting is done at runtime which is a barrier optimization in many cases. But in the previous example, I added a new jsonb
column to convert the text
column on the fly when importing data from ES. Then, in Postgres I can use the jsonb
column instead of the TEXT column. I only pay the cost of the cast at import time, which is very probably negligible.
Anyway, performance is not a big concern for my usecase .
I should be able to work on this a bit tonight. Thanks for the reactivity! If I can help somehow, tell me.
Oh this is a really great idea. I've created a jsonb branch for you to test out properly. My initial evaluation shows that for reading it works with the json/jsonb type applied to the foreign table directly.
I've listed instructions to reproduce my test run on the last commit. It would be good if you could check the branch version and report any issues you find, especially since you want to use this in anger.
Hello,
I launched my testsuite successfully with your branch! I had good hope as timestamp
and inet
types were automatically casted.
Also, it's possible to generate a valid json this way now (subquery mandatory):
select row_to_json(access_log) from (select * from es_access_log) access_log;
Thanks!
That's great :smile:
I need to add proper tests around this, implement the write behaviour and document it all. After that I'll release a new version. I'm hoping to get that done over the easter weekend.
Cool! Thanks a lot for this improvement :)
I've completed the test/documentation updates and released the new version.
If you have any further suggestions please let me know, otherwise I feel this issue can be closed.
Thanks for the new version (and for the support of Postgres 12). It works as expected!
Great stuff :smiley_cat:
Hello,
Is it possible to cast an array as json easily with the extension? Same thing for json objects. As far I know and by reading the Python code of the extension, it casts it as a Python string.
For example, the ES document:
The foreign table definition (JSON for
geoip
ordomain_names
does not work):What I would like is to declare it as
json
orjsonb
in order to be cast automatically by the wrapper. Did I miss something? Is it supported?