houzi1099857978 opened this issue 3 years ago
I've created this branch: https://github.com/matthewfranglen/postgres-elasticsearch-fdw/tree/chinese which has a test for inserting Chinese. It passes.
There seems to be a problem with your system configuration, as it is attempting to decode the string as ASCII:
UnicodeDecodeError('ascii', '"\xe6\xb5\x8b\xe8\xaf\x95"', 1, 2, 'ordinal not in range(128)')
At the moment I don't have a solid idea why your system is defaulting to decoding the string as ASCII. I suggest looking at how the docker images differ from your system.
I think you should look into your locale settings. See locale.getdefaultlocale, which may be the cause of your issue.
Inside the postgres docker container, $LANG is en_US.utf8.
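As a quick check (a minimal sketch, not from the thread), printing the interpreter's encoding defaults shows whether the environment matches; on Python 2, sys.getdefaultencoding() returns 'ascii', which is consistent with the error above:

```python
import locale
import sys

# Inspect the interpreter's encoding defaults. A missing locale or an
# 'ascii' default encoding would explain the UnicodeDecodeError above.
print(locale.getdefaultlocale())  # e.g. ('en_US', 'UTF-8') when LANG=en_US.utf8
print(sys.getdefaultencoding())   # 'ascii' on Python 2, 'utf-8' on Python 3
```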
I found that the problem is not caused by the Chinese text itself; it is caused by using the bytea type. When I insert Chinese text into a bytea column, I use the following SQL to convert the data to bytea: INSERT INTO test_es(id, testbytea) VALUES (1, '中文测试'::bytea); That is what causes the problem.
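The failure can be reproduced outside the FDW (a standalone sketch, assuming the bytea value reaches Python as raw UTF-8 bytes): decoding those bytes with the ASCII codec raises the same error, while decoding them as UTF-8 succeeds.

```python
# The UTF-8 bytes for "测试", as they would arrive from a bytea column.
payload = b'\xe6\xb5\x8b\xe8\xaf\x95'

try:
    payload.decode('ascii')  # what an ASCII default encoding effectively does
except UnicodeDecodeError as err:
    print(err)  # ordinal not in range(128)

print(payload.decode('utf-8'))  # 测试 -- decoding with the right codec works
```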
What's the table definition?
CREATE EXTENSION multicorn;
CREATE SERVER multicorn_es FOREIGN DATA WRAPPER multicorn
OPTIONS (
wrapper 'pg_es_fdw.ElasticsearchFDW'
);
DROP FOREIGN TABLE IF EXISTS "public"."test_es";
CREATE FOREIGN TABLE test_es (
"id" int8 NOT NULL,
"num" int4,
"flag" bit(3),
"istrue" bool,
"createdate" date,
"price" numeric(5,2),
"price2" float8,
"datetime" timestamp(6),
"address" varchar(255) COLLATE "pg_catalog"."default",
"message" text COLLATE "pg_catalog"."default",
"testbytea" bytea,
"testjson" json,
"time" time(6)
)
SERVER multicorn_es
OPTIONS(
host '10.10.0.160',
port '9200',
index 'test_es',
type 'doc',
rowid_column 'id',
query_column 'query',
query_dsl 'false',
-- score_column 'message',
-- default_sort 'last_updated:desc',
-- sort_column 'id',
refresh 'false',
complete_returning 'false',
timeout '20',
username 'elastic',
password 'changeme'
)
;
INSERT INTO "public"."test_es" VALUES (1, 666, '111', 't', '2021-03-05', 11.01, 66.666, '2021-03-05 12:22:54', 'asdfa', 'adsfaaaa', '中文测试'::bytea, '{}', '11:27:00');
The elasticsearch schema would be great if you have that as well.
The Elasticsearch version is 6.8.6; the index was created automatically with the default mapping when I inserted data:
{
"state": "open",
"settings": {
"index": {
"creation_date": "1614943999235",
"number_of_shards": "5",
"number_of_replicas": "1",
"uuid": "30jZAEgwQLunvvH8A1FzGA",
"version": {
"created": "6080699"
},
"provided_name": "test_es"
}
},
"mappings": {
"doc": {
"properties": {
"address": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
},
"flag": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
},
"num": {
"type": "long"
},
"createdate": {
"type": "date"
},
"message": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
},
"istrue": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
},
"datetime": {
"type": "date"
},
"testbytea": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
},
"price666": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
},
"testjson": {
"type": "object"
},
"price": {
"type": "float"
},
"time": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
},
"price2": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
}
}
}
},
"aliases": [],
"primary_terms": {
"0": 1,
"1": 1,
"2": 1,
"3": 1,
"4": 1
},
"in_sync_allocations": {
"0": [
"7UsFkimCRji0kviOgKk7gg",
"srAnowRfQBSmlzSTGge6bA"
],
"1": [
"OhjG7VQ2Tm2l5VKU3c7IZw",
"w_y52WfjTuGU-p7unXTQ6w"
],
"2": [
"rpfDNJheRjSdLpg-ZYkxMg",
"YT1fVCEZT1mrdNyFrqnSbg"
],
"3": [
"8OCOll5qQjyyJ11pgyczzw",
"Osh_2xoQRgm_zbhNArTTvQ"
],
"4": [
"hsy1xN8aRDqBNhex3kjqNA",
"yzp3uT-OSlWPRFdvIhCbKA"
]
}
}
That's great, thank you.
I wonder if the problem is this field definition:
"testbytea": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
}
It looks like it should have a type of binary. The current FDW would not work with it, as it needs to encode the data as base64. I think that even with that change the schema will need to be created manually rather than relying on automatic creation, because the Elasticsearch definition would still be text.
I'll see what I can do about fixing this on the postgres side.
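For reference, the conversion step described above could look roughly like this (a hedged sketch; encode_bytea_for_es and decode_bytea_from_es are hypothetical helpers, not part of the FDW). Elasticsearch's binary field type expects base64-encoded strings:

```python
import base64

def encode_bytea_for_es(value):
    """Encode raw bytea bytes as the base64 string an ES binary field expects."""
    return base64.b64encode(value).decode('ascii')

def decode_bytea_from_es(text):
    """Invert the encoding when reading the document back."""
    return base64.b64decode(text)

encoded = encode_bytea_for_es('中文测试'.encode('utf-8'))
print(encoded)
assert decode_bytea_from_es(encoded).decode('utf-8') == '中文测试'
```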
I changed the ES index field type to binary, but the problem still exists; maybe it is caused by the PHP converter.
The problem is in the PG FDW. Elasticsearch expects the data to be base64 encoded.
I need to add a data conversion layer for the more complex types, which I've been putting off. Doing that should also allow me to address the GeoField ticket. Sorry about the delay.
Thanks for the help. I found a way to solve the problem. When I launch Python and print the default system encoding, it is ascii:
linux_dtq4:~ # python
Python 2.7.13 (default, Jan 11 2017, 10:56:06) [GCC] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> print sys.getdefaultencoding()
ascii
So I changed the default system encoding by creating a sitecustomize.py in Python's site-packages directory:
cd /usr/lib/python2.7/site-packages/
vi sitecustomize.py
# encoding=utf8
import sys
reload(sys)
sys.setdefaultencoding('utf8')
After rebooting the system, it works!
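For what it's worth, a less invasive alternative to changing the interpreter-wide default (a sketch, not from this thread; to_text is a hypothetical helper) is to decode explicitly at the boundary:

```python
def to_text(value):
    # Decode byte strings as UTF-8 explicitly instead of relying on the
    # interpreter's default encoding (which was 'ascii' here).
    if isinstance(value, bytes):
        return value.decode('utf-8')
    return value

print(to_text(b'\xe6\xb5\x8b\xe8\xaf\x95'))  # 测试
```

The reload(sys)/setdefaultencoding trick is a well-known Python 2 workaround, but it changes behaviour process-wide, so explicit decoding is generally safer.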
Before this change, inserting Chinese text produced the UnicodeDecodeError shown at the top of this thread.