knizhnik / imcs

In-Memory Columnar Store extension for PostgreSQL
Apache License 2.0

Loading some char(64) data causes backend crash #10

Closed amutu closed 10 years ago

amutu commented 10 years ago

CREATE TABLE error4 ( ts timestamp without time zone, data character(64) );

postgres=# select data::bytea from test.error4;

data

\x0b2a2a2a212a2a2ac2a12a2a2ae480a12a2a2ae480a12a2a2ad0a12a2a2a202a2a2ac2a12a2a2a212a2a2ad0a12a2a2ae492a02a2a2ad0a12a2a2ae480a02a2a2ae482a02a2a2ae482a12a2a2ac2a02a2a2a

(1 row)

postgres=# select length(data::bytea) from test.error4;

length

 82

(1 row)

postgres=# select length(data) from test.error4;

length

 64

(1 row)

postgres=# select data from test.error4;

data

\x0B_!_䀡_䀡 **¡_!_䒠_䀠_䂠_䂡 ** (1 row)

I found that in columnar_store_load() the len is 82 instead of 64, which causes imcs_append_char() to segfault:

str = (char*)VARDATA(t);
len = VARSIZE(t) - VARHDRSZ;
if (attr_type_oid[i] == BPCHAROID) {
    while (len != 0 && str[len-1] == ' ') {
        len -= 1;
    }
}
imcs_append_char(ts, str, len); /* here len is 82, which makes memset segfault */

amutu commented 10 years ago

I think it is caused by wide characters, because char_length(data) returns 64 but octet_length(data) returns 82.
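The gap between the two lengths can be illustrated with a minimal sketch, assuming the data is UTF-8 encoded. The helper name `utf8_char_count` is hypothetical and is not part of IMCS or PostgreSQL; it only shows why a character count can be smaller than a byte count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count UTF-8 characters in a buffer of nbytes bytes.
 * Every byte that is NOT a continuation byte (bit pattern 10xxxxxx)
 * starts a new character, so counting those bytes counts characters. */
static size_t utf8_char_count(const char *s, size_t nbytes)
{
    size_t n = 0;
    assert(s != NULL || nbytes == 0);
    for (size_t i = 0; i < nbytes; i++) {
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For example, the 5-byte sequence `c2 a1 2a 2a 2a` from the dump above is 4 characters, because `c2 a1` encodes a single character. Applied to the whole value, the same rule accounts for 64 characters occupying 82 bytes.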

knizhnik commented 10 years ago

I have added a check for strings that are too long, to avoid a server crash in such cases. But you are right: the source of the problem is that the length of the CHARACTER type in PostgreSQL is by default measured in Unicode characters, while IMCS stores bytes. One possible workaround is to increase the size of the field:

CREATE TABLE error4 ( ts timestamp without time zone, data character(100) );

This will have no effect on how the data is stored in PostgreSQL (since all strings are stored as variable-length data in any case), but IMCS will use a larger element size, so your string will fit in it.
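The kind of length check described in this comment can be sketched as follows. This is only an illustration under stated assumptions: `append_fixed_char` is a hypothetical helper, not the actual IMCS function, and the real fix may look different. It assumes a fixed-width slot of `elem_size` bytes that is zero-padded after the value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not the IMCS API): copy len bytes into a fixed-width
 * slot of elem_size bytes and zero-pad the remainder.
 * Returns false, without writing, when the value does not fit. */
static bool append_fixed_char(char *slot, size_t elem_size,
                              const char *src, size_t len)
{
    assert(slot != NULL && src != NULL);
    if (len > elem_size)
        return false; /* otherwise elem_size - len would wrap around as size_t */
    memcpy(slot, src, len);
    memset(slot + len, 0, elem_size - len);
    return true;
}
```

With a 64-byte slot, an 82-byte value is rejected rather than overrunning the buffer; with a 100-byte slot, as in the character(100) workaround, the same value fits.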

Another solution would be to automatically multiply the size of the type by the maximum number of bytes needed to represent a wide character. But this multiplier can be quite large: some exotic Unicode characters require more than 4 bytes. So I do not like this idea.
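The cost of such a multiplier can be shown with a small arithmetic sketch. The function `worst_case_bytes` is hypothetical, and the per-character maximum passed to it is an assumption chosen for illustration, not a value taken from IMCS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizing rule: reserve room for the widest possible encoding
 * of every character in a character(n) column. */
static size_t worst_case_bytes(size_t nchars, size_t max_bytes_per_char)
{
    assert(max_bytes_per_char > 0);
    return nchars * max_bytes_per_char;
}
```

Assuming a maximum of 4 bytes per character, `worst_case_bytes(64, 4)` reserves 256 bytes per element for a character(64) column, while the value in this issue needs only 82.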

amutu commented 10 years ago

Thanks for your explanation. I will increase the size of the char type. I think this ticket can be closed.