Closed sangensong closed 2 years ago
Hi. I don't think it has anything to do with the columnar library since without it it behaves similarly:
mysql> create table idx(title text, type int, price float);
Query OK, 0 rows affected (0.00 sec)
mysql> show create table idx;
+-------+------------------------------------------------------------+
| Table | Create Table |
+-------+------------------------------------------------------------+
| idx | CREATE TABLE idx (
type integer,
price float,
title text
) |
+-------+------------------------------------------------------------+
1 row in set (0.00 sec)
If you believe it's a problem please create an issue in https://github.com/manticoresoftware/manticoresearch/issues and explain what's bad in this behaviour.
Thank you for your reply. My usage scenario is like this. I have a lot of csv files that have been generated, and these csv files are still being generated. At present, I have inserted these csv files into clickhouse. Now, I want to perform full-text search on my data. Due to the huge amount of data, I feel that it is not suitable to use clickhouse for full-text search. At this time, I found your project, and I want to use your project to do it. Full Text Search. Because the field correspondence of my csv file has been determined, I see that the order of the fields seems to be adjusted when I create the table, and I am afraid that these csv files cannot be inserted into the manticore. Also, I would like to ask, does a command like clickhouse exist in manticore
cat a.csv | clickhouse-client --database=db --query="insert into table test format CSV"
My create table sql like this:
create table t_log_http (
log_time bigint,
uuid bigint,
src_mac text stored,
dst_mac text stored,
src_location text stored,
dst_location text stored,
app_protocol text stored,
app_name text stored,
is_ipv6 bit(8),
src_ip text stored,
dst_ip text stored,
ip_protocol text stored,
src_port bit(16),
dst_port bit(16),
method text stored,
uri text stored,
host text,
url_category text stored,
referer text stored,
user_agent text stored,
xff text stored,
subject text stored,
rsp_code text stored) engine='columnar';
I currently don't see the way I want on the official documentation, so what I think is to use a method like the following to insert data, but this execution efficiency is too low
cat t_log_http.csv|awk -F "," '{printf("insert into t_log_http values(%s,%s,'\''%s'\'',%s,%s,%s,%s,%s,%s,%s,%s,%s,'\''
%s'\'',%s);\n",$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14)}' > t_log_http.sql
➤ Sergey Nikolaev commented:
the order of the fields seems to be adjusted when I create the table, and I am afraid that these csv files cannot be inserted into the manticore
When you do INSERT INTO tbl VALUES(...)
in my opinion it's always good to specify the fields list explicitly, e.g. insert into t_log_http(log_time, uuid, ...) values(...)
.
There's no tool like clickhouse-client
which can accept data at STDIN and convert it into INSERT INTO
. Just simple awk may be suboptimal since it inserts data line by line, which may be not very fast, but it's the right direction. I recommend two ways:
csvpipe
. Interactive course: https://play.manticoresearch.com/csv/. Docs: https://manual.manticoresearch.com/Adding_data_from_external_storages/Fetching_from_CSV,TSV The reason I don't use plain mode is that I see on the docs
What you cannot do with a plain index:
In my usage scenario, I will continuously generate csv files and need to continuously insert data. Is the plain index mode still appropriate in this case?
full text fields and attributes are different types and all statements like desc
show create table
and so on iterate attributes then fields that is why there is no original order, ie attributes and fields are not mixed.
In my usage scenario, I will continuously generate csv files and need to continuously insert data. Is the plain index mode still appropriate in this case?
Continuous data processing is also possible with plain indexes, since you can:
RELOAD INDEXES
(https://mnt.cr/reload%20indexes) to sync with the configetc.
But it's easier to just use an RT index. I gave you a script example, if you know php it shouldn't be a big deal to make it do what you want. Eventually we have in plans to implement some manticore-client
(similar to clickhouse-client
), but unfortunately it's only plans yet. Pull requests are very welcome!
I am very puzzled as to why the above two are different. Is there any adjustment in the manticore?