manticoresoftware / manticoresearch

Easy to use open source fast database for search | Good alternative to Elasticsearch now | Drop-in replacement for E in the ELK soon
https://manticoresearch.com
GNU General Public License v3.0
8.98k stars 499 forks source link

Support Apache Superset with MySQL connector #1049

Closed cold01water closed 1 year ago

cold01water commented 1 year ago

Is your feature request related to a problem? Please describe.
I want to be able to operate manticoresearch from Apache Superset.
Currently it doesn't work because the query is missing or the format is different from MySQL.

Describe the solution you'd like
Implement missing queries in manticore-buddy.
Or develop a Python library that can be connected with SQL so that you can connect to Manticoresearch from Python's sqlalchemy.

Additional context
I added the missing query in manticore-buddy and confirmed it works. It's still a work in progress, but I'll leave you with the information.

missing queries

Below is the missing MySQL query and Manticore alternative query.

mysql query: manticore query

So far, we have added the database connection settings to the superset.
However, I can't proceed to the next phase because I can't use show create table DB.table query.

Although show create table [index] is implemented in Manticore,
an error occurs because the DB.table format is not supported.

I'm having a hard time mapping this query.

I will continue to challenge.
I would appreciate it if someone could help me.

Also, manticore-buddy is very difficult because it has to change various files to implement one simple query. It would be nice if the query and data mapping could be put together...

donhardman commented 1 year ago

Hi there! We are improving the approach and docs for Manticore Buddy for a better "pluggable" design. But for now, I can help you with the struggles you faced while developing the features. Feel free to ask or suggest anything you find hard to code in Buddy.

You are right; Manticore supports show create table [table] but not show create table db.[table]

In that case, all you need to do is make the ShowCreateTable handler in Buddy and parse the input request by removing the db with regexp, and ### sending the modified show to create table request to manticore back. So it will allow you to support queries like this also.

I recommend you take a look at show full tables implementation; the case is almost the same – https://github.com/manticoresoftware/manticoresearch-buddy/tree/feature/mysqldump/src/ShowFullTables

cold01water commented 1 year ago

I know it could be done, but buddy's latest code has changed. It is difficult to test the development continuously, so we will try again when it is stable.

donhardman commented 1 year ago

You are right, and we are working on a better implementation of extendability. It should not take longer than one week when will finish the pluggable approach for extensions of functionality. I will ping you here and let you know when we finish it. So you can continue to implement and feel safe modifying core processing.

Thanks for your patience and for letting us know about your struggles; working on it :)

donhardman commented 1 year ago

Hi there! I hope you're doing well! We're excited to let you know that we've recently migrated to a pluggable system, making it much easier to extend functionality with PHP.

I highly recommend checking out the following articles:

Also, please remember that the current implementation of the Buddy pluggable design is available in the development version and will be included in an upcoming release. Stay tuned!

donhardman commented 1 year ago

Hi there!

We have completed the work on adding Apache Superset support to Manticore! 🎉

You can try it by using Manticore's latest development packages. Follow the instructions here: https://manual.manticoresearch.com/Installation/RHEL_and_Centos#Installing-Manticore-packages-on-RedHat-and-CentOS

cold01water commented 1 year ago

That's very good news.

I've been busy lately and haven't had time for manticore. I thought that buddy's plug-in repository was divided and various improvements were made. However, I didn't expect to support supersets so soon. Or rather, I thought I had to do it myself. i will try.

donhardman commented 1 year ago

Please let us know how it works.

Additionally, kindly note that the time grain feature should function for all options except Week and Quarter. We are currently addressing this issue, and you can monitor its progress here: https://github.com/manticoresoftware/manticoresearch/issues/1156.

Feel free to share the results of your testing with us.

cold01water commented 1 year ago

I just added the DB in superset and checked with SQL Lab and it is working. I tried the following flow that I wanted to do and had a lot of problems and have not tried the visualization yet.

I leave what I did as reference information. This is not a criticism.

This is all I have so far.

It would be nice if Elasticsearch's API's handling of GET parameters and auto_schema were more accurate.

sanikolaev commented 1 year ago

Thank you for your feedback, @cold01water We haven't yet tested Vector.dev, and fluent bit/fluentd. We've been focusing on Logstash and Beats. It's great Fluent Bit is already working. We'll test it better though.

We'll also think about the timestamp issue.

tomatolog commented 1 year ago

Register data from vector (alternative logstash/fluend) to Manticore Manticore fails to work because vector adds a GET parameter /_bulk?timeout=60s

_bulk should be a POST but I tried all variants and all work fine for me

>curl -iX GET "localhost:9312/_bulk" -H "content-type: application/x-ndjson" --data-binary @bulk_small1.js
HTTP/1.1 200 OK
Server: 6.0.5 8a085ef6c@230524 dev
Content-Type: application/json; charset=UTF-8
Content-Length: 230

{"items":[{"index":{"_index":"weblog","_type":"doc","_id":"4292832878853508535","_version":1,"result":"created","_shards":{"total":1,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1,"status":201}}],"errors":false,"took":1}
>curl -iX GET "localhost:9312/_bulk?timeout=60s" -H "content-type: application/x-ndjson" --data-binary @bulk_small1.js
HTTP/1.1 200 OK
Server: 6.0.5 8a085ef6c@230524 dev
Content-Type: application/json; charset=UTF-8
Content-Length: 230

{"items":[{"index":{"_index":"weblog","_type":"doc","_id":"4292832878853508535","_version":1,"result":"created","_shards":{"total":1,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1,"status":201}}],"errors":false,"took":1}
>curl -iX POST "localhost:9312/_bulk?timeout=60s" -H "content-type: application/x-ndjson" --data-binary @bulk_small1.js
HTTP/1.1 200 OK
Server: 6.0.5 8a085ef6c@230524 dev
Content-Type: application/json; charset=UTF-8
Content-Length: 230

{"items":[{"index":{"_index":"weblog","_type":"doc","_id":"4292832878853508535","_version":1,"result":"created","_shards":{"total":1,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1,"status":201}}],"errors":false,"took":1}

it could be better to provide the query that causes the error along with error reply to reproduce this case here locally.

Or provide Docker container that could be started along with Manticoresearch and all cases could be investigated there.

cold01water commented 1 year ago

The result is very strange. Is the version of buddy different?

I tried different patterns and realized that the GET parameter was the cause. The most obvious example is the following. This is a query from the Manticore manual. It is tested with and without the GET parameter.

curl -v -XPOST '127.0.0.1:9308/_bulk?timeout=60s' -H 'Content-Type: application/x-ndjson' -d '
{ "index" : { "_index" : "products2" } }
{ "title" : "Yellow Bag", "price": 12 }
{ "create" : { "_index" : "products2" } }
{ "title" : "Red Bag", "price": 12.5, "id": 3 }
'
*   Trying 127.0.0.1:9308...
* Connected to 127.0.0.1 (127.0.0.1) port 9308 (#0)
> POST /_bulk?timeout=60s HTTP/1.1
> Host: 127.0.0.1:9308
> User-Agent: curl/7.76.1
> Accept: */*
> Content-Type: application/x-ndjson
> Content-Length: 172
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: 6.0.5 60c306ea1@230612 dev (columnar 2.0.5 95b63e2@230612) (secondary 2.0.5 95b63e2@230612)
< Content-Type: application/json; charset=UTF-8
< Content-Length: 89
< 
* Connection #0 to host 127.0.0.1 left intact
[{"total":0,"warning":"","error":"table 'products2' absent, or does not support INSERT"}]
curl -v -XPOST "127.0.0.1:9308/_bulk" -H "Content-Type: application/x-ndjson" -d '
{ "index" : { "_index" : "products2" } }
{ "title" : "Yellow Bag", "price": 12 }
{ "create" : { "_index" : "products2" } }
{ "title" : "Red Bag", "price": 12.5, "id": 3 }
*   Trying 127.0.0.1:9308...
* Connected to 127.0.0.1 (127.0.0.1) port 9308 (#0)
> POST /_bulk HTTP/1.1
> Host: 127.0.0.1:9308
> User-Agent: curl/7.76.1
> Accept: */*
> Content-Type: application/x-ndjson
> Content-Length: 172
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: 6.0.5 60c306ea1@230612 dev (columnar 2.0.5 95b63e2@230612) (secondary 2.0.5 95b63e2@230612)
< Content-Type: aMy environment is:pplication/json; charset=UTF-8
< Content-Length: 396
< 
* Connection #0 to host 127.0.0.1 left intact
{"items":[{"index":{"_index":"products2","_type":"doc","_id":"0","_version":1,"result":"created","_shards":{"total":1,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1,"status":201}},{"create":{"_index":"products2","_type":"doc","_id":"3","_version":1,"result":"created","_shards":{"total":1,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1,"status":201}}],"errors":false,"took":1}

My environment is:

OS: Almalinux9 Manticore:

manticore-repo-0.2.0-1.noarch
manticore-icudata-5.0.3_221123.d2d9e5e56-1.el9.noarch
manticore-executor-0.7.1_23042814.46b67b8-1.x86_64
manticore-devel-6.0.5_230612.60c306ea1-1.el9.noarch
manticore-common-6.0.5_230612.60c306ea1-1.el9.noarch
manticore-server-core-6.0.5_230612.60c306ea1-1.el9.x86_64
manticore-server-6.0.5_230612.60c306ea1-1.el9.x86_64
manticore-columnar-lib-2.0.5_230612.95b63e2-1.el9.x86_64
manticore-buddy-1.0.7_23061114.91aee20-1.noarch
manticore-backup-1.0.5_23051715.bd59a44-1.noarch
manticore-tools-6.0.5_230612.60c306ea1-1.el9.x86_64
manticore-6.0.5_230612.60c306ea1-1.el9.x86_64
manticore-extra-0.7.1_23042813.46b67b8-1.noarch

More to the point, I read the buddy-core code. Now with the latest code, it doesn't look like _bulk's GET parameter is being handled well. Please refer to the other issue I opened yesterday.

manticoresoftware/buddy-core#13

Please let me know if there's anything I've overlooked or if there's anything I can do.

sanikolaev commented 1 year ago
curl -v -XPOST '127.0.0.1:9308/_bulk?timeout=60s' -H 'Content-Type: application/x-ndjson' -d '
{ "index" : { "_index" : "products2" } }
{ "title" : "Yellow Bag", "price": 12 }
{ "create" : { "_index" : "products2" } }
{ "title" : "Red Bag", "price": 12.5, "id": 3 }

...

[{"total":0,"warning":"","error":"table 'products2' absent, or does not support INSERT"}]

We know what may be the reason of this. Looking further.

Nick-S-2018 commented 1 year ago

We've created an issue about this: https://github.com/manticoresoftware/manticoresearch/issues/1168

unixfox commented 1 year ago

I do confirm that due to the query string ?timeout=60s vector doesn't seem to work with manticore.

Vector config (toml):

[sources.dummy_logs]
type = "demo_logs"
format = "syslog"
interval = 1

# Parse Syslog logs
# See the Vector Remap Language reference for more info: https://vrl.dev
[transforms.parse_logs]
type = "remap"
inputs = ["dummy_logs"]
source = '''
. = parse_syslog!(string!(.message))
'''

# Print parsed logs to stdout
[sinks.print]
type = "console"
inputs = ["parse_logs"]
encoding.codec = "json"

[sinks.es]
type = "elasticsearch"
inputs = ["parse_logs"]
endpoint = "http://localhost:9308
bulk.index = "test-%F"