preset-io / elasticsearch-dbapi

A DBAPI and SQLAlchemy dialect for Elasticsearch
Apache License 2.0
106 stars 49 forks source link

SuperSet errors with AWS Elasticsearch [Kibana version 6.8.0] #74

Open harshgadhia opened 2 years ago

harshgadhia commented 2 years ago

Hello preset-io team,

I am facing some issues with using elasticsearch-dbapi version: 0.2.6. I have described the issue in detail in the apache superset project too. SuperSet errors with AWS Elasticsearch [Kibana version 6.8.0]

To summarize:

Issue 1

Issue 2

Environment

Any help from you is greatly appreciated.

harshgadhia commented 2 years ago

Please see here for the response formats for the Elasticsearch Opendistro SQL.

By default the plugin returns original response from Elasticsearch in JSON. Because this is the native response from Elasticsearch, extra efforts are needed to parse and interpret it.

I am pretty certain that the second issue is happening because ?format=jdbc is not set on the es query endpoint. See here the path is set to _opendistro/_sql instead it should be _opendistro/_sql?format=jdbc

However, the code assumes (See lines 280 and 281) the response in jdbc format. Hence, the error is happening in parsing the response. Can someone please fix this?

@dpgaspar: I am tagging you since, you are an active member of this project. Thanks in advance!

harshgadhia commented 2 years ago

In case you need the following:

pip freeze

 pip freeze
aiohttp==3.7.2
alembic==1.4.3
amqp==2.6.1
# Editable install with no version control (apache-superset==1.3.1)
-e /app
apispec==3.3.2
async-timeout==3.0.1
attrs==20.2.0
Authlib==0.15.5
Babel==2.8.0
backoff==1.10.0
billiard==3.6.3.0
bleach==3.3.0
boto3==1.16.10
botocore==1.19.10
Brotli==1.0.9
cached-property==1.5.2
cachelib==0.1.1
cachetools==4.2.4
celery==4.4.7
certifi==2020.6.20
cffi==1.14.3
chardet==3.0.4
click==7.1.2
colorama==0.4.4
configparser==5.0.2
convertdate==2.3.0
cron-descriptor==1.2.24
croniter==0.3.36
cryptography==3.3.2
defusedxml==0.6.0
Deprecated==1.2.11
deprecation==2.1.0
dnspython==2.0.0
elasticsearch==7.13.4
elasticsearch-dbapi==0.2.6
email-validator==1.1.1
et-xmlfile==1.0.1
Flask==1.1.2
Flask-AppBuilder==3.3.0
Flask-Babel==1.0.0
Flask-Caching==1.10.1
Flask-Compress==1.8.0
Flask-Cors==3.0.9
Flask-JWT-Extended==3.24.1
Flask-Login==0.4.1
Flask-Migrate==2.5.3
Flask-OpenID==1.3.0
Flask-SQLAlchemy==2.4.4
flask-talisman==0.7.0
Flask-WTF==0.14.3
future==0.18.2
geographiclib==1.50
geopy==2.0.0
google-api-core==2.1.1
google-api-python-client==2.27.0
google-auth==2.3.1
google-auth-httplib2==0.1.0
googleapis-common-protos==1.53.0
graphlib-backport==1.0.3
gunicorn==20.0.4
holidays==0.10.3
httplib2==0.20.1
humanize==3.1.0
idna==2.10
idps-client==3.94.0
idps-mgmt-swagger-client==3.94.0
idps-sdk==3.94.0
idps-swagger-client==3.94.0
ijson==3.1.2.post0
importlib-metadata==4.8.1
isodate==0.6.0
itsdangerous==1.1.0
jdcal==1.4.1
Jinja2==2.11.3
jmespath==0.10.0
jsonlines==1.2.0
jsonschema==3.2.0
kombu==4.6.11
korean-lunar-calendar==0.2.1
linear-tsv==1.1.0
machina-swagger-client==3.94.0
Mako==1.1.3
Markdown==3.3.3
MarkupSafe==1.1.1
marshmallow==3.9.0
marshmallow-enum==1.5.1
marshmallow-sqlalchemy==0.23.1
msgpack==1.0.0
multidict==5.0.0
mysqlclient==1.4.2.post1
natsort==7.0.1
numpy==1.19.4
openpyxl==3.0.5
packaging==21.0
pandas==1.2.2
parsedatetime==2.6
pgsanity==0.2.9
Pillow==7.2.0
pks-swagger-client==3.94.0
polyline==1.4.0
prison==0.1.3
progress==1.5
protobuf==3.19.0
psycopg2==2.8.5
psycopg2-binary==2.8.5
pyarrow==4.0.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
pyathena==2.3.0
pycparser==2.20
pycryptodome==3.11.0
pydruid==0.6.1
PyGithub==1.54.1
PyHive==0.6.3
PyJWT==1.7.1
PyMeeus==0.3.7
pyparsing==2.4.7
pyrsistent==0.16.1
python-dateutil==2.8.1
python-dotenv==0.15.0
python-editor==1.0.4
python-geohash==0.8.5
python3-openid==3.2.0
pytz==2020.4
PyYAML==5.4.1
rcc-swagger-client==3.94.0
redis==3.2.1
requests==2.24.0
requests-aws4auth==1.1.1
rfc3986==1.4.0
rsa==4.7.2
s3transfer==0.3.3
sasl==0.2.1
secrecy-swagger-client==3.94.0
selenium==3.141.0
simplejson==3.17.2
six==1.15.0
slackclient==2.5.0
SQLAlchemy==1.3.20
SQLAlchemy-Utils==0.36.8
sqlparse==0.3.0
tableschema==1.20.0
tabulate==0.8.9
tabulator==1.52.5
tenacity==8.0.1
thrift==0.13.0
thrift-sasl==0.4.2
typing-extensions==3.7.4.3
unicodecsv==0.14.1
uritemplate==4.1.1
urllib3==1.25.11
vine==1.3.0
webencodings==0.5.1
Werkzeug==1.0.1
wrapt==1.12.1
WTForms==2.3.3
WTForms-JSON==0.3.3
xlrd==1.2.0
yarl==1.6.2
zipp==3.4.1
dpgaspar commented 2 years ago

Hi @harshgadhia,

What's your AWS OpenSearch (ElasticSearch) version? 6.8.0?

harshgadhia commented 2 years ago

@dpgaspar Yes that is correct:

{
    "name": "lagQbOg",
    "cluster_name": "332592635208:metrics-search-dev",
    "cluster_uuid": "2OAhfNl3S7aumuuBxQcxCQ",
    "version": {
        "number": "6.8.0",
        "build_flavor": "oss",
        "build_type": "zip",
        "build_hash": "8169a24",
        "build_date": "2021-04-21T19:26:55.782637Z",
        "build_snapshot": false,
        "lucene_version": "7.7.0",
        "minimum_wire_compatibility_version": "5.6.0",
        "minimum_index_compatibility_version": "5.0.0"
    },
    "tagline": "You Know, for Search"
}
harshgadhia commented 2 years ago

Here's the latest information I have:

The Opensearch (aka elasticsearch) has support to query SQL via the _opendistro/_sql endpoint since version >=6.5. However, only after version >=7.4, the endpoint presumable sends response in jdfc format (that is what must be the assumption I guess, when writing this.

However, for older versions <7.4, you have to explicitly pass on a query parameter ?format=jdbc. Looks like doing this will make the library compatible for the older versions.

Looking forward for your help in this matter, thanks in advance.

harshgadhia commented 2 years ago

@dpgaspar Did you get a chance to look at this? Really appreciate a response on the above. Thank you in advance!

dpgaspar commented 2 years ago

sounds reasonable to me @harshgadhia are you willing to make a PR? We can fetch current cluster version, or create a new connection string parameter

harshgadhia commented 2 years ago

@dpgaspar sure, I can try looking into it, and send a PR your way!

prijopauly commented 7 months ago

@harshgadhia, I am new to superset and I am facing same issue with AWS ES. I am not sure how we can pass query parameter ?format=jdbc in _opendistro/_sql in superset