REST entry point is per-project datasource:
POST /datasource/xxx/_sql[?id=_id]
where the body is the SQL query and the id parameter is the name of the column of identifiers (default is _id).
It would be useful to be able to run SQL queries on a datasource, for example:
select * from CNSIM1
select * from CNSIM1 union all select * from CNSIM2
select avg(LAB_HDL) as hdl_avg, GENDER as gender from CNSIM1 where LAB_HDL is not null group by GENDER
select avg(LAB_HDL) as hdl_avg, PM_BMI_CATEGORICAL as bmi from (select * from CNSIM1 union all select * from CNSIM2) where PM_BMI_CATEGORICAL is not null group by PM_BMI_CATEGORICAL
This will allow interfacing with systems that run SQL queries, such as Redash.
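For illustration, a minimal sketch of calling this entry point from Python with requests; the server URL, project name and token are placeholders, and the /ws path prefix and X-Opal-Auth header follow the query runner POC later in this thread.

import requests

# Placeholders: replace with your Opal server, project name and personal access token.
OPAL_SERVER = "https://opal-demo.obiba.org"
PROJECT = "CNSIM"
TOKEN = "my-personal-access-token"

# The SQL statement is posted as a plain-text body; the optional ?id= parameter
# renames the identifiers column (default is _id).
query = "select avg(LAB_HDL) as hdl_avg, GENDER as gender from CNSIM1 where LAB_HDL is not null group by GENDER"

response = requests.post(
    "{0}/ws/datasource/{1}/_sql".format(OPAL_SERVER, PROJECT),
    headers={"X-Opal-Auth": TOKEN, "Content-Type": "text/plain"},
    data=query,
)
print(response.json())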
We are currently trying to create a dashboard in Redash using the REST entry point described in this issue. However, it is not possible in Redash to post a plain-text body (only form-data or JSON), see https://redash.io/help/data-sources/querying/json-api and the screenshot below. In the post above you mentioned "This will allow interfacing with systems that run SQL queries, such as Redash". How did you envision using Redash in combination with the SQL REST endpoint?
The idea is rather to develop a custom Redash Data Source. The JSON Data Source that you are proposing to use does not have the flexibility of SQL.
FYI you may be interested in the discussion thread "Creating a new query runner (data source) in Redash": https://discuss.redash.io/t/creating-a-new-query-runner-data-source-in-redash/347
FYI: I wrote a super simple POC for an Opal QueryRunner, see the file below. You need to place this file into redash/query_runner/ and add the QueryRunner to redash/settings/__init__.py. After rebuilding the server and worker Docker images, the Opal data source is visible in Redash. Currently it is a POC; we're planning to add type guessing and error handling (see the TODOs). Maybe it would be nice to publish the python-opal-client to PyPI so that it can easily be used in the QueryRunner.
import logging
import sys
import uuid
import datetime

from redash.query_runner import *
from redash.utils import json_dumps, json_loads

logger = logging.getLogger(__name__)

try:
    import requests
    enabled = True
except ImportError:
    enabled = False


def parse_json(data):
    rows = []
    columns = []
    # TODO: implement TYPES_MAP (see json_ds.py)
    for column in data['columns']:
        columns.append({"name": column, "friendly_name": column, "type": "TYPE_STRING"})
    for row in data['rows']:
        rows.append(dict(zip(data['columns'], row)))
    return {"rows": rows, "columns": columns}


class Opal(BaseSQLQueryRunner):
    noop_query = "SELECT 1"

    @classmethod
    def configuration_schema(cls):
        return {
            "type": "object",
            "properties": {
                "server": {"type": "string", "default": "https://opal-demo.obiba.org"},
                "project": {"type": "string"},
                "token": {"type": "string"},
            },
            "required": ["server", "project", "token"],
            "secret": ["token"],
        }

    @classmethod
    def enabled(cls):
        return enabled

    @classmethod
    def name(cls):
        return "Opal"

    @classmethod
    def type(cls):
        return "opal"

    def run_query(self, query, user):
        # TODO: try/except/finally for error handling
        server = self.configuration.get("server", "")
        project = self.configuration.get("project", "")
        token = self.configuration.get("token", "")
        '''
        For testing the HTTP call in PowerShell:
        Invoke-WebRequest -Uri "ENDPOINT/ws/datasource/PROJECT_NAME/_sql" `
            -Method "POST" `
            -Headers @{
                "method"="POST"
                "X-Opal-Auth"="TOKEN"
            } `
            -ContentType "text/plain" `
            -Body "SELECT * FROM TABLE"
        '''
        # TODO: check for a trailing / at the end of the server URL
        url = "{0}/ws/datasource/{1}/_sql".format(server, project)
        headers = {
            'X-Opal-Auth': token,
            'Content-Type': 'text/plain'
        }
        # The SQL query is posted as a plain-text body, authenticated with a personal access token.
        response = requests.request("POST", url, headers=headers, data=query)
        json_data = response.json()
        error = None
        '''
        Expected result/format:
        {
            'rows': [{'var_a': '-99', 'var_b': '123'}, {'var_a': '2', 'var_b': '12312'}],
            'columns': [{'name': 'var_a', 'friendly_name': 'var_a', 'type': 'TYPE_STRING'}, {'name': 'var_b', 'friendly_name': 'var_b', 'type': 'TYPE_STRING'}]
        }
        '''
        data = json_dumps(parse_json(json_data))
        return data, error

    # TODO: implement get_schema (see snowflake.py)


register(Opal)
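To wire the POC into Redash, the module path also has to be added to the query runner list in redash/settings/__init__.py, as mentioned above. A minimal sketch, assuming the list is named default_query_runners; verify the exact name in your Redash version.

# redash/settings/__init__.py (sketch; list name assumed to be default_query_runners)
default_query_runners = [
    # ... existing query runners ...
    "redash.query_runner.opal",  # the POC above, saved as redash/query_runner/opal.py
]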
Excellent! I completely agree with having a real pip library.
Type guessing can be done by getting the table dictionary at /datasource/{project}/table/{table}/variables.
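A minimal sketch of that type-guessing approach; the helper name, the Opal-to-Redash type mapping and the assumption that each variable entry exposes 'name' and 'valueType' fields are illustrative, not part of the POC above.

import requests
from redash.query_runner import (TYPE_BOOLEAN, TYPE_DATE, TYPE_DATETIME,
                                 TYPE_FLOAT, TYPE_INTEGER, TYPE_STRING)

# Assumed mapping from Opal value types to Redash column types; adjust as needed.
OPAL_TO_REDASH_TYPES = {
    "integer": TYPE_INTEGER,
    "decimal": TYPE_FLOAT,
    "boolean": TYPE_BOOLEAN,
    "date": TYPE_DATE,
    "datetime": TYPE_DATETIME,
    "text": TYPE_STRING,
}

def guess_types(server, project, table, token):
    # Fetch the table dictionary and derive a column-name -> Redash-type map.
    url = "{0}/ws/datasource/{1}/table/{2}/variables".format(server, project, table)
    response = requests.get(url, headers={"X-Opal-Auth": token, "Accept": "application/json"})
    variables = response.json()
    # Each variable entry is assumed to carry 'name' and 'valueType' fields.
    return {v["name"]: OPAL_TO_REDASH_TYPES.get(v.get("valueType"), TYPE_STRING)
            for v in variables}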
It is a hassle to extract each table name from a query with a lot of tables joined together and then to extract the selected column names per table (especially when all columns are selected with *, or when column names are renamed or a function is applied). Including the data types in the columns of the response would be a lot easier.
Btw, is it true that aliases for table names are not supported by the SQL API?
Right, without a proper SQL parser it is not possible to identify tables and variables. And anyway, the data dictionary is useless when a SQL function transformation is applied. I noticed that the JSON output was not correct regarding data types, see #3661; it is fixed now.
Regarding table aliases, the table name detection is now more restrictive, see #3662.