passren / PyDynamoDB

PyDynamoDB is a Python DB API 2.0 (PEP 249) client for Amazon DynamoDB. A SQLAlchemy dialect is offered as well. Superset official database driver.
Other
17 stars 3 forks source link

Query Error in Superset with PyDynamoDB #49

Open Himanshubari08 opened 3 months ago

Himanshubari08 commented 3 months ago

I've connected Superset to DynamoDB using the PyDynamoDB driver. However, when I try to query DynamoDB through Superset, I find that it lacks several functions such as "split" and "replace," which prevents me from querying the tables as needed.

For example, I have a table with an "id" column that combines the database and table names. I need to split this column to get the database and table names separately.

Eg query: "SELECT substring(id,1,position('.' in id)-1) id1 FROM "my-table";" I am getting an error: Amazon DynamoDB Error dynamodb error: Expected CaselessKeyword 'FROM', found '(' (at char 16), (line:1, col:17)

I am following the guidance provided in this document: https://github.com/passren/PyDynamoDB/wiki/5.-Superset

Can someone please help me with this?

passren commented 3 months ago

@Himanshubari08 Thank you for the comments. Yes, this driver is able to support limit functions so far. Please refer to here. But appreciate your idea to allow me think about how to add more functions support.

Himanshubari08 commented 3 months ago

Can you add support for the split or substring function ASAP as that's what we need for now? For eg: we have an id column with the value "abc.xyz" and we want to split "abc" as one column and "xyz" as another column with a dot as a delimiter.

We need something as there in Redshift https://docs.aws.amazon.com/redshift/latest/dg/SPLIT_PART.html

passren commented 3 months ago

Sure. I will give priority to your request. These functions are not supported by DynamoDB natively. Allow me to spend some time exploring whether it is possible to bypass the DynamoDB statement check.

passren commented 3 months ago

@Himanshubari08 The functions you asked for are supported in the new release 0.6.0. Doc is available here. Please upgrade the lib. Have fun!

mdeshmu commented 3 months ago

@passren Thank you so much for quick introduction of those functions. Really appreciate it.

As @Himanshubari08 stated above, We have dot (.) as delimiter in the value of the column. I am not able to figure out how using newly introduced function can we split one column into two columns using dot as delimiter? I tried below query but it says invalid syntax.

SELECT SUBSTR(id,1,INSTR(id,'.')-1) country FROM "lakefront-ingest-stg-config-table"

example values for id column: india.mumbai

This is probably because support for INSTR is missing. can you add that?

Also, would it be possible to add inbuilt support for split_part similar to this extension?

Himanshubari08 commented 3 months ago

Hi @passren, Do we have any update on this?

passren commented 3 months ago

@mdeshmu @Himanshubari08 I'm working on the solution for how the syntax analysis supports nested functions on columns or a better solution to handle this situation. Thanks for your patience.

passren commented 3 months ago

@Himanshubari08 0.6.1 released to support nested SQL statements in Superset. This should help with your issue. Please see here

Himanshubari08 commented 3 months ago

Hi @passren Can you please add support for "INSTR"? Also, would it be possible to add inbuilt support for split_part similar to (https://github.com/nalgeon/sqlean/blob/main/docs/text.md#split_partsource-sep-part)

passren commented 3 months ago

@Himanshubari08 yes, it does support now. You can use any SQLite native supported function in outer query, like this:

SELECT col1, SUBSTR(col1, INSTR(col1, '*')), col2, col3 FROM (           # Outer Query: execute on QueryDB
    SELECT ddb_col1 col1, NUMBER(ddb_col2) col2, DATETIME(ddb_col3) col3 # Inner Query: execute on DynamoDB
    FROM DDB_TABLE WHERE ddb_col1 = *
) WHERE col1 = *

For split_part similar function, it will come with next version release.

mdeshmu commented 3 months ago

@passren We tried and still throws the same error. only SUBSTR function works but when we use INSTR inside SUBSTR it doesn't work.

passren commented 3 months ago

@passren We tried and still throws the same error. only SUBSTR function works but when we use INSTR inside SUBSTR it doesn't work.

Can you share the version of superset you are using?

mdeshmu commented 3 months ago

3.1.2

passren commented 3 months ago

@mdeshmu I failed to install Superset 3.1.2 on my local. Can you help to check the final SQL query generated by Apache Superset when you input the SQL query in SQL Lab? You can follow these steps:

1. Execute the Query:

Enter your SQL query (Two levels with outer and inner query) in SQL Lab and run it.

2. View the Query History:

After the query execution, navigate to the "Query History" tab. You can find this tab on the right side of the SQL Lab interface.

3. Select the Query:

In the "Query History" tab, you will see a list of queries that you have executed. Find the query you are interested in and click on it.

4. View the Query Details:

Once you click on the query, a pane will open showing the details of the query. This includes the final SQL that was sent to the database. The "Query History" tab provides information about the query execution, including the raw SQL that was generated and sent to the database. This allows you to see any modifications or transformations that Superset may have applied to your original query.

Himanshubari08 commented 3 months ago

Hi @passren , When I am running SUBSTR and REPLACE, then queries are running. Substr

But, when I am executing INSTR, it throws an error in the superset. INSTR

It would be beneficial if you provide functionality like split_part for this.

Himanshubari08 commented 3 months ago

@passren While running the nested query, the SQL statement is not generated in the Query History Tab. NestedQuery

passren commented 3 months ago

Hi @passren , When I am running SUBSTR and REPLACE, then queries are running. .... It would be beneficial if you provide functionality like split_part for this.

Can you try 0.6.2 which I added sqlean.py support as default for Query DB? You are able to use function text_split in this version.

Himanshubari08 commented 3 months ago

Thank you for the functionality and support. However, I'm encountering an issue with the text_split function. When I use the following code, it throws an error:

cursor.execute("select text_split('one|two|three','|',2)"); pyparsing.exceptions.ParseException: Expected CaselessKeyword 'FROM', found '(' (at char 17), (line:1, col:18)

cursor.execute('select text_split('one|two|three','|',2)'); SyntaxError: invalid syntax. Perhaps you forgot a comma?

Could you please assist in resolving this issue? If possible, could we schedule a time to connect and resolve this issue together? I am flexible with timing, so please let me know if we can connect.

passren commented 3 months ago

@Himanshubari08 I thought you are using text_split function in Superset. I'm sorry to say that it's not supported if you use regular Cursor to fetch data from DDB directly. Please refer to Arch here. There are two diff cursors in this lib. Regular cursor is a wrapper to interact with DDB, and Superset cursor added a middleware between DDB and Superset (aka QueryDB) which is a sqlite3 database by default. The text_split can only execute on QueryDB. So far, there is no plan to support more functions (Supported Functions Now ) on regular cursor.

mdeshmu commented 3 months ago

It doesn't work in Superset too.

image

mdeshmu commented 3 months ago

Here is full stack trace

2024-07-04 14:44:06,290:DEBUG:root:Query 6524: Running query: SELECT text_split(id,'.',1) FROM "lakefront-ingest-stg-config-table" LIMIT 1001 SupersetErrorsException Traceback (most recent call last): File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1823, in full_dispatch_request rv = self.dispatch_request() File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1799, in dispatch_request return self.ensure_sync(self.view_functions[rule.endpoint])(view_args) File "/usr/local/lib/python3.10/site-packages/flask_appbuilder/security/decorators.py", line 95, in wraps return f(self, *args, *kwargs) File "/app/superset/views/base_api.py", line 127, in wraps raise ex File "/app/superset/views/base_api.py", line 121, in wraps duration, response = time_function(f, self, args, kwargs) File "/app/superset/utils/core.py", line 1463, in time_function response = func(*args, kwargs) File "/app/superset/views/base_api.py", line 93, in wraps return f(self, *args, *kwargs) File "/app/superset/utils/log.py", line 255, in wrapper value = f(args, kwargs) File "/app/superset/sqllab/api.py", line 409, in execute_sql_query command_result: CommandResult = command.run() File "/app/superset/commands/sql_lab/execute.py", line 121, in run raise ex File "/app/superset/commands/sql_lab/execute.py", line 103, in run status = self._run_sql_json_exec_from_scratch() File "/app/superset/commands/sql_lab/execute.py", line 163, in _run_sql_json_exec_from_scratch raise ex File "/app/superset/commands/sql_lab/execute.py", line 158, in _run_sql_json_exec_from_scratch return self._sql_json_executor.execute( File "/app/superset/sqllab/sql_json_executer.py", line 111, in execute raise SupersetErrorsException( superset.exceptions.SupersetErrorsException: [SupersetError(message="dynamodb error: Expected CaselessKeyword 'FROM', found '(' (at char 17), (line:1, col:18)", error_type=<SupersetErrorType.GENERIC_DB_ENGINE_ERROR: 'GENERIC_DB_ENGINE_ERROR'>, level=<ErrorLevel.ERROR: 'error'>, extra={'engine_name': 'Amazon DynamoDB', 'issue_codes': [{'code': 1002, 'message': 'Issue 1002 - The database returned an unexpected error.'}]})] 2024-07-04 14:44:06,354:WARNING:superset.views.base:SupersetErrorsException Traceback (most recent call last): File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1823, in full_dispatch_request rv = self.dispatch_request() File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1799, in dispatch_request return self.ensure_sync(self.view_functions[rule.endpoint])(view_args) File "/usr/local/lib/python3.10/site-packages/flask_appbuilder/security/decorators.py", line 95, in wraps return f(self, *args, *kwargs) File "/app/superset/views/base_api.py", line 127, in wraps raise ex File "/app/superset/views/base_api.py", line 121, in wraps duration, response = time_function(f, self, args, kwargs) File "/app/superset/utils/core.py", line 1463, in time_function response = func(*args, kwargs) File "/app/superset/views/base_api.py", line 93, in wraps return f(self, *args, *kwargs) File "/app/superset/utils/log.py", line 255, in wrapper value = f(args, kwargs) File "/app/superset/sqllab/api.py", line 409, in execute_sql_query command_result: CommandResult = command.run() File "/app/superset/commands/sql_lab/execute.py", line 121, in run raise ex File "/app/superset/commands/sql_lab/execute.py", line 103, in run status = self._run_sql_json_exec_from_scratch() File "/app/superset/commands/sql_lab/execute.py", line 163, in _run_sql_json_exec_from_scratch raise ex File "/app/superset/commands/sql_lab/execute.py", line 158, in _run_sql_json_exec_from_scratch return self._sql_json_executor.execute( File "/app/superset/sqllab/sql_json_executer.py", line 111, in execute raise SupersetErrorsException( superset.exceptions.SupersetErrorsException: [SupersetError(message="dynamodb error: Expected CaselessKeyword 'FROM', found '(' (at char 17), (line:1, col:18)", error_type=<SupersetErrorType.GENERIC_DB_ENGINE_ERROR: 'GENERIC_DB_ENGINE_ERROR'>, level=<ErrorLevel.ERROR: 'error'>, extra={'engine_name': 'Amazon DynamoDB', 'issue_codes': [{'code': 1002, 'message': 'Issue 1002 - The database returned an unexpected error.'}]})]

mdeshmu commented 2 months ago

@passren waiting for your response. thanks.

passren commented 2 months ago

@mdeshmu please try this way in superset:

SELECT text_split(id,'.',1) FROM (
    SELECT id FROM "lakefront-ingest-stg-config-table"
)
Himanshubari08 commented 2 months ago

Hi @passren image please check this

passren commented 2 months ago

@Himanshubari08 can you click "See more" to expand the stack trace?