preset-io / backend-sdk

superset-cli does not work with superset 3.0.0 #244

Open FridrikLax opened 9 months ago

FridrikLax commented 9 months ago

It looks like the import-assets and sync commands in superset-cli (0.2.8) do not work with Superset 3.0.0. I tried both basic authentication and JWT. Both work with 2.1.0 but fail on 3.0.0.

Sample command: superset-cli --jwt-token {jwt_token} --loglevel debug {HOST} sync native ./

The error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/username/.pyenv/versions/local-dev/bin/superset-cli", line 8, in <module>
    sys.exit(superset_cli())
  File "/Users/username/.pyenv/versions/local-dev/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/username/.pyenv/versions/local-dev/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/username/.pyenv/versions/local-dev/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/username/.pyenv/versions/local-dev/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/username/.pyenv/versions/local-dev/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/username/.pyenv/versions/local-dev/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/username/.pyenv/versions/local-dev/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/username/.pyenv/versions/local-dev/lib/python3.9/site-packages/preset_cli/cli/superset/sync/native/command.py", line 239, in native
    import_resources(contents, client, overwrite)
  File "/Users/username/.pyenv/versions/local-dev/lib/python3.9/site-packages/backoff/_sync.py", line 105, in retry
    ret = target(*args, **kwargs)
  File "/Users/username/.pyenv/versions/local-dev/lib/python3.9/site-packages/preset_cli/cli/superset/sync/native/command.py", line 366, in import_resources
    client.import_zip("assets", buf, overwrite=overwrite)
  File "/Users/username/.pyenv/versions/local-dev/lib/python3.9/site-packages/preset_cli/api/clients/superset.py", line 742, in import_zip
    payload = response.json()
  File "/Users/username/.pyenv/versions/local-dev/lib/python3.9/site-packages/requests/models.py", line 975, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

When using --jwt-token, the request on line 736 of superset.py returns a response containing HTML (authentication failed), so line 742 fails when it tries to parse the body as JSON.
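
A minimal sketch of how the JSON decoding step could fail with a clearer message when the server answers with an HTML login page. The helper name `decode_json_response` and its arguments are hypothetical and are not part of preset_cli; it only illustrates checking the content type before parsing.

```python
import json
from typing import Any


def decode_json_response(status_code: int, content_type: str, body: str) -> Any:
    """Decode a JSON body, or raise a descriptive error for non-JSON replies.

    Hypothetical helper for illustration only; not part of preset_cli.
    """
    if "json" not in content_type.lower():
        # An HTML login page ends up here instead of raising a bare
        # "Expecting value: line 1 column 1 (char 0)".
        snippet = body.strip()[:80]
        raise ValueError(
            f"expected JSON but got {content_type!r} "
            f"(HTTP {status_code}): {snippet!r}"
        )
    return json.loads(body)
```

With a `requests` response object, the three arguments would come from `response.status_code`, `response.headers.get("Content-Type", "")` and `response.text`.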

When using basic auth it fails even earlier:

[15:56:00] DEBUG    [[15:56:00]] DEBUG: urllib3.connectionpool: Starting new HTTP connection (1):        connectionpool.py:228
                    localhost:8090
[15:56:01] DEBUG    [[15:56:01]] DEBUG: urllib3.connectionpool: http://localhost:8090/ "GET /login/       connectionpool.py:456
                    HTTP/1.1" 200 51619
           DEBUG    [[15:56:01]] DEBUG: urllib3.connectionpool: http://localhost:8090/ "POST /login/      connectionpool.py:456
                    HTTP/1.1" 302 201
           DEBUG    [[15:56:01]] DEBUG: urllib3.connectionpool: http://localhost:8090/ "GET /login/       connectionpool.py:456
                    HTTP/1.1" 200 51620
           DEBUG    [[15:56:01]] DEBUG: preset_cli.api.clients.superset: GET                                   superset.py:433
                    http://localhost:8090/api/v1/database/?q=(filters:!(),order_column:changed_on_delta_humani
                    zed,order_direction:desc,page:0,page_size:100)
           DEBUG    [[15:56:01]] DEBUG: urllib3.connectionpool: http://localhost:8090/ "GET               connectionpool.py:456
                    /api/v1/database/?q=(filters:!(),order_column:changed_on_delta_humanized,order_direc
                    tion:desc,page:0,page_size:100) HTTP/1.1" 401 39
           DEBUG    [[15:56:01]] DEBUG: urllib3.connectionpool: Starting new HTTP connection (2):        connectionpool.py:228
                    localhost:8090
[15:56:03] DEBUG    [[15:56:03]] DEBUG: urllib3.connectionpool: http://localhost:8090/ "GET /login/       connectionpool.py:456
                    HTTP/1.1" 200 51620
           DEBUG    [[15:56:03]] DEBUG: urllib3.connectionpool: http://localhost:8090/ "POST /login/      connectionpool.py:456
                    HTTP/1.1" 302 201
           DEBUG    [[15:56:03]] DEBUG: urllib3.connectionpool: http://localhost:8090/ "GET /login/

FridrikLax commented 9 months ago

Setting WTF_CSRF_ENABLED = False had no effect.

FridrikLax commented 9 months ago

Sample command along with logs from 3.0.0 vs. 2.1.0:

superset-cli -u admin -p admin --loglevel debug http://localhost:8089 import-assets ./ --overwrite

3.0.0:

127.0.0.1 - - [26/Sep/2023:14:41:30 +0000] "GET /superset/welcome/ HTTP/1.1" 302 201 "http://localhost:8088" "Apache Superset Client (0.2.8)"
127.0.0.1 - - [26/Sep/2023:14:41:31 +0000] "GET /login/ HTTP/1.1" 200 51492 "http://localhost:8088" "Apache Superset Client (0.2.8)"
127.0.0.1 - - [26/Sep/2023:14:41:31 +0000] "GET /api/v1/database/?q=(filters:!(),order_column:changed_on_delta_humanized,order_direction:desc,page:0,page_size:100) HTTP/1.1" 401 39 "http://localhost:8088" "Apache Superset Client (0.2.8)"
127.0.0.1 - - [26/Sep/2023:14:41:31 +0000] "GET /login/ HTTP/1.1" 200 51490 "http://localhost:8088" "Apache Superset Client (0.2.8)"
127.0.0.1 - - [26/Sep/2023:14:41:31 +0000] "POST /login/ HTTP/1.1" 302 189 "http://localhost:8088" "Apache Superset Client (0.2.8)"
127.0.0.1 - - [26/Sep/2023:14:41:31 +0000] "GET / HTTP/1.1" 302 223 "http://localhost:8088" "Apache Superset Client (0.2.8)"
2023-09-26 14:41:31,722:WARNING:root:Class 'werkzeug.local.LocalProxy' is not mapped
127.0.0.1 - - [26/Sep/2023:14:41:31 +0000] "GET /superset/welcome/ HTTP/1.1" 302 201 "http://localhost:8088" "Apache Superset Client (0.2.8)"
127.0.0.1 - - [26/Sep/2023:14:41:31 +0000] "GET /login/ HTTP/1.1" 200 51485 "http://localhost:8088" "Apache Superset Client (0.2.8)"
127.0.0.1 - - [26/Sep/2023:14:41:31 +0000] "GET /api/v1/database/?q=(filters:!(),order_column:changed_on_delta_humanized,order_direction:desc,page:0,page_size:100) HTTP/1.1" 401 39 "http://localhost:8088" "Apache Superset Client (0.2.8)"
127.0.0.1 - - [26/Sep/2023:14:41:32 +0000] "GET /login/ HTTP/1.1" 200 51490 "http://localhost:8088" "Apache Superset Client (0.2.8)"
127.0.0.1 - - [26/Sep/2023:14:41:32 +0000] "POST /login/ HTTP/1.1" 302 189 "http://localhost:8088" "Apache Superset Client (0.2.8)"
127.0.0.1 - - [26/Sep/2023:14:41:32 +0000] "GET / HTTP/1.1" 302 223 "http://localhost:8088" "Apache Superset Client (0.2.8)"
2023-09-26 14:41:32,393:WARNING:root:Class 'werkzeug.local.LocalProxy' is not mapped
127.0.0.1 - - [26/Sep/2023:14:41:32 +0000] "GET /superset/welcome/ HTTP/1.1" 302 201 "http://localhost:8088" "Apache Superset Client (0.2.8)"
127.0.0.1 - - [26/Sep/2023:14:41:32 +0000] "GET /login/ HTTP/1.1" 200 51490 "http://localhost:8088" "Apache Superset Client (0.2.8)"
127.0.0.1 - - [26/Sep/2023:14:41:32 +0000] "GET /api/v1/database/?q=(filters:!(),order_column:changed_on_delta_humanized,order_direction:desc,page:0,page_size:100) HTTP/1.1" 401 39 "http://localhost:8088" "Apache Superset Client (0.2.8)"
127.0.0.1 - - [26/Sep/2023:14:41:32 +0000] "GET /login/ HTTP/1.1" 200 51488 "http://localhost:8088" "Apache Superset Client (0.2.8)"
127.0.0.1 - - [26/Sep/2023:14:41:32 +0000] "POST /login/ HTTP/1.1" 302 189 "http://localhost:8088" "Apache Superset Client (0.2.8)"
127.0.0.1 - - [26/Sep/2023:14:41:33 +0000] "GET / HTTP/1.1" 302 223 "http://localhost:8088" "Apache Superset Client (0.2.8)"
2023-09-26 14:41:33,079:WARNING:root:Class 'werkzeug.local.LocalProxy' is not mapped
127.0.0.1 - - [26/Sep/2023:14:41:33 +0000] "GET /superset/welcome/ HTTP/1.1" 302 201 "http://localhost:8088" "Apache Superset Client (0.2.8)"
127.0.0.1 - - [26/Sep/2023:14:41:33 +0000] "GET /login/ HTTP/1.1" 200 51491 "http://localhost:8088" "Apache Superset Client (0.2.8)"
127.0.0.1 - - [26/Sep/2023:14:41:33 +0000] "GET /api/v1/database/?q=(filters:!(),order_column:changed_on_delta_humanized,order_direction:desc,page:0,page_size:100) HTTP/1.1" 401 39 "http://localhost:8088" "Apache Superset Client (0.2.8)"
127.0.0.1 - - [26/Sep/2023:14:41:33 +0000] "GET /login/ HTTP/1.1" 200 51491 "http://localhost:8088" "Apache Superset Client (0.2.8)"
127.0.0.1 - - [26/Sep/2023:14:41:33 +0000] "POST /login/ HTTP/1.1" 302 189 "http://localhost:8088" "Apache Superset Client (0.2.8)"

2.1.0:

127.0.0.1 - - [26/Sep/2023:14:41:39 +0000] "GET /superset/welcome/ HTTP/1.1" 200 27302 "-" "python-requests/2.31.0"
127.0.0.1 - - [26/Sep/2023:14:41:39 +0000] "GET /api/v1/database/?q=(filters:!(),order_column:changed_on_delta_humanized,order_direction:desc,page:0,page_size:100) HTTP/1.1" 200 729 "http://localhost:8089" "Apache Superset Client (0.2.8)"
127.0.0.1 - - [26/Sep/2023:14:41:39 +0000] "GET /api/v1/database/?q=(filters:!(),order_column:changed_on_delta_humanized,order_direction:desc,page:1,page_size:100) HTTP/1.1" 200 519 "http://localhost:8089" "Apache Superset Client (0.2.8)"
127.0.0.1 - - [26/Sep/2023:14:41:39 +0000] "GET /api/v1/database/export/?q=%21%285%29 HTTP/1.1" 200 759 "http://localhost:8089" "Apache Superset Client (0.2.8)"
Updating dbs Trino
2023-09-26 14:41:39,915:INFO:superset.models.helpers:Updating dbs Trino
127.0.0.1 - - [26/Sep/2023:14:41:39 +0000] "POST /api/v1/assets/import/ HTTP/1.1" 200 17 "http://localhost:8089" "Apache Superset Client (0.2.8)"

oliverlambson commented 9 months ago

I'm having exactly the same issue. I believe it has to do with redirects.

I've had some success getting it to at least auth with JWT by ensuring there's no redirect for the trailing / when hitting the csrf endpoint:

# auth/superset.py
...
class SupersetJWTAuth(TokenAuth):  # pylint: disable=abstract-method
...
-        response = self.session.get(
-            self.baseurl / "api/v1/security/csrf_token/",  # type: ignore
-            headers={"Authorization": f"Bearer {jwt}"},
-        )
+        url = str(self.baseurl / "api/v1/security/csrf_token/")
+        url = url if url.endswith("/") else url + "/"
+        response = self.session.get(
+            url,  # type: ignore
+            headers={"Authorization": f"Bearer {jwt}"},
+        )
...

This avoids the redirect caused by str(yarl.URL) dropping the trailing slash (which is what happens under the hood if you follow the GET request into the requests package, in requests.models.PreparedRequest.prepare_url()).
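
The same idea can be sketched as a small standalone helper that appends the trailing slash to the path only, so it also works when the URL carries a query string. The function name `ensure_trailing_slash` is hypothetical and not part of superset-cli; this is an illustration under that assumption, not the project's fix.

```python
from urllib.parse import urlsplit, urlunsplit


def ensure_trailing_slash(url: str) -> str:
    """Return url with a trailing slash on its path, keeping the query intact.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    # SplitResult is a namedtuple, so _replace swaps only the path component.
    return urlunsplit(parts._replace(path=path))
```

Because it touches only the path component, a URL such as `https://mysuperset.site/api/v1/database?q=(page:0)` gets the slash inserted before the `?` instead of at the end of the string.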


That gets me a bit further, but it then fails at the next hurdle: an https request gets redirected to http by Superset. I'm not sure whether this is a Superset issue rather than a superset-cli one.

Logs before change:

[12:53:02] DEBUG    [[12:53:02]] DEBUG: urllib3.connectionpool: Starting new HTTPS connection (1): mysuperset.site:443            connectionpool.py:1014
           DEBUG    [[12:53:02]] DEBUG: urllib3.connectionpool: https://mysuperset.site:443 "GET /api/v1/security/csrf_token       connectionpool.py:473
                    HTTP/1.1" 308 321                                                                                                                                    
           DEBUG    [[12:53:02]] DEBUG: urllib3.connectionpool: Starting new HTTP connection (1): mysuperset.site:80               connectionpool.py:245
           DEBUG    [[12:53:02]] DEBUG: urllib3.connectionpool: http://mysuperset.site:80 "GET /api/v1/security/csrf_token/        connectionpool.py:473
                    HTTP/1.1" 301 134                                                                                                                                    
           DEBUG    [[12:53:02]] DEBUG: urllib3.connectionpool: https://mysuperset.site:443 "GET /api/v1/security/csrf_token/      connectionpool.py:473
                    HTTP/1.1" 401 39   

Logs after change:

[12:48:03] DEBUG    [[12:48:03]] DEBUG: urllib3.connectionpool: Starting new HTTPS connection (1): mysuperset.site:443            connectionpool.py:1014
[12:48:04] DEBUG    [[12:48:04]] DEBUG: urllib3.connectionpool: https://mysuperset.site:443 "GET /api/v1/security/csrf_token/      connectionpool.py:473
                    HTTP/1.1" 200 105                                                                                                                                    
[12:48:07] DEBUG    [[12:48:07]] DEBUG: preset_cli.api.clients.superset: GET                                                                              superset.py:433
                    https://mysuperset.site/api/v1/database?q=(filters:!((col:database_name,opr:eq,value:default_sandbox)),order_column:                
                    changed_on_delta_humanized,order_direction:desc,page:0,page_size:100)                                                                                
           DEBUG    [[12:48:07]] DEBUG: urllib3.connectionpool: https://mysuperset.site:443 "GET                                   connectionpool.py:473
                    /api/v1/database?q=(filters:!((col:database_name,opr:eq,value:default_sandbox)),order_column:changed_on_delta_humanized,order_d                      
                    irection:desc,page:0,page_size:100) HTTP/1.1" 308 591                                                                                                
           DEBUG    [[12:48:07]] DEBUG: urllib3.connectionpool: Starting new HTTP connection (1): mysuperset.site:80               connectionpool.py:245
           DEBUG    [[12:48:07]] DEBUG: urllib3.connectionpool: http://mysuperset.site:80 "GET                                     connectionpool.py:473
                    /api/v1/database/?q=(filters:!((col:database_name,opr:eq,value:default_sandbox)),order_column:changed_on_delta_humanized,order_                      
                    direction:desc,page:0,page_size:100) HTTP/1.1" 301 134                                                                                               
           DEBUG    [[12:48:07]] DEBUG: urllib3.connectionpool: https://mysuperset.site:443 "GET                                   connectionpool.py:473
                    /api/v1/database/?q=(filters:!((col:database_name,opr:eq,value:default_sandbox)),order_column:changed_on_delta_humanized,order_                      
                    direction:desc,page:0,page_size:100) HTTP/1.1" 401 39                                                                                                
           ERROR    [[12:48:07]] ERROR: preset_cli.lib: {                                                                                                       lib.py:98
                        "msg": "Missing Authorization Header"                                                                                                            
                    } 

Ok, so it's more redirect stuff (see the 301). Keep changing stuff to not redirect:

# api/clients/superset.py
...
    def get_resources(self, resource_name: str, **kwargs: Any) -> List[Any]:
            ...
            url = self.baseurl / "api/v1" / resource_name / "" % {"q": query}
+            url = str(url)
+            url = url.replace("?", "/?").replace("//?", "/?")
            ...
...
    def create_resource(self, resource_name: str, **kwargs: Any) -> Any:
        """
        Create a resource.
        """
        url = self.baseurl / "api/v1" / resource_name / ""
+        url = str(url)
+        url = url if url.endswith("/") else url + "/"
        ...

Logs after this change (sync seems to work 🎉 ):

[13:49:30] DEBUG    [[13:49:30]] DEBUG: urllib3.connectionpool: Starting new HTTPS connection (1): mysuperset.site:443            connectionpool.py:1014
           DEBUG    [[13:49:30]] DEBUG: urllib3.connectionpool: https://mysuperset.site:443 "GET /api/v1/security/csrf_token/      connectionpool.py:473
                    HTTP/1.1" 200 105                                                                                                                                    
https://mysuperset.site/api/v1/database/?q=(filters:!((col:database_name,opr:eq,value:some_value)),order_column:changed_on_delta_humanized,order_direction:desc,page:0,page_size:100)
[13:49:33] DEBUG    [[13:49:33]] DEBUG: preset_cli.api.clients.superset: GET                                                                              superset.py:435
                    https://mysuperset.site/api/v1/database/?q=(filters:!((col:database_name,opr:eq,value:some_value)),order_column                
                    :changed_on_delta_humanized,order_direction:desc,page:0,page_size:100)                                                                               
           DEBUG    [[13:49:33]] DEBUG: urllib3.connectionpool: https://mysuperset.site:443 "GET                                   connectionpool.py:473
                    /api/v1/database/?q=(filters:!((col:database_name,opr:eq,value:some_value)),order_column:changed_on_delta_humanized,order_                      
                    direction:desc,page:0,page_size:100) HTTP/1.1" 200 518                                                                                               
           INFO     [[13:49:33]] INFO: preset_cli.cli.superset.sync.dbt.databases: No database connection found, creating it                              databases.py:72
https://mysuperset.site/api/v1/database/
{'database_name': 'some_value', 'is_managed_externally': False, 'masked_encrypted_extra': None, 'sqlalchemy_uri': 'some_uri'}
           DEBUG    [[13:49:33]] DEBUG: preset_cli.api.clients.superset: POST https://mysuperset.site/api/v1/database/                   superset.py:459
                    {                                                                                                                                                    
                        "database_name": "some_value",                                                                                                              
                        "is_managed_externally": false,                                                                                                                  
                        "masked_encrypted_extra": null,                                                                                                                  
                        "sqlalchemy_uri":                                                                                                                                
                    "some_uri"                                                                                                                               
                    }                                                                                                                                                    
[13:49:36] DEBUG    [[13:49:36]] DEBUG: urllib3.connectionpool: https://mysuperset.site:443 "POST /api/v1/database/ HTTP/1.1" 201  connectionpool.py:473
                    377                                                                                                                                                  
https://mysuperset.site/api/v1/dataset/?q=(filters:!((col:database,opr:rel_o_m,value:2),(col:schema,opr:eq,value:some_value2),(col:table_name,opr:eq,value:some_value3)),order_column:changed_on_delta_humanized,order_direction:desc,page:0,page_size:100)
           DEBUG    [[13:49:36]] DEBUG: preset_cli.api.clients.superset: GET                                                                              superset.py:435
                    https://mysuperset.site/api/v1/dataset/?q=(filters:!((col:database,opr:rel_o_m,value:2),(col:schema,opr:eq,value:some_value2                
                    ),(col:table_name,opr:eq,value:some_value3)),order_column:changed_on_delta_humanized,order_d                
                    irection:desc,page:0,page_size:100)                                                                                                                  
           DEBUG    [[13:49:36]] DEBUG: urllib3.connectionpool: https://mysuperset.site:443 "GET                                   connectionpool.py:473
                    /api/v1/dataset/?q=(filters:!((col:database,opr:rel_o_m,value:2),(col:schema,opr:eq,value:some_value2),(col:table_nam                      
                    e,opr:eq,value:some_value3)),order_column:changed_on_delta_humanized,order_direction:desc,page:0,page_si                      
                    ze:100) HTTP/1.1" 200 413                                                                                                                            
           INFO     [[13:49:36]] INFO: preset_cli.cli.superset.sync.dbt.datasets: Creating dataset model.name.some_value3  datasets.py:125
https://mysuperset.site/api/v1/dataset/
{'database': 2, 'schema': 'some_value2', 'table_name': 'some_value3'}
           DEBUG    [[13:49:36]] DEBUG: preset_cli.api.clients.superset: POST https://mysuperset.site/api/v1/dataset/                    superset.py:459
                    {                                                                                                                                                    
                        "database": 2,                                                                                                                                   
                        "schema": "some_value2",                                                                                                               
                        "table_name": "some_value3"                                                                                               
                    }                                                                                                                                                    
[13:49:42] DEBUG    [[13:49:42]] DEBUG: urllib3.connectionpool: https://mysuperset.site:443 "POST /api/v1/dataset/ HTTP/1.1" 201   connectionpool.py:473
                    3238                                                                                                                                                 
[13:49:51] DEBUG    [[13:49:51]] DEBUG: urllib3.connectionpool: https://mysuperset.site:443 "PUT                                   connectionpool.py:473
                    /api/v1/dataset/1?override_columns=true HTTP/1.1" 200 317                                                                                            
           DEBUG    [[13:49:51]] DEBUG: preset_cli.api.clients.superset: GET https://mysuperset.site/api/v1/dataset/1                    superset.py:398
           DEBUG    [[13:49:51]] DEBUG: urllib3.connectionpool: https://mysuperset.site:443 "GET /api/v1/dataset/1 HTTP/1.1" 200   connectionpool.py:473
                    5674   

This is obviously the wrong solution (it should work with redirects), but at least it shows where the issue is.
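
One plausible reading of the "Missing Authorization Header" 401 in the earlier logs, offered as an assumption rather than a confirmed diagnosis: the requests library removes the Authorization header when a redirect moves from https to http, and the header is not restored on the following hop back to https. The sketch below is a simplified model of that rule written from memory, not the library's actual code; `would_strip_auth` and `auth_survives` are hypothetical names.

```python
from urllib.parse import urlparse

DEFAULT_PORTS = {"http": 80, "https": 443}


def would_strip_auth(old_url: str, new_url: str) -> bool:
    """Simplified model: is Authorization dropped on this redirect hop?"""
    old, new = urlparse(old_url), urlparse(new_url)
    if old.hostname != new.hostname:
        return True
    # An http -> https upgrade on the default ports keeps the header.
    if (
        old.scheme == "http" and old.port in (80, None)
        and new.scheme == "https" and new.port in (443, None)
    ):
        return False
    changed_scheme = old.scheme != new.scheme
    changed_port = old.port != new.port
    default_port = (DEFAULT_PORTS.get(old.scheme), None)
    if not changed_scheme and old.port in default_port and new.port in default_port:
        return False
    return changed_scheme or changed_port


def auth_survives(chain: list[str]) -> bool:
    """True if no hop in the redirect chain would strip the header."""
    return not any(would_strip_auth(a, b) for a, b in zip(chain, chain[1:]))
```

Under this model, the chain seen in the logs (https without the trailing slash, then http with it, then https with it) loses the header on the first hop, while a request sent directly to the https URL with the trailing slash keeps it.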

See here for linked superset issue: https://github.com/apache/superset/issues/25359

dannyshaw commented 2 days ago

I've had success patching issues to connect to my Superset instance.

The central fix was to modify UsernamePasswordAuth to bring the JWT token into authorisation. It logs in, gets the CSRF token, fetches a JWT token, and returns the Authorization: Bearer header for requests.

Can PR this as is, but I'm guessing you would wanna implement it cleaner:

class UsernamePasswordAuth(Auth):  # pylint: disable=too-few-public-methods
    """
    Auth to Superset via username/password.
    """

    def __init__(self, baseurl: URL, username: str, password: Optional[str] = None):

        super().__init__()

        self.csrf_token: Optional[str] = None
        self.baseurl = baseurl
        self.username = username
        self.password = password
        self.token = None
        self.auth()

    def get_headers(self) -> Dict[str, str]:
        headers = {}

        if self.token:
            headers["Authorization"] = f"Bearer {self.token}"

        if self.csrf_token:
            headers["X-CSRFToken"] = self.csrf_token

        return headers

    def auth(self) -> None:
        self._login_and_store_csrf()
        self._fetch_and_store_token()

    def _login_and_store_csrf(self) -> None:
        """
        Login to get CSRF token and set cookies.
        """
        data = {"username": self.username, "password": self.password}

        response = self.session.get(self.baseurl / "login/")
        soup = BeautifulSoup(response.text, "html.parser")
        input_ = soup.find("input", {"id": "csrf_token"})
        csrf_token = input_["value"] if input_ else None
        if csrf_token:
            self.session.headers["X-CSRFToken"] = csrf_token
            data["csrf_token"] = csrf_token
            self.csrf_token = csrf_token

        # set cookies
        self.session.post(self.baseurl / "login/", data=data)

    def _fetch_and_store_token(self) -> None:
        """
        Fetch the JWT token to use for headers
        """
        data = {
            "username": self.username,
            "password": self.password,
            "provider": "db",
            "refresh": True,
        }

        api_login_url = self.baseurl / "api/v1/security/login"
        response = self.session.post(api_login_url, json=data)
        self.token = response.json()['access_token']

Note that I did originally see similar issues with redirects to the trailing-slash URL and made some tweaks for that, but I'm not certain they're critical. Possibly. The missing Authorization header was the key issue I discovered.