tableau / server-client-python

A Python library for the Tableau Server REST API
https://tableau.github.io/server-client-python/
MIT License

When downloading the flow, a KeyError: 'content-disposition' occurs. #1338

Closed bugcity closed 6 months ago

bugcity commented 7 months ago

Describe the bug When attempting to download the flow, a KeyError: 'content-disposition' occurs.

Versions Details of your environment, including:

To Reproduce server.flows.download('<< flow-id >>')

Results

  File "<< project-path >>\flow_test\main.py", line 12, in main
    filepath = server.flows.download('50785764-47c6-45f7-8c84-22cba9c345d2')
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<< project-path >>\.venv\Lib\site-packages\tableauserverclient\server\endpoint\endpoint.py", line 291, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<< project-path >>\.venv\Lib\site-packages\tableauserverclient\server\endpoint\flows_endpoint.py", line 124, in download
    _, params = cgi.parse_header(server_response.headers["Content-Disposition"])
  File "<< project-path >>\.venv\Lib\site-packages\requests\structures.py", line 52, in __getitem__
    return self._store[key.lower()][1]
           ~~~~~~~~~~~^^^^^^^^^^^^^
KeyError: 'content-disposition'

The value of server_response.headers is as follows.
{'Date': 'Tue, 16 Jan 2024 03:48:14 GMT', 'Content-Type': 'application/octet-stream', 'Content-Length': '14934', 'Connection': 'keep-alive', 'Server': 'Tableau', 'Vary': 'Access-Control-Request-Method,Access-Control-Request-Headers,Accept-Encoding', 'X-Tableau': 'Tableau Server', 'P3P': 'CP="NON", CP="NON"', 'X-UA-Compatible': 'IE=Edge', 'X-Content-Type-Options': 'nosniff', 'X-XSS-Protection': '1; mode=block', 'Referrer-Policy': 'no-referrer-when-downgrade', 'Content-Security-Policy-Report-Only': "connect-src * https://*.tiles.mapbox.com https://api.mapbox.com; default-src blob:; font-src * data:; frame-src * data: tableau-desktop:; img-src * data: blob:; object-src data:; report-uri /vizql/csp-report; script-src * blob: wasm-unsafe-eval; style-src * 'unsafe-inline'", 'Last-Modified': 'Sun, 17 Dec 2023 23:44:58 GMT', 'ETag': '"0-gzip"', 'Content-Encoding': 'gzip'}
bcantoni commented 7 months ago

@jorwoods I think you did some work for the flows support - could you take a look at this report?

jorwoods commented 7 months ago

I tested v0.29 against tableau online and was able to get it to work successfully. I'll test against an on-prem install later this week.

import os
import tableauserverclient as TSC
from dotenv import load_dotenv
load_dotenv()
server = TSC.Server(os.getenv("TABLEAU_SERVER"), use_server_version=True)
auth = TSC.PersonalAccessTokenAuth(
    os.getenv("TABLEAU_TOKEN_NAME"),
    os.getenv("TABLEAU_PAT"),
    site_id=os.getenv("TABLEAU_SITE")
)
server.auth.sign_in(auth)
flows = server.flows.filter(name="Superstore-JW")
flow = flows[0]
server.flows.download(flow.id)
jorwoods commented 7 months ago

Headers from flows.download:

{
   "content-disposition":"name=\"tableau_flow\"; filename=\"Superstore-JW.tflx\"",
   "content-encoding":"gzip",
   "content-type":"application/octet-stream",
   "date":"Wed, 17 Jan 2024 03:28:40 GMT",
   "etag":"\"0-gzip\"",
   "last-modified":"Wed, 17 Jan 2024 03:18:48 GMT",
   "p3p":"CP=\"NON\", CP=\"NON\"",
   "referrer-policy":"strict-origin-when-cross-origin",
   "server":"Tableau",
   "strict-transport-security":"max-age=31536000; includeSubDomains",
   "vary":"Accept-Encoding",
   "x-content-type-options":"nosniff",
   "x-tableau":"Tableau Server",
   "x-ua-compatible":"IE=Edge",
   "x-xss-protection":"1; mode=block",
   "transfer-encoding":"chunked",
   "Connection":"keep-alive"
}
jorwoods commented 7 months ago

Tested against 2022.3.1 on-prem and was successful.

aidanharvey commented 7 months ago

I'm intermittently getting the same KeyError: 'content-disposition' error, but when downloading workbooks rather than flows. This only started happening after upgrading to Tableau Server 2023.1.8. I'm on the latest TSC version, 0.29.

Versions Details of your environment, including:

To Reproduce server.workbooks.download(<< Workbook ID >>, << Workbook Name >>)

Results

<< Python Path >>\python.exe << Project Path >>\update_workbook.py
Connected to server << Server >> as user << User >>

Downloading Workbook: << Workbook 1 >>
Publishing Workbook: << Workbook 1 >> ... Done

...

Downloading Workbook: << Workbook 4 >>
Traceback (most recent call last):
  File "<< Project Path >>\update_workbook.py", line 482, in <module>
    main(logger_wb, config_parser_wb, auto_mode)
  File "<< Project Path >>\update_workbook.py", line 173, in main
    server.workbooks.download(workbook.id, workbook.name)
  File "<< Python Package Path >>\tableauserverclient\server\endpoint\endpoint.py", line 291, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<< Python Package Path >>\tableauserverclient\server\endpoint\endpoint.py", line 333, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<< Python Package Path >>\tableauserverclient\server\endpoint\endpoint.py", line 333, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<< Python Package Path >>\tableauserverclient\server\endpoint\workbooks_endpoint.py", line 186, in download
    return self.download_revision(workbook_id, None, filepath, include_extract, no_extract)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<< Python Package Path >>\tableauserverclient\server\endpoint\endpoint.py", line 291, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<< Python Package Path >>\tableauserverclient\server\endpoint\workbooks_endpoint.py", line 486, in download_revision
    _, params = cgi.parse_header(server_response.headers["Content-Disposition"])
  File "<< Python Package Path >>\requests\structures.py", line 52, in __getitem__
    return self._store[key.lower()][1]
           ~~~~~~~~~~~^^^^^^^^^^^^^
KeyError: 'content-disposition'

Process finished with exit code 1

**Notes**
This error is intermittent. In the script above, sometimes the error will occur after 4 workbooks are downloaded, sometimes after 8 workbooks are downloaded, or sometimes all specified workbooks (around 20) successfully download. In the result above, the error is occurring after the 4th workbook has been downloaded. The first three in this instance worked fine.
jorwoods commented 7 months ago

@aidanharvey That has me wondering whether it is specific to some workbooks/flows or an intermittent server response overall.

Does it seem to fail on the same workbooks typically, or is it all around pretty random?

aidanharvey commented 7 months ago

@jorwoods It's definitely very random. In my test runs of my script today, about half of the runs completed without issue. Of the other half where the errors occurred, I couldn't discern any pattern at all - no one workbook was failing any more than any others. Sometimes it fails right away on the 1st workbook in the list, sometimes it gets to the last one, sometimes one in the middle, etc.

So I think it is more of an intermittent server response than something related to specific content.

jorwoods commented 7 months ago

@bugcity Did you first encounter the error when downloading a single flow? Or did you encounter it in a loop? Was it a consistent problem, or intermittent?

jorwoods commented 7 months ago

@bcantoni It sounds to me like a server-side glitch. We could add some sort of automatic retry if that header is missing. Another possibility is to bypass Content-Disposition entirely by fetching the item by id, taking the item's name property, and making the name file-system friendly. But that seems like too many side effects. Plus, if the Content-Disposition header is missing, we don't know what else in the server response may be malformed, so a bypass doesn't feel like the right idea.
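
A minimal sketch of the automatic-retry idea, assuming the missing header is a transient server glitch. `fetch_with_header_retry` and its `fetch` callable are hypothetical names for illustration, not part of TSC; `fetch` stands in for whatever issues the download request and returns the response headers and body.

```python
import time
from typing import Callable, Mapping, Tuple

# Hypothetical response shape: (headers, body). Not a TSC type.
Response = Tuple[Mapping[str, str], bytes]


def fetch_with_header_retry(
    fetch: Callable[[], Response],
    required_header: str = "Content-Disposition",
    attempts: int = 3,
    delay_seconds: float = 1.0,
) -> Response:
    """Call fetch() until the response carries required_header, or give up."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    wanted = required_header.lower()
    for attempt in range(1, attempts + 1):
        headers, body = fetch()
        # Header names are compared case-insensitively.
        if any(name.lower() == wanted for name in headers):
            return headers, body
        # Wait before the next attempt, but not after the last one.
        if attempt < attempts:
            time.sleep(delay_seconds)

    raise RuntimeError(
        f"{required_header!r} header still missing after {attempts} attempts"
    )
```

A retry only helps if the omission is transient; if a given item always comes back without the header, every attempt fails and the error surfaces after the last one.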

bugcity commented 7 months ago

@jorwoods I have downloaded one flow.

    server = TSC.Server(url, use_server_version=True)
    server.auth.sign_in(tableau_auth)
    filepath = server.flows.download('ccbc5848-d511-42c2-aad1-2872e4a3c690')
    server.auth.sign_out()

A flow that downloads successfully always succeeds, and a flow that fails always fails. I tried downloading several flows and only one was successful. The failure I first reported occurred when downloading a tflx. One tfl downloaded successfully, but other tfls failed. Since I am the server administrator and do not know what the flows are, I use flows.file_type in the repository to determine whether a flow is a tflx or a tfl.

Successfully downloaded tfl

{'Date': 'Thu, 18 Jan 2024 01:43:27 GMT',
 'Content-Type': 'application/xml',
 'Content-Length': '22335',
 'Connection': 'keep-alive',
 'Server': 'Tableau',
 'Vary': 'Origin,Access-Control-Request-Method,Access-Control-Request-Headers,Accept-Encoding',
 'Content-Disposition': 'name="tableau_flow"; filename*=UTF-8\'\'"Test3.tfl"',
 'X-Tableau': 'Tableau Server',
 'P3P': 'CP="NON", CP="NON"',
 'X-UA-Compatible': 'IE=Edge',
 'X-Content-Type-Options': 'nosniff',
 'X-XSS-Protection': '1; mode=block',
 'Referrer-Policy': 'no-referrer-when-downgrade',
 'Content-Security-Policy-Report-Only': "connect-src * https://*.tiles.mapbox.com https://api.mapbox.com; default-src blob:; font-src * data:; frame-src * data: tableau-desktop:; img-src * data: blob:; object-src data:; report-uri /vizql/csp-report; script-src * blob: wasm-unsafe-eval; style-src * 'unsafe-inline'",
 'Last-Modified': 'Sun, 17 Dec 2023 23:44:56 GMT',
 'ETag': '"0-gzip"',
 'Content-Encoding': 'gzip'}

Download failed tfl

{'Date': 'Thu, 18 Jan 2024 01:53:07 GMT',
 'Content-Type': 'application/xml',
 'Transfer-Encoding': 'chunked',
 'Connection': 'keep-alive',
 'Server': 'Tableau',
 'Vary': 'Access-Control-Request-Method,Access-Control-Request-Headers,Accept-Encoding',
 'X-Tableau': 'Tableau Server',
 'P3P': 'CP="NON", CP="NON"',
 'X-UA-Compatible': 'IE=Edge',
 'X-Content-Type-Options': 'nosniff',
 'X-XSS-Protection': '1; mode=block',
 'Referrer-Policy': 'no-referrer-when-downgrade',
 'Content-Security-Policy-Report-Only': "connect-src * https://*.tiles.mapbox.com https://api.mapbox.com; default-src blob:; font-src * data:; frame-src * data: tableau-desktop:; img-src * data: blob:; object-src data:; report-uri /vizql/csp-report; script-src * blob: wasm-unsafe-eval; style-src * 'unsafe-inline'",
 'Last-Modified': 'Wed, 17 Jan 2024 03:18:37 GMT',
 'ETag': '"0-gzip"',
 'Content-Encoding': 'gzip'}
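
To make the difference between the two responses explicit, the header names can be compared as case-insensitive sets. This is only a diagnostic sketch; `header_diff` is a hypothetical helper, and the name lists are copied from the two dumps above.

```python
def header_diff(ok_headers, failed_headers):
    """Return (names only in the successful response, names only in the failed one)."""
    ok = {name.lower() for name in ok_headers}
    failed = {name.lower() for name in failed_headers}
    return sorted(ok - failed), sorted(failed - ok)


# Header names from the successful tfl download.
ok_names = [
    "Date", "Content-Type", "Content-Length", "Connection", "Server", "Vary",
    "Content-Disposition", "X-Tableau", "P3P", "X-UA-Compatible",
    "X-Content-Type-Options", "X-XSS-Protection", "Referrer-Policy",
    "Content-Security-Policy-Report-Only", "Last-Modified", "ETag",
    "Content-Encoding",
]

# Header names from the failed tfl download.
failed_names = [
    "Date", "Content-Type", "Transfer-Encoding", "Connection", "Server", "Vary",
    "X-Tableau", "P3P", "X-UA-Compatible", "X-Content-Type-Options",
    "X-XSS-Protection", "Referrer-Policy",
    "Content-Security-Policy-Report-Only", "Last-Modified", "ETag",
    "Content-Encoding",
]

missing, extra = header_diff(ok_names, failed_names)
print(missing)  # ['content-disposition', 'content-length']
print(extra)    # ['transfer-encoding']
```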
bcantoni commented 7 months ago

@jorwoods I tend to agree this seems like a server-side bug (as opposed to TSC). I've had it happen a grand total of one time, but I'm still working to reproduce.

aidanharvey commented 7 months ago

Hi @bcantoni, @jorwoods

Can I confirm what Tableau versions you're using when trying to recreate this?

@jorwoods - I see you've mentioned 2022.3. @bcantoni - what are you testing on?

This behavior only started occurring for me after I upgraded to Tableau 2023.1.8. Everything was fine on my previous version, which was only a different point release - Tableau 2023.1.5.

Also one more data point - if I downgrade to TSC 0.28 (from 0.29 where I'm seeing the issue), and then manually patch the header name issues introduced in Tableau 2023.1.7 (e.g. changing filename to filename* in the TSC code in workbooks_endpoint.py) detailed here, everything works fine - workbooks always download reliably without issue.

So it's only on the combination of the latest Tableau 2023.1.8 version and TSC 0.29 that I see the intermittent workbook download issue. Tableau 2023.1.8 with patched TSC 0.28 works fine.

Thanks Aidan
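
For reference, a minimal sketch of reading the file name from either parameter form seen in this thread (`filename="..."` and `filename*=UTF-8''...`). `filename_from_content_disposition` is a hypothetical helper, not the TSC implementation, and it assumes there are no semicolons inside the quoted file name and no language tag between the two apostrophes.

```python
from typing import Optional
from urllib.parse import unquote


def filename_from_content_disposition(value: str) -> Optional[str]:
    """Extract the file name from a Content-Disposition value, or None."""
    # Split "key=value" parameters on ';' (assumes no ';' inside quotes).
    params = {}
    for part in value.split(";"):
        key, sep, raw = part.strip().partition("=")
        if sep:
            params[key.strip().lower()] = raw.strip()

    # Prefer the extended form: charset''percent-encoded-name.
    extended = params.get("filename*")
    if extended is not None:
        charset, marker, encoded = extended.partition("''")
        if marker:
            return unquote(encoded.strip('"'), encoding=charset or "utf-8")
        return extended.strip('"')

    # Fall back to the plain quoted form.
    plain = params.get("filename")
    return plain.strip('"') if plain is not None else None


print(filename_from_content_disposition(
    'name="tableau_flow"; filename*=UTF-8\'\'"Test3.tfl"'
))  # Test3.tfl
print(filename_from_content_disposition(
    'name="tableau_flow"; filename="Superstore-JW.tflx"'
))  # Superstore-JW.tflx
```

Returning `None` when neither parameter is present leaves the decision about a missing name to the caller instead of raising inside the parser.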

bugcity commented 6 months ago

I have received a response from Tableau that this occurs in 2023.1.2 and later when the flow name contains multibyte characters. I don't know when that will be fixed, so for the time being I will pass a writable file object (one of the io_types_w types) as the filepath argument of the download and apply the following patch to work around it.

if "Content-Disposition" in server_response.headers:
    _, params = cgi.parse_header(server_response.headers["Content-Disposition"])
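
With this guard, a response without the header no longer raises, but it also supplies no file name, so the caller has to provide one (or a file object). A minimal sketch of deriving a file-system-friendly name from the flow's name, as floated earlier in the thread; `safe_filename` is a hypothetical helper, not part of TSC, and the set of replaced characters is an assumption based on Windows file name rules.

```python
import re

# Characters Windows rejects in file names, plus ASCII control characters.
_UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(name: str, extension: str, default: str = "download") -> str:
    """Build a file name from an item name, replacing unsafe characters."""
    cleaned = _UNSAFE.sub("_", name)
    # Leading/trailing spaces and dots are also trimmed.
    cleaned = cleaned.strip(" .")
    return f"{cleaned or default}.{extension.lstrip('.')}"


print(safe_filename("Sales: Q1/Q2", "tfl"))  # Sales_ Q1_Q2.tfl
```

Multibyte characters are left untouched here, since they are valid in file names on common file systems.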