tableau / server-client-python

A Python library for the Tableau Server REST API
https://tableau.github.io/server-client-python/
MIT License

update_hyper_data *Payload Too Large* #987

Closed amartincolville closed 1 month ago

amartincolville commented 2 years ago

Describe the bug We are currently using Tableau to enable our analysts to extract data and create workbooks and dashboards on datasources. Due to the limitations of the Tableau UI, and to integrate this within our architecture, we are using Tableau Server Client and the Hyper API to allow the dynamic creation and refresh of datasources given a set of parameters or arguments.

The setup mainly uses the TSC.publish method to create an initial datasource with the required historical data and, once this is published, uses the TSC.update_hyper_data method to update smaller amounts of data with a "sliding window" approach.

Our current setup works fine with small amounts of data, but with larger datasets, which are our most common use case, we have stumbled upon a Payload Too Large error returned by the Tableau Server. After some research, we have seen that there is a setting (api.server.extract-updates.max-size) that imposes a 10MB limit on the payload, which in our case is a Hyper file. We know that the publish method creates 64MB chunks and only commits the data after all chunks are on the server, but it seems that update_hyper_data does not do this, and the restriction is too tight for us to ensure a correct refresh when doing this "sliding window". Is there a way this can be bypassed? Are we missing any concepts, or are we not grasping the whole functionality?

Versions Details of your environment, including:

To Reproduce

  1. Create a .hyper file from a given dataset
  2. Create a datasource from the .hyper file
  3. Publish data to the previous datasource
  4. Create a subset of data with the same structure and create a new .hyper file (should be quite large in size)
  5. Use the update_hyper_data method with the latter .hyper file as payload and a replace action
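The steps above can be sketched with tableauserverclient. This is a minimal illustration, not the reporter's actual code: the server URL, token, project id, and file names are placeholders, and the `actions` shape follows the REST API's update-hyper-data request body with the default `Extract` schema/table names as an assumption.

```python
# Sketch of the repro flow with tableauserverclient (TSC).
# All credentials, ids, and file names below are placeholders.


def build_replace_action(schema="Extract", table="Extract"):
    """Build the 'replace' action body for update_hyper_data: replace the
    target table with the table shipped inside the uploaded .hyper payload."""
    return [{
        "action": "replace",
        "target-schema": schema,
        "target-table": table,
        "source-schema": schema,
        "source-table": table,
    }]


def main():
    import uuid
    import tableauserverclient as TSC  # pip install tableauserverclient

    auth = TSC.PersonalAccessTokenAuth("token-name", "token-value", site_id="mysite")
    server = TSC.Server("https://my-tableau-server", use_server_version=True)

    with server.auth.sign_in(auth):
        # Steps 2-3: publish the initial .hyper file as a datasource.
        datasource = TSC.DatasourceItem("placeholder-project-id")
        datasource = server.datasources.publish(
            datasource, "historical.hyper", TSC.Server.PublishMode.Overwrite
        )

        # Step 5: push the "sliding window" delta; this is the upload that
        # runs into the api.server.extract-updates.max-size limit.
        job = server.datasources.update_hyper_data(
            datasource,
            request_id=str(uuid.uuid4()),  # idempotency key required by the API
            actions=build_replace_action(),
            payload="delta.hyper",
        )
        server.jobs.wait_for_job(job)


if __name__ == "__main__":
    main()
```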

Results Job failed


jacalata commented 2 years ago

Very delayed reaction but @vogelsgesang do you know what's happening here? Has this been fixed?

vogelsgesang commented 2 years ago

We do know that the publish method creates 64MB chunks and only commits the data after all chunks are on the server, but it seems that the update_hyper_data does not do this

update_hyper_data uses the exact same upload mechanism as publish. As such, it also uses the same chunking.

there is a setting (api.server.extract-updates.max-size) that introduces a 10MB limitation to the payload

Did you try increasing this limit? After increasing it, your updates should go through.

amartincolville commented 2 years ago

hey @jacalata the way to fix this is as @vogelsgesang advises, by changing the api.server.extract-updates.max-size parameter, although this only applies if you have a Tableau Server on premises. If you have Tableau Online, there is no way to change this.

felipe-costa-compado commented 1 year ago

I'm having the same problem but I'm using Tableau Online. Is there anything I can do ? @vogelsgesang

vogelsgesang commented 1 year ago

How large is the data you would like to upload, @felipe-costa-compado?

I think we will have to change the server-side Tableau Online configuration for this. Not sure if it is possible to increase this limit for a specific site or not. Maybe this will require a global change...

I will not be able to do that config change myself, I will have to involve a couple of colleagues. A business justification for this change would help us prioritize this change. To that end: Can you describe your use case? And also on whose behalf you are requesting a solution here (i.e., which Tableau customer/partner/reseller)?

(reopening, so we have this as an open issue on our list)

felipe-costa-compado commented 1 year ago

How large is the data you would like to upload, @felipe-costa-compado?

Hello @vogelsgesang,

Thank you for your reply. We are trying to update a data source by doing a complete replace of it. Historically we did this using only the TSC.publish() method, so the datasource was completely rebuilt every time. That has the disadvantage of removing any change made on the Tableau side (calculated fields, field aliases, etc.), so we wanted to give TSC.update_hyper_data() a try. We have two data sources using hyper files, one around 2.5 GB and the other about 10 GB. Both upload correctly to Tableau, because we see the log "File upload finished", but afterwards there is the following error:

Payload Too Large: The file attached exceeded the file size limit of '{0}'.

vogelsgesang commented 1 year ago

ok, that use case makes a lot of sense.

Going all the way to 10GB would push the limit pretty far. I am not currently sure why we even have this limit, so maybe we are able to actually increase it that far. I will have to wait for the reply of some other people from Tableau, though, because I am not completely sure what the ins and outs of this limit are.

In the meantime: Any customer name which I should associate with this request? Do you, e.g., already have a separate request with our customer support open on this topic?

vogelsgesang commented 1 year ago

The file attached exceeded the file size limit of '{0}'.

This seems like another bug. The {0} is intended to be a placeholder, and you should actually see the currently configured limit there...

felipe-costa-compado commented 1 year ago

In the meantime: Any customer name which I should associate with this request? Do you, e.g., already have a separate request with our customer support open on this topic?

I didn't open any ticket with the support. You can open in the name of Compado.

amartincolville commented 1 year ago

thanks for following up on this - the increase seems like a simple change to an environment variable; hopefully this can be changed for Tableau Online users.

DrMaphuse commented 1 year ago

I can confirm that this limit makes incremental updates unfeasible for larger datasets. I am trying to insert a delta of about 1GB every day, and uploading that in chunks takes multiple times longer than simply replacing the entire dataset of 15GB (2 hours vs 30 minutes).

However, this states that the payload limit on Tableau Cloud is fixed at 100MB to limit server strain - does this mean that we will not get a fix for this?

To my understanding, this doesn't seem optimal. Incremental updates should leverage the hyper engine to minimize strain on the server. If a query is causing too much strain, the server should allocate the resources accordingly. Making the user send hundreds of small chunks is the opposite of limiting server strain - it maximizes the strain because the hyper engine has to process hundreds of queries without the ability to optimize away redundant workloads.
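Until the limit itself changes, the only client-side option is to keep each incremental payload under the configured maximum. The helper below is an illustrative sketch (not part of TSC): it plans how to split a delta into row ranges whose estimated size fits under the limit, so each range could be written to its own .hyper file and sent as a separate update_hyper_data call. The 100MB constant and bytes-per-row estimate are assumptions.

```python
# Sketch: plan incremental uploads so each .hyper payload stays under the
# server's extract-update size limit. Names and the limit value here are
# illustrative assumptions, not part of the TSC API.

MAX_PAYLOAD_BYTES = 100 * 1024 * 1024  # assumed api.server.extract-updates.max-size


def plan_chunks(total_rows, est_bytes_per_row, max_payload=MAX_PAYLOAD_BYTES):
    """Split a delta of `total_rows` rows into (start, end) row ranges whose
    estimated serialized size stays under `max_payload`. Each range would be
    written to its own .hyper file and uploaded as a separate action."""
    rows_per_chunk = max(1, max_payload // est_bytes_per_row)
    return [
        (start, min(start + rows_per_chunk, total_rows))
        for start in range(0, total_rows, rows_per_chunk)
    ]
```

As DrMaphuse notes, this chunk-by-chunk approach multiplies the number of server-side jobs, so it trades the payload limit for wall-clock time rather than solving the underlying problem.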

gustavo-axe commented 1 year ago

Did you find any solution? I'm using Python to pick up my hyper data sources and upload them to Tableau Cloud, but there is one data source that runs once per day and is about 1 GB; it is hitting the 8-hour Windows Task Scheduler timeout. We changed to this method because of MFA. It is the only task giving problems; the other datasources are okay, though one or two have seen their update time increase by 15-30 minutes because of the time Python takes to publish.

joshuadienye commented 10 months ago

hey @jacalata the way to fix this is as @vogelsgesang advises, by changing the api.server.extract-updates.max-size parameter, although this only applies if you have a Tableau Server on premises. In case you have Tableau Online, there is no possibility to change this.

Hey @amartincolville do you know how to change this parameter? I've searched the Tableau docs and Google but I can't seem to find a way to change it anywhere. I'm using Tableau Server on-prem. Thank you!

jacalata commented 1 month ago

This limit was increased on both Tableau Cloud and Server, to a new default value of 100MB. More detail is in the API docs at https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api_how_to_update_data_to_hyper.htm

@joshuadienye - use tsm to set that variable, same as the ones listed on this page https://help.tableau.com/current/server/en-us/cli_configuration-set_tsm.htm
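For on-prem Tableau Server, that would look roughly like the following. The value shown (100 MB in bytes) is an assumption for illustration; check the tsm configuration reference for your server version for the exact key semantics and unit before applying.

```shell
# Raise the extract-update payload limit on Tableau Server (on-prem).
# The value/unit is an assumption -- verify against the tsm configuration
# reference for your server version.
tsm configuration set -k api.server.extract-updates.max-size -v 104857600
tsm pending-changes apply
```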