heroku / salesforce-bulk

Python interface to the Salesforce.com Bulk API
MIT License

Compression for responses #73

Closed Currerius closed 5 years ago

Currerius commented 5 years ago

I would like to request gzip-compressed JSON results from the salesforce-bulk API like this:

    job = bulk.create_query_job(sfObject, contentType='JSON', contentEncoding='gzip')

The documentation mentions that passing Content-Encoding: gzip in the header of the request enables this.

I can see that contentEncoding is not among the parameters of the create_query_job() function. Looking at the source, the default headers set something close, but it is Accept-Encoding rather than Content-Encoding:

    def headers(self, values={}, content_type='application/xml'):
        default = {
            "X-SFDC-Session": self.sessionId,
            "Content-Type": "{}; charset=UTF-8".format(content_type),
            'Accept-Encoding': "gzip",
        }
        default.update(values)
        return default

Am I misunderstanding this, or should it be:

    def headers(self, values={}, content_type='application/xml'):
        default = {
            "X-SFDC-Session": self.sessionId,
            "Content-Type": "{}; charset=UTF-8".format(content_type),
            'Content-Encoding': "gzip",
        }
        default.update(values)
        return default

With kind regards and thanks for all the effort, R

lambacck commented 5 years ago

Hi @Currerius, for compressed responses the Content-Encoding header is a response header: the server sets it to describe the body it sends back. The docs say to use the Accept-Encoding request header with a value of gzip, which is what the library currently does:

Responses are compressed if the client makes a request using the Accept-Encoding header, with a value of gzip.
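The division of labor between the two headers can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the library's own code: build_request_headers and decode_response_body are hypothetical helper names, and the server response is simulated so no network is involved.

```python
import gzip


def build_request_headers(session_id, content_type="application/json"):
    """Request side: advertise that gzip-compressed responses are accepted."""
    return {
        "X-SFDC-Session": session_id,
        "Content-Type": "{}; charset=UTF-8".format(content_type),
        "Accept-Encoding": "gzip",
    }


def decode_response_body(response_headers, body):
    """Response side: decompress only if the server marked the body as gzip."""
    if response_headers.get("Content-Encoding", "").lower() == "gzip":
        return gzip.decompress(body)
    return body


# Simulated server response (hypothetical payload, no real API call).
payload = b'[{"Id": "001"}]'
response_headers = {"Content-Encoding": "gzip"}
compressed_body = gzip.compress(payload)
print(decode_response_body(response_headers, compressed_body))
```

The client asks with Accept-Encoding, and the server answers with Content-Encoding, so the request headers never need to carry Content-Encoding just to receive compressed results.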

One thing to note is that the library transparently unzips the results before you get them, unless you pass the raw=True parameter to the get_query_batch_results method.
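If you do ask for the raw stream, decompressing it is then up to you. A rough sketch of what that might look like, assuming the raw result is a file-like object that yields gzip-compressed bytes (gunzip_stream is a hypothetical helper, and the stream here is simulated with io.BytesIO rather than fetched from Salesforce):

```python
import gzip
import io
import zlib


def gunzip_stream(raw, chunk_size=2048):
    """Yield decompressed chunks from a file-like object of gzip bytes."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    while True:
        chunk = raw.read(chunk_size)
        if not chunk:
            break
        data = decompressor.decompress(chunk)
        if data:
            yield data
    # Emit anything still buffered once the input is exhausted.
    tail = decompressor.flush()
    if tail:
        yield tail


# Stand-in for a raw, still-compressed result stream (assumed shape).
raw = io.BytesIO(gzip.compress(b"Id,Name\n001,Acme\n"))
print(b"".join(gunzip_stream(raw)))
```

Decompressing incrementally keeps memory use bounded for large result sets, which is the usual reason to ask for the raw stream in the first place.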