mneedham91 / PyPardot4

PyPardot is a wrapper for Version 4 of the Pardot API, written in Python.
MIT License

Handling of 'total_results' in query works improperly. #40

Open ebergerson opened 4 years ago

ebergerson commented 4 years ago

I have looked at the code for the query method in both prospects.py and tagobjects.py. Both use the structure shown below to ensure that the records returned from a query are always represented as a list. I agree this is a very good idea.

However, it appears to be misinterpreting the meaning of the total_results key in the result and consequently does not operate as I think it was intended:

        # Ensure result['tagObject'] is a list, no matter what.
        result = response.get('result')
        if result['total_results'] == 0:
            result['tagObject'] = []
        elif result['total_results'] == 1:
            result['tagObject'] = [result['tagObject']]

        return result

This treats the value of total_results as if it were the number of records in the response. However, from my experiments, that only holds when the query matches no more than 200 records, the maximum page size. The documentation I have describes total_results as:

Contains the number of tagObjects selected by this query. If this value is higher than 200, then several query requests may be necessary to retrieve all of the matched tagObjects

My interpretation is that it reports how many records match the query in total, even though at most 200 of those records are included in any single response. This allows the calling code to keep querying at successive offsets to collect the full results.
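As a rough illustration of that paging pattern, here is a minimal sketch. It is not PyPardot4 code: `fetch_all`, `query_page`, and `fake_query_page` are hypothetical names, and the sketch assumes each page's records have already been normalized to a list.

```python
PAGE_SIZE = 200  # maximum number of records per response, as described above


def fetch_all(query_page, key="tagObject"):
    """Collect every matching record by re-querying at increasing offsets.

    query_page is a hypothetical callable: query_page(offset) returns a
    result dict containing 'total_results' and, under `key`, a list of
    the records for that page (assumed already normalized to a list).
    """
    records = []
    offset = 0
    while True:
        result = query_page(offset)
        page = result.get(key, [])
        records.extend(page)
        offset += PAGE_SIZE
        # Stop once the offset passes the total, or a page comes back empty.
        if not page or offset >= result["total_results"]:
            break
    return records


def fake_query_page(offset, total=201):
    """Stand-in for the real API: `total` records, served 200 at a time."""
    ids = list(range(total))[offset:offset + PAGE_SIZE]
    return {"total_results": total, "tagObject": [{"id": i} for i in ids]}
```

Calling `fetch_all(fake_query_page)` issues two requests (offsets 0 and 200) and stops, because the next offset would be past total_results.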

The code correctly identifies that there are three possible shapes for the returned data, but incorrectly assumes that the total_results value indicates which one applies. That is only true for queries that match 200 or fewer records.

I currently have 201 tag objects in our Pardot instance. When I query them the first time, I get a total_results of 201, telling me how many there are, and then a list of 200 records under the tagObject key. When I query again with an offset of 200 (since the first 200 records have offsets 0-199), total_results is still 201 (because that is how many records match the query, independent of offset), but now only one record is returned under tagObject, and it is not in a list. If I run the same query again with a very high offset, well past the number of records available, I still get a total_results of 201, but now there is no tagObject key in the result at all. So I suggest the fix for this code looks like this:

        # Ensure result['tagObject'] is a list, no matter what.
        result = response.get('result')
        if 'tagObject' not in result:
            result['tagObject'] = []
        elif not isinstance(result['tagObject'], list):
            # A lone record arrives unwrapped, so wrap it in a list.
            result['tagObject'] = [result['tagObject']]

        return result

I have not tested this code, but I believe it will normalize the result so that the tagObject key is always present and always holds a list. Of course, this would have to be adapted to each of the classes (prospects, tags, ...).
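To make the three cases concrete, here is a standalone sketch of the normalization that can be run without the API. `normalize_records` is a hypothetical helper, not part of PyPardot4, and the sample dictionaries only imitate the response shapes I described above. It tells a lone record from a list with an isinstance check, on the assumption that a single record arrives as a bare object.

```python
def normalize_records(result, key="tagObject"):
    """Make sure result[key] exists and is a list, whatever the API returned.

    Hypothetical helper: `key` would be 'tagObject', 'prospect', and so on,
    depending on the class being adapted.
    """
    if key not in result:
        # Offset past the end: the key is missing entirely.
        result[key] = []
    elif not isinstance(result[key], list):
        # Exactly one record on this page: it arrives unwrapped.
        result[key] = [result[key]]
    return result


# Shapes imitating the three responses described above (total_results is 201 in each).
full_page = {"total_results": 201, "tagObject": [{"id": i} for i in range(200)]}
last_page = {"total_results": 201, "tagObject": {"id": 200}}
past_end = {"total_results": 201}
```

After normalization, `full_page` is unchanged, `last_page` holds a one-element list, and `past_end` holds an empty list, none of which depend on total_results.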