googleads / google-ads-python

Google Ads API Client Library for Python
Apache License 2.0

OfflineUserDataJobOperation causing list population to go very slowly #577

Closed. marco-sky closed this issue 2 years ago.

marco-sky commented 2 years ago

Hello,

A few weeks ago our Google Ads API integration broke due to a change in the way data is expected to be sent. I found this article which explained the issue, and I can confirm it is the cause. We were sending batches of 100,000 user_identifiers per operation, and the preferred method now is to send one operation per user. I understand this.

The way this processing is currently done is like this:

# Note: the operation is created once, outside the loop, so every
# identifier ends up inside this single OfflineUserDataJobOperation.
user_data_with_email_address_operation = gClient.get_type(
    "OfflineUserDataJobOperation", version="v9"
)

payload = []
for x in [row["value"] for row in nextthing]:
    user_data_with_email_address = user_data_with_email_address_operation.create

    user_identifier_with_hashed_email = gClient.get_type(
        "UserIdentifier", version="v9"
    )
    user_identifier_with_hashed_email.hashed_email = x

    user_data_with_email_address.user_identifiers.append(
        user_identifier_with_hashed_email
    )

    payload.append(user_data_with_email_address_operation)

The 'nextthing' variable at the start of the loop is a list of 100,000 records whose 'value' fields hold hashed email addresses. 'payload' is the list of API-ready operations to be sent over to the API later in the code.

The problem with the above code is that all 100k emails end up inside a single OfflineUserDataJobOperation, so the job gets rejected. What I did instead was move this first part inside the loop:

# Now created inside the loop, so each email gets its own operation.
user_data_with_email_address_operation = gClient.get_type(
    "OfflineUserDataJobOperation", version="v9"
)

This broke the payload into 100,000 OfflineUserDataJobOperation objects (which the API prefers), but the payload is now generated at a snail's pace of fewer than ten loop iterations per second, whereas the original version completed in under a second. At this rate a list of half a million people would take a day to build, and I have hundreds of large lists, so this won't do.
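For clarity, here is the reworked loop in full, reconstructed from the two snippets above:

payload = []
for x in [row["value"] for row in nextthing]:
    # get_type is now called on every iteration, so each email gets its
    # own OfflineUserDataJobOperation: correct, but painfully slow.
    user_data_with_email_address_operation = gClient.get_type(
        "OfflineUserDataJobOperation", version="v9"
    )

    user_identifier_with_hashed_email = gClient.get_type(
        "UserIdentifier", version="v9"
    )
    user_identifier_with_hashed_email.hashed_email = x

    user_data_with_email_address_operation.create.user_identifiers.append(
        user_identifier_with_hashed_email
    )

    payload.append(user_data_with_email_address_operation)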

I had a look at the sample code here, but that code only shows one user being added.

I also found that this chap had a similar problem, but the ticket was unresolved.

Is there any sample code for handling large datasets for sending over to build audiences in Google Ads?

Thank you, Marco M

BenRKarl commented 2 years ago

@marco-sky does it help if you switch to the non-proto-plus interface described in our docs? Generating a request this large can be slow because of the performance overhead of proto-plus, but the native protobuf interface bypasses that logic and is much faster. Hopefully this helps here.
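A minimal sketch of that suggestion, assuming credentials are supplied via load_from_dict and that hashed_emails (a hypothetical stand-in for the list built from nextthing) already holds the hashed addresses:

from google.ads.googleads.client import GoogleAdsClient

# "use_proto_plus": False selects the native protobuf message classes,
# skipping the proto-plus wrapping logic on every message construction.
config = {
    "developer_token": "INSERT_DEVELOPER_TOKEN",
    # ... OAuth2 credentials go here ...
    "use_proto_plus": False,
}
client = GoogleAdsClient.load_from_dict(config)

payload = []
for hashed_email in hashed_emails:
    operation = client.get_type("OfflineUserDataJobOperation")
    # Native protobuf repeated message fields support add() with kwargs,
    # so no separate get_type("UserIdentifier") call is needed.
    operation.create.user_identifiers.add(hashed_email=hashed_email)
    payload.append(operation)

The same flag can also be set in google-ads.yaml (use_proto_plus: False) when the client is created with load_from_storage.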

BenRKarl commented 2 years ago

Closing this for now, assuming that setting use_proto_plus to False helped; if not, please re-open this.