Closed yanghuang1028 closed 4 months ago
Based on the screenshot, I assume this is a GCP environment, correct? Does this happen on every job or only intermittently?
Jobs can get stuck with the "RECEIVED" status when the instances within the managed instance group (MIG) are not running or have crashed. If service account onboarding has not been completed, the MIG could be in an unhealthy state. Can you confirm that the service account has been onboarded?
To check the MIG’s health in the cloud console:
To check the MIG's health using gcloud CLI:
gcloud compute instance-groups managed list-errors <MIG_NAME> --region=<REGION>
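If list-errors comes back empty but the workers still look unstable, per-instance status can help narrow things down. Below is a minimal Python sketch, not an official tool. It assumes you have saved the JSON printed by `gcloud compute instance-groups managed list-instances <MIG_NAME> --region=<REGION> --format=json`, and that each entry carries `instance`, `instanceStatus`, and `currentAction` fields. The helper name `find_unstable_instances` and the sample payload are hypothetical.

```python
import json
from typing import List


def find_unstable_instances(list_instances_json: str) -> List[str]:
    """Return names of instances that are not RUNNING or that the MIG is still acting on."""
    unstable = []
    for entry in json.loads(list_instances_json):
        status = entry.get("instanceStatus")
        action = entry.get("currentAction", "NONE")
        if status != "RUNNING" or action != "NONE":
            # "instance" is assumed to be a full resource URL; keep only the trailing name.
            name = entry.get("instance", "<unknown>").rsplit("/", 1)[-1]
            unstable.append(name)
    return unstable


# Hypothetical sample shaped like the assumed gcloud output (not real data).
sample = json.dumps([
    {"instance": "https://example.invalid/zones/z/instances/worker-a",
     "instanceStatus": "RUNNING", "currentAction": "NONE"},
    {"instance": "https://example.invalid/zones/z/instances/worker-b",
     "instanceStatus": "STAGING", "currentAction": "RECREATING"},
    {"instance": "https://example.invalid/zones/z/instances/worker-c",
     "instanceStatus": "RUNNING", "currentAction": "RESTARTING"},
])

print(find_unstable_instances(sample))
```

An instance that keeps appearing in this list across repeated checks is likely restarting in a loop, which would be consistent with jobs never leaving RECEIVED.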
Hi @chasinandrew ,
Yes, it is a GCP environment and this issue happens on each job.
I checked the MIG's health, and there are no errors, only one warning.
BTW, we have 4 worker VM instances, and I found the 403 error in three of them. Some of them seem unstable and keep restarting. Could this be the cause?
Thank you for the quick reply, it really helps!
No problem! This could be happening because of the unstable VMs. To help us replicate this, could you provide the following info:
Hi @chasinandrew ,
1. We used the latest repo (https://github.com/privacysandbox/aggregation-service) to deploy, so is the version v2.4.2?
Our Google Cloud link is https://console.cloud.google.com/home/dashboard?project=ecs-1709881683838, but I don't know if you have permission to access it.
Thank you for helping to delve into the issue~
Thanks @yanghuang1028! This 403 error can happen when onboarding is incomplete. Can you please fill out this onboarding form to register your domain and service account?
@chasinandrew We filled out the form a few weeks ago, and your team sent an email to us.
Oh, I see. We used a different service account to do this deployment. Could you help us update the worker service account?
Our new worker service account is sa-worker-aggregation-service@ecs-1709881683838.iam.gserviceaccount.com
BTW, we just registered the domain in the production environment. If we do not register the domain of the staging environment, can the aggregation service correctly handle reports from the staging environment (we can currently change Chrome's settings manually to receive reports from the staging env)? Our staging reporting site is https://adservice-1.stratus.qa.ebay.com/
Thanks again!
Hi @yanghuang1028, I recommend communicating this information through our support email alias. I'll be hiding your previous comment to keep that information out of public view.
@chasinandrew please move support conversations around onboarding to email.
Re your question on prod vs. staging: your service account is connected to the site that is onboarded. If the same service account (in the same GCP project) is used to process your reports, you'll be able to process them in both staging and prod. If a different account is used, a separate onboarding request is required.
Hi @hostirosti @chasinandrew
Thanks for protecting our private information!
The separate onboarding request is completed, and the job can be processed now. However, the job threw a TRANSACTION_MANAGER_RETRIES_EXCEEDED error when processing.
{
  "job_status": "FINISHED",
  "request_received_at": "2024-05-16T01:19:59.234435Z",
  "request_updated_at": "2024-05-16T01:29:35.184066241Z",
  "job_request_id": "test05",
  "input_data_blob_prefix": "output/output_regular_reports_2024-04-24T02:38:04-07:00.avro",
  "input_data_bucket_name": "tracking_tf_state_bucket",
  "output_data_blob_prefix": "output/summary_report.avro",
  "output_data_bucket_name": "tracking_tf_state_bucket",
  "postback_url": "",
  "result_info": {
    "return_code": "PRIVACY_BUDGET_ERROR",
    "return_message": "com.google.aggregate.adtech.worker.exceptions.AggregationJobProcessException: Exception while consuming privacy budget. Exception message: TRANSACTION_MANAGER_RETRIES_EXCEEDED \n com.google.aggregate.adtech.worker.aggregation.concurrent.ConcurrentAggregationProcessor.consumePrivacyBudgetUnits(ConcurrentAggregationProcessor.java:466) \n com.google.aggregate.adtech.worker.aggregation.concurrent.ConcurrentAggregationProcessor.process(ConcurrentAggregationProcessor.java:329) \n com.google.aggregate.adtech.worker.WorkerPullWorkService.run(WorkerPullWorkService.java:142)\nThe root cause is: com.google.scp.operator.cpio.distributedprivacybudgetclient.TransactionEngine$TransactionEngineException: TRANSACTION_MANAGER_RETRIES_EXCEEDED \n com.google.scp.operator.cpio.distributedprivacybudgetclient.TransactionEngineImpl.proceedToNextPhase(TransactionEngineImpl.java:100) \n com.google.scp.operator.cpio.distributedprivacybudgetclient.TransactionEngineImpl.executeDistributedPhase(TransactionEngineImpl.java:196) \n com.google.scp.operator.cpio.distributedprivacybudgetclient.TransactionEngineImpl.executeCurrentPhase(TransactionEngineImpl.java:138)",
    "error_summary": {
      "error_counts": [],
      "error_messages": []
    },
    "finished_at": "2024-05-16T01:29:35.113618072Z"
  },
  "job_parameters": {
    "output_domain_blob_prefix": "domain/output_local_domain.avro",
    "output_domain_bucket_name": "tracking_tf_state_bucket",
    "attribution_report_to": "https://adservice-1.stratus.qa.ebay.com"
  },
  "request_processing_started_at": "2024-05-16T01:20:00.743721759Z"
}
The report and domain .avro files are attached: avro report, output_domain.avro
BTW, where can I see the detailed logs for each job on the Google Cloud console? I can't find them anywhere. Thanks a lot!
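Note that the response above reports "job_status": "FINISHED" even though the job failed; the outcome is in result_info.return_code. The following Python sketch shows one way to read such a response. It assumes "SUCCESS" is the return code of a fully successful job; the helper name `summarize_job` is hypothetical, and the sample is a trimmed copy of the response above, not a complete one.

```python
import json


def summarize_job(response_json: str) -> dict:
    """Extract the fields that tell whether a finished job actually succeeded."""
    job = json.loads(response_json)
    result = job.get("result_info") or {}
    message = result.get("return_message", "")
    return {
        "job_request_id": job.get("job_request_id"),
        "job_status": job.get("job_status"),
        "return_code": result.get("return_code"),
        # Assumption: "SUCCESS" marks a fully successful job.
        "succeeded": job.get("job_status") == "FINISHED"
        and result.get("return_code") == "SUCCESS",
        # The first line of the message names the exception; the rest is stack trace.
        "first_line": message.split("\n", 1)[0].strip(),
    }


# Trimmed copy of the response posted above (stack trace lines omitted).
sample = json.dumps({
    "job_status": "FINISHED",
    "job_request_id": "test05",
    "result_info": {
        "return_code": "PRIVACY_BUDGET_ERROR",
        "return_message": (
            "com.google.aggregate.adtech.worker.exceptions."
            "AggregationJobProcessException: Exception while consuming "
            "privacy budget. Exception message: "
            "TRANSACTION_MANAGER_RETRIES_EXCEEDED \n (stack trace omitted)"
        ),
    },
})

summary = summarize_job(sample)
print(summary["job_status"], summary["return_code"], summary["succeeded"])
print(summary["first_line"])
```

Checking return_code rather than job_status avoids treating a finished-but-failed job like this one as a success.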
The job can be processed now, thanks a lot!
Hi team,
Our aggregation service was deployed successfully, but after creating a job, the job status is always RECEIVED. Do you have any clues about that? Our projectId is ecs-1709881683838
Thanks a lot~~