googleapis / python-aiplatform

A Python SDK for Vertex AI, a fully managed, end-to-end platform for data science and machine learning.
Apache License 2.0

400 Error: "bigquery" output format does not support key_field in aiplatform_v1.BatchPredictionJob.InstanceConfig #4514

Open tetsu-i opened 1 month ago

tetsu-i commented 1 month ago

Summary

I encountered the following error when specifying key_field in aiplatform_v1.BatchPredictionJob.InstanceConfig for a batch prediction job that reads from and writes to BigQuery:

google.api_core.exceptions.InvalidArgument: 400 "bigquery" output format does not support key_field.

Environment details

Code example

from google.cloud import aiplatform, aiplatform_v1

LOCATION = "asia-northeast1"
MY_PROJECT = "my-project"

def batch_predict_with_bq(
    model: aiplatform.Model,
    job_display_name: str,
    bq_source_uri: str,
    bq_output_uri: str,
    machine_type: str,
) -> aiplatform_v1.BatchPredictionJob:
    # https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.types.BatchPredictionJob.InputConfig
    input_config = aiplatform_v1.BatchPredictionJob.InputConfig(
        instances_format="bigquery",
        bigquery_source=aiplatform_v1.BigQuerySource(
            input_uri=bq_source_uri,
        ),
    )

    # https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.types.BatchPredictionJob.InstanceConfig
    instance_config = aiplatform_v1.BatchPredictionJob.InstanceConfig(
        excluded_fields=["user_id"],
        key_field="key",
    )

    # https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.types.BatchPredictionJob.OutputConfig
    output_config = aiplatform_v1.BatchPredictionJob.OutputConfig(
        predictions_format="bigquery",
        bigquery_destination=aiplatform_v1.BigQueryDestination(
            output_uri=bq_output_uri,
        ),
    )

    # https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.types.BatchDedicatedResources
    batch_dedicated_resources = aiplatform_v1.BatchDedicatedResources(
        machine_spec=aiplatform_v1.MachineSpec(machine_type=machine_type),
        starting_replica_count=1,
        max_replica_count=1,
    )

    # https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.types.BatchPredictionJob

    job = aiplatform_v1.BatchPredictionJob(
        name="test",
        display_name=job_display_name,
        model=model.resource_name,
        input_config=input_config,
        output_config=output_config,
        instance_config=instance_config,
        dedicated_resources=batch_dedicated_resources,
    )

    # https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.services.job_service.JobServiceClient#google_cloud_aiplatform_v1_services_job_service_JobServiceClient_create_batch_prediction_job
    client = aiplatform_v1.JobServiceClient(
        client_options={"api_endpoint": f"{LOCATION}-aiplatform.googleapis.com"}
    )

    request = aiplatform_v1.CreateBatchPredictionJobRequest(
        parent=f"projects/{MY_PROJECT}/locations/{LOCATION}",
        batch_prediction_job=job,
    )

    response = client.create_batch_prediction_job(request=request)

    return response

Stack trace

The above exception was the direct cause of the following exception:

Traceback (most recent call last):

...
    _ = batch_predict_with_bq(
        ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/workspace/prog/python/vertexai/batch/main.py", line 305, in batch_predict_with_bq
    response = client.create_batch_prediction_job(request=request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/workspace/prog/python/vertexai/batch/.venv/lib/python3.11/site-packages/google/cloud/aiplatform_v1/services/job_service/client.py", line 3739, in create_batch_prediction_job
    response = rpc(
               ^^^^
  File "/Users/user/workspace/prog/python/vertexai/batch/.venv/lib/python3.11/site-packages/google/api_core/gapic_v1/method.py", line 131, in __call__
    return wrapped_func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/workspace/prog/python/vertexai/batch/.venv/lib/python3.11/site-packages/google/api_core/grpc_helpers.py", line 78, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.InvalidArgument: 400 "bigquery" output format does not support key_field.

Expected Behavior

According to the InstanceConfig documentation, key_field should be accepted when the input is BigQuery, so the job should be created without error.

Actual Behavior

The job fails with a 400 error stating that the BigQuery output format does not support key_field, which contradicts the information in the documentation.

Additional Information

If key_field is not supported for the bigquery format, it would be helpful to update the documentation to reflect this limitation. Otherwise, any guidance on resolving this issue would be greatly appreciated.
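
In case it helps anyone hitting the same error, here is a minimal sketch of a possible stopgap. It is not a confirmed fix and I have not verified it against the service. It assumes the restriction applies only when predictions_format is "bigquery", as the error message suggests, and it relies on my reading of the excluded_fields documentation that excluded columns are still attached to the output when key_field is unset. The helper name build_instance_config_kwargs is hypothetical and not part of the SDK.

```python
from typing import Any, Dict, List, Optional


def build_instance_config_kwargs(
    predictions_format: str,
    key_field: Optional[str] = None,
    excluded_fields: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for BatchPredictionJob.InstanceConfig.

    Hypothetical helper, not part of the SDK. It avoids sending key_field
    together with "bigquery" output, the combination rejected above.
    """
    kwargs: Dict[str, Any] = {}
    excluded = list(excluded_fields or [])

    if key_field is not None:
        if predictions_format == "bigquery":
            # Assumption: key_field is rejected only for BigQuery output.
            # Exclude the key column instead so it is not sent to the model.
            if key_field not in excluded:
                excluded.append(key_field)
        else:
            kwargs["key_field"] = key_field

    if excluded:
        kwargs["excluded_fields"] = excluded
    return kwargs
```

With the code example above, this would be used as aiplatform_v1.BatchPredictionJob.InstanceConfig(**build_instance_config_kwargs("bigquery", key_field="key", excluded_fields=["user_id"])). The output table then would not have the compact key-only layout that key_field is documented to produce, so this only sidesteps the 400 error.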

Thanks!

jaycee-li commented 1 month ago

Hi @weichungw, could you please take a look at this or assign it to the right person on your team?