googleapis / python-aiplatform

A Python SDK for Vertex AI, a fully managed, end-to-end platform for data science and machine learning.
Apache License 2.0

Batch prediction for Gemini: Failed to import data #4110

Closed FelixHoppe closed 1 month ago

FelixHoppe commented 3 months ago

When creating a batch prediction job, for Gemini 1.5, I get the following error:

```
Job failed: code: 3 message: "Failed to import data. Syntax error: Expected end of input but got identifier \"_vertex_row_id\" at [17:5]"
```

This occurs despite strictly following the instructions in the official documentation.

Below, you will find the steps to reproduce the issue.

#### Environment details

#### Steps to reproduce

1. Insert data into the "request" column of a BigQuery table:

```python
import json

from google.cloud import bigquery

client = bigquery.Client()
table_id = "..."
json_to_insert = {
    "contents": [
        {
            "role": "user",
            "parts": {"text": "Give me a recipe for banana bread."},
        }
    ],
    "system_instruction": {"parts": [{"text": "You are a chef."}]},
}
rows_to_insert = [
    {"request": json.dumps(json_to_insert)},
]
client.insert_rows_json(table_id, rows_to_insert)
```
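Since the import error points at the row data, it can help to sanity-check the request payload locally before inserting it. A minimal sketch; the `validate_request` helper and the keys it checks are my own illustration of the shape used in this issue, not part of the SDK or the full Gemini request schema:

```python
import json


def validate_request(payload: dict) -> list[str]:
    """Return a list of basic problems found in a Gemini batch request payload.

    Only checks the shape used in this issue; illustrative, not exhaustive.
    """
    problems = []
    if "contents" not in payload:
        problems.append("missing 'contents'")
    else:
        for i, content in enumerate(payload["contents"]):
            if "role" not in content:
                problems.append(f"contents[{i}] missing 'role'")
            if "parts" not in content:
                problems.append(f"contents[{i}] missing 'parts'")
    # The value inserted into BigQuery must be valid JSON text.
    try:
        json.dumps(payload)
    except (TypeError, ValueError) as exc:
        problems.append(f"not JSON-serializable: {exc}")
    return problems


payload = {
    "contents": [
        {"role": "user", "parts": {"text": "Give me a recipe for banana bread."}}
    ],
    "system_instruction": {"parts": [{"text": "You are a chef."}]},
}
print(validate_request(payload))  # an empty list means the basic shape is OK
```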


2. Create the batch prediction job and retrieve the output:

```python
import time

import vertexai
from vertexai.preview.batch_prediction import BatchPredictionJob

vertexai.init(project="...", location="...")

job = BatchPredictionJob.submit(
    "gemini-1.5-pro-001",
    "bq://...",
    output_uri_prefix="bq://...",
)

while not job.has_ended:
    time.sleep(5)
    job.refresh()

if job.has_succeeded:
    print("Job succeeded!")
else:
    print(f"Job failed: {job.error}")
```
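As an aside, the polling loop above can spin forever if a job never reaches a terminal state. A bounded variant, sketched against only the `has_ended`/`refresh()` interface shown in the snippet (the `wait_for_job` helper is hypothetical, not part of the SDK):

```python
import time


def wait_for_job(job, timeout_s: float = 3600, poll_s: float = 5) -> bool:
    """Poll `job` until it ends or `timeout_s` elapses.

    Returns True if the job reached a terminal state within the timeout,
    False otherwise. Assumes `job` exposes `has_ended` and `refresh()`.
    """
    deadline = time.monotonic() + timeout_s
    while not job.has_ended:
        if time.monotonic() > deadline:
            return False
        time.sleep(poll_s)
        job.refresh()
    return True
```

With a real `BatchPredictionJob` this would replace the bare `while` loop, so a stuck job surfaces as a timeout instead of a hung script.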



#### Output

```
Job failed: code: 3
message: "Failed to import data. Syntax error: Expected end of input but got identifier \"_vertex_row_id\" at [17:5]"
```
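For reference, the numeric codes in these errors are standard gRPC status codes: 3 is `INVALID_ARGUMENT` (a problem with the request or input data) and 13 is `INTERNAL` (a server-side error, often transient). A tiny lookup; the code names come from the gRPC specification, while the helper itself is just an illustration:

```python
# Subset of the standard gRPC status codes seen in this thread.
GRPC_CODES = {
    3: "INVALID_ARGUMENT",  # client-side problem, e.g. malformed input data
    13: "INTERNAL",         # server-side error, usually transient
}


def describe_code(code: int) -> str:
    return GRPC_CODES.get(code, f"unknown code {code}")


print(describe_code(3))   # → INVALID_ARGUMENT
print(describe_code(13))  # → INTERNAL
```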

I appreciate your help!
FelixHoppe commented 2 months ago

@jaycee-li just saw that you were assigned to this issue. If you need additional information, please reach out!

jaycee-li commented 1 month ago

Hi @FelixHoppe , I ran your code on my side and the job succeeded. Could you please share the full request column of your BigQuery input?

```python
from google.cloud import bigquery

client = bigquery.Client()
query = "SELECT request FROM your_dataset.your_table"

query_job = client.query(query)
results = query_job.result()
for result in results:
    print(result.request)
```
FelixHoppe commented 1 month ago

Hey @jaycee-li, I just re-ran the exact code from July 19 and now it works! Thanks for your support!

FelixHoppe commented 1 month ago

@jaycee-li unfortunately the code from July 19 now throws the following error: `Job failed: code: 13 message: "INTERNAL"`

The last time I executed it (Sep. 7th), it still worked. I also updated the google-cloud-aiplatform version to 1.66.0 and ran it again. Same error.

As you requested in your previous message, this is the export of the "request" column in my input table:

```
{'request': {'contents': [{'parts': {'text': 'Give me a recipe for banana bread.'}, 'role': 'user'}], 'system_instruction': {'parts': [{'text': 'You are a chef.'}]}}}
```

Any help on this is highly appreciated!

jaycee-li commented 1 month ago

Hi @FelixHoppe , I've tested the same code on my end and it seems to be working as expected. It's possible that there were some temporary server issues when you initially ran the code. Could you please try running it again? If you still get the same error, please let me know and I'll be happy to investigate further.

FelixHoppe commented 1 month ago

@jaycee-li, it works again.

Thank you for your support!