googleapis / python-aiplatform

A Python SDK for Vertex AI, a fully managed, end-to-end platform for data science and machine learning.
Apache License 2.0
582 stars 319 forks source link

TextGenerationResponse with `is_blocked = True` shows as an empty string using `repr()` #2462

Closed admackin closed 9 months ago

admackin commented 10 months ago

Environment details

Steps to reproduce

  1. produce a TextGenerationResponse instance which triggers safety filters (or create an empty one)
  2. call repr(response) on it; the empty string will be shown

Code example

repr(TextGenerationResponse(text="", is_blocked=True)) # shows empty string

Why is this an issue?

this is a very surprising API design choice, and goes against python recommendations for use of repr. I ran into an issue using one of Google's own Vertex tutorials, and after a good 20 minutes of debugging, finally worked out it was a content blocking issue. if repr() was showing object state, as expected in Python, instead of the .text attribute, I could have seen it straight away. But without this, all you see is an unexplained empty string.

This issue is particularly important given how often the safety filters in VertexAI can be triggered unexpectedly. It is particularly likely in interactive console sessions (such as your own tutorials) where repr(response) or eg (f"{response!r}) is most likely to be used.

dataclasses provide an excellent __repr__ implementation, so I believe the method should never have been overridden (but it would be an acceptable __str__ implementation)

Fixing

a one-line replacement of __repr__ with __str__ here would resolve this (as I've done here). I haven't opened a PR because getting the CLA signed sounds like a massive headache for such a tiny PR.

Ark-kun commented 9 months ago

Thank you for the report. The fix is on its way, pending approval.

The main reason why we chose to override __repr__ was that Jupyter interactive notebooks that we use for demos use __repr__ for outputting the objects. The default output is pretty verbose and repeats the same data at least two times (due to the presence of _prediction_response). Additionally, the text representation and formatting is garbled since newline characters are not followed, but rather remain as "\n" escape codes in the output. Overriding __str__ does not have any effect on this behavior.

Ark-kun commented 9 months ago

Compare:

Ingredients:

* 3 very ripe bananas, mashed
* 1/2 cup (1 stick) unsalted butter, at room temperature
* 3/4 cup granulated sugar
* 3/4 cup packed light brown sugar
* 2 large eggs
* 2 teaspoons vanilla extract
* 1 1/2 cups all-purpose flour
* 1 teaspoon baking soda
* 1/2 teaspoon salt
* 1/2 cup chopped walnuts or pecans (optional)

Instructions:

1. Preheat oven to 350 degrees F (

with

TextGenerationResponse(text='Ingredients:\n\n* 3 very ripe bananas, mashed\n* 1/2 cup (1 stick) unsalted butter, at room temperature\n* 3/4 cup granulated sugar\n* 3/4 cup packed light brown sugar\n* 2 large eggs\n* 2 teaspoons vanilla extract\n* 1 1/2 cups all-purpose flour\n* 1 teaspoon baking soda\n* 1/2 teaspoon salt\n* 1/2 cup chopped walnuts or pecans (optional)\n\nInstructions:\n\n1. Preheat oven to 350 degrees F (', _prediction_response=Prediction(predictions=[{'safetyAttributes': {'scores': [], 'blocked': False, 'categories': []}, 'citationMetadata': {'citations': []}, 'content': 'Ingredients:\n\n* 3 very ripe bananas, mashed\n* 1/2 cup (1 stick) unsalted butter, at room temperature\n* 3/4 cup granulated sugar\n* 3/4 cup packed light brown sugar\n* 2 large eggs\n* 2 teaspoons vanilla extract\n* 1 1/2 cups all-purpose flour\n* 1 teaspoon baking soda\n* 1/2 teaspoon salt\n* 1/2 cup chopped walnuts or pecans (optional)\n\nInstructions:\n\n1. Preheat oven to 350 degrees F ('}], deployed_model_id='', model_version_id='', model_resource_name='', explanations=None), is_blocked=False, safety_attributes={})
Ark-kun commented 9 months ago

The issue has been fixed. Thank you for your patience. In the future we might stop overriding the __repr__ method.