z3z1ma / target-bigquery

target-bigquery is a Singer target for BigQuery. It supports storage write, GCS, streaming, and batch load methods. Built with the Meltano SDK.
MIT License

Bug: Increasing Job.attempt causes AttributeError #68

Closed · davert0 closed this 1 year ago

davert0 commented 1 year ago
class Job(NamedTuple):
    table: str
    data: Union[bytes, memoryview]
    config: Dict[str, Any]
    attempt: int = 1

try:
    client.load_table_from_file(
        BytesIO(job.data),
        job.table,
        num_retries=3,
        job_config=bigquery.LoadJobConfig(**job.config),
    ).result()
except Exception as exc:
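    # BUG: Job is a NamedTuple, so this in-place increment raises AttributeError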
    job.attempt += 1
    if job.attempt > 3:
        # TODO: add a metric for this + a DLQ & wrap exception type
        self.error_notifier.send((exc, self.serialize_exception(exc)))
        raise 
    else:
        self.queue.put(job)

This exception handling never works: Job is a NamedTuple, so its fields are immutable, and incrementing job.attempt raises an AttributeError instead of requeueing the job.
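One possible workaround, shown here only as a minimal sketch (not necessarily how the target should fix it): keep Job as a NamedTuple and requeue a copy built with _replace instead of mutating the instance.

from typing import Any, Dict, NamedTuple, Union

class Job(NamedTuple):
    table: str
    data: Union[bytes, memoryview]
    config: Dict[str, Any]
    attempt: int = 1

job = Job(table="project.dataset.table", data=b"{}", config={})

# job.attempt += 1 would raise AttributeError because NamedTuple fields
# are read-only; _replace returns a new Job with the counter bumped, and
# that copy is what would need to be put back on the queue.
job = job._replace(attempt=job.attempt + 1)
assert job.attempt == 2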

davert0 commented 1 year ago

I also suggest changing how BigQuery load jobs are handled. In the current implementation, if loading fails, the only message in the logs is: google.api_core.exceptions.BadRequest: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details.

This message is not informative as it does not indicate the specific problem.

But if you do something like this:

res = client.load_table_from_file(
    BytesIO(job.data),
    job.table,
    num_retries=3,
    job_config=bigquery.LoadJobConfig(**job.config),
)
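# poll until the load job completes, then inspect its errors collection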
while not res.done():
    pass
errors = res.errors
if errors:
    raise ValueError(errors)

you will be able to see the specific error in the logs. In my case, for example, there was a schema error: {'reason': 'invalid', 'message': 'Error while reading data, error message: JSON processing encountered too many errors, giving up. Rows: 1; errors: 1; max bad: 0; error percent: 0'}, {'reason': 'invalid', 'message': 'Error while reading data, error message: JSON parsing error in row starting at position 0: Could not convert value to double. Field: CAMPAIGN_NAME; Value: Tours Barcelona'}
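An equivalent sketch that avoids the busy-wait, assuming the standard google-cloud-bigquery client (the table id and payload below are illustrative): let result() block, and surface load_job.errors when it raises.

from io import BytesIO

from google.api_core import exceptions as gcp_exceptions
from google.cloud import bigquery

client = bigquery.Client()

load_job = client.load_table_from_file(
    BytesIO(b'{"CAMPAIGN_NAME": "Tours Barcelona"}\n'),
    "my-project.my_dataset.my_table",  # illustrative destination table
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
    ),
)
try:
    load_job.result()  # blocks until the job finishes; raises on failure
except gcp_exceptions.GoogleAPICallError:
    # load_job.errors carries the per-row details that the generic
    # BadRequest message omits (reason, offending field, value, ...).
    raise ValueError(load_job.errors)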

z3z1ma commented 1 year ago

We fixed the Job class inheriting from NamedTuple in the storage write implementation, and I have propagated the same change everywhere. We also now ensure errors are logged when they are thrown. See version 0.7.0.
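For context, a minimal sketch of the general pattern described above (not the actual 0.7.0 code): making Job a mutable dataclass instead of a NamedTuple lets the worker increment the attempt counter in place.

from dataclasses import dataclass, field
from typing import Any, Dict, Union

@dataclass
class Job:
    table: str
    data: Union[bytes, memoryview]
    config: Dict[str, Any] = field(default_factory=dict)
    attempt: int = 1

job = Job(table="project.dataset.table", data=b"{}")
job.attempt += 1  # mutable field, so the retry counter can be bumped in place
assert job.attempt == 2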