I also suggest changing the logic for processing BigQuery jobs. In the current implementation, if an error occurs during loading, the only message in the logs is:
```
google.api_core.exceptions.BadRequest: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details.
```
This message is not informative as it does not indicate the specific problem.
But if you do something like this:
```python
from io import BytesIO
import time

from google.cloud import bigquery

# `client` is a bigquery.Client and `job` is the queued job record
res = client.load_table_from_file(
    BytesIO(job.data),
    job.table,
    num_retries=3,
    job_config=bigquery.LoadJobConfig(**job.config),
)
# Poll until the load job finishes (sleep briefly between checks
# so the loop does not hammer the API)
while not res.done():
    time.sleep(1)
if res.errors:
    raise ValueError(res.errors)
```
you will see the specific error in the logs. In my case, for example, it was a schema error:
```
{'reason': 'invalid', 'message': 'Error while reading data, error message: JSON processing encountered too many errors, giving up. Rows: 1; errors: 1; max bad: 0; error percent: 0'}, {'reason': 'invalid', 'message': 'Error while reading data, error message: JSON parsing error in row starting at position 0: Could not convert value to double. Field: CAMPAIGN_NAME; Value: Tours Barcelona'}
```
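A slightly more idiomatic variant (a sketch only, assuming `res` is the `LoadJob` returned above) is to let `result()` block and raise, and log `res.errors` in the exception handler instead of busy-polling:

```python
import logging

from google.api_core.exceptions import GoogleAPICallError

logger = logging.getLogger(__name__)

try:
    # result() blocks until the job finishes and raises on failure
    res.result()
except GoogleAPICallError:
    # errors holds the detailed per-row records shown above
    logger.error("BigQuery load failed: %s", res.errors)
    raise
```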
We fixed the Job class inheriting from NamedTuple in the storage write path, and I have propagated the same fix everywhere. We also now ensure errors are logged when thrown. See version 0.7.0.
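One way such a fix could look (a sketch only, not the actual 0.7.0 code; field names are borrowed from the snippets in this thread) is a mutable dataclass instead of a NamedTuple:

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    table: str
    data: bytes
    config: dict = field(default_factory=dict)
    attempts: int = 0  # mutable, so retry handling can increment it

job = Job(table="dataset.table", data=b"{}")
job.attempts += 1  # fine on a dataclass; raises AttributeError on a NamedTuple
```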
Exception handling never works because NamedTuple is immutable: trying to increment the job's attempts counter raises an AttributeError.
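A minimal reproduction of that failure mode (the class and field names here are illustrative):

```python
from typing import NamedTuple

class Job(NamedTuple):
    table: str
    attempts: int = 0

job = Job(table="dataset.table")
try:
    job.attempts += 1  # tuple fields are read-only
except AttributeError as exc:
    print(exc)  # "can't set attribute"

# NamedTuple's escape hatch is _replace, which returns a new instance:
job = job._replace(attempts=job.attempts + 1)
```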