move-coop / parsons

A python library of connectors for the progressive community.
Other
255 stars 125 forks source link

[Bug] BigQuery.upsert fails depending on table's column order #1023

Open Jason94 opened 3 months ago

Jason94 commented 3 months ago

GoogleBigQuery.upsert() fails if the columns are ordered differently in the database's table than in the Parsons table.

If the mismatched columns are the same data type, it will silently insert the data into the wrong columns. If the mismatched columns are different data types, then the upsert will fail with an error.

To Reproduce

This script will throw an error when it attempts the upsert because columns a and b have different types:

from parsons import Table
from parsons.google.google_bigquery import GoogleBigQuery

def main():
    bq = GoogleBigQuery()
    table_name = "test.upsert_test_data"

    data = Table([{"a": 1, "b": "abc"}, {"a": 3, "b": "def"}])

    bq.copy(data, table_name, if_exists="drop")

    data2 = Table(
        [
            {"b": "xyz", "a": 1},
            {"b": "zyx", "a": 5},
        ]
    )

    bq.upsert(data2, table_name, "a")

    print(bq.query(f"SELECT * FROM {table_name}"))

if __name__ == "__main__":
    main()

Your Environment