googleapis / python-bigquery-pandas

Google BigQuery connector for pandas
https://googleapis.dev/python/pandas-gbq/latest/index.html
BSD 3-Clause "New" or "Revised" License

pandas_gbq.to_gbq() return nulls everywhere #779

Closed Mintactus closed 3 months ago

Mintactus commented 4 months ago

Sorry to give limited information about the bug, but I have limited resources and time:

Pandas 2.2.2

When I use pandas_gbq.to_gbq(), it just writes null values for every column, even though the original dataframe I'm sending the data from has values.

Update: The issue is caused by bad column names. BigQuery imports the columns with bad names, surprisingly with the name intact, but turns all the values of those columns into nulls. I don't get it.

Linchin commented 4 months ago

Thank you @Mintactus for raising the issue. When you have time, could you provide a code snippet that reproduces the problem?

Mintactus commented 4 months ago

Any dataframe whose header has spaces in the column names will trigger it, if I'm right.
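A quick way to check whether a given header would be affected is sketched below. It assumes the cause is column names outside BigQuery's standard format (letters, digits, and underscores only, starting with a letter or underscore, at most 300 characters). The helper `find_invalid_column_names` is hypothetical and not part of pandas-gbq.

```python
import re

# Assumption: the standard (non-flexible) BigQuery column-name format.
_VALID_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def find_invalid_column_names(columns):
    """Return the column names that do not match the assumed format."""
    return [
        name
        for name in columns
        if not (
            isinstance(name, str)
            and len(name) <= 300
            and _VALID_NAME.fullmatch(name)
        )
    ]


# Example header: two names contain spaces, two are fine.
header = ["id", " A a", "B b", "total_2024"]
print(find_invalid_column_names(header))  # [' A a', 'B b']
```

With a dataframe `df`, passing `df.columns` to the helper should list the columns likely to come back as nulls.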

Linchin commented 4 months ago

I'm unable to reproduce the issue. Even when the column names contain spaces, the dataframe uploads with no problem. For example:

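import pandas as pd
from pandas_gbq import gbq

# Column names deliberately contain spaces to try to trigger the issue.
df = pd.DataFrame({' A a': [1, 2, 3], ' B b': [4, 5, 6]})
print(df)

# 'table_id' and 'project_id' are placeholders for a real table and project.
gbq.to_gbq(df, destination_table='table_id', project_id='project_id', if_exists='replace')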

Could you tell me more specifically under what conditions the issue happens?

tswast commented 3 months ago

@Linchin It might be that our test project has this preview feature enabled? https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-parquet#flexible-column-names

Linchin commented 3 months ago

Wow thanks for the info! I'll disable preview and try again.

Linchin commented 3 months ago

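@Mintactus Indeed, this will be resolved by the flexible column names feature (https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-parquet#flexible-column-names) once it is GA. The tentative GA date is the end of 2024 Q3. For now, I think it is working as intended, and I suggest using column names in the required format (https://cloud.google.com/bigquery/docs/schemas#column_names). I do think we should at least report an error, though.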
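As a stopgap until then, column names can be rewritten into the required format before uploading. This is a minimal sketch, not something pandas-gbq does itself. The helper `sanitize_column_name` is hypothetical, and the rule it applies (letters, digits, and underscores only, starting with a letter or underscore, at most 300 characters) is an assumption based on the column-name documentation.

```python
import re


def sanitize_column_name(name):
    """Map an arbitrary label to letters, digits, and underscores only."""
    # Trim surrounding whitespace, then replace every other disallowed
    # character (including inner spaces) with an underscore.
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", str(name).strip())
    # Names must not be empty or start with a digit, so prefix an underscore.
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    # Assumed maximum length of a column name.
    return cleaned[:300]


labels = [" A a", " B b", "2024 total"]
print([sanitize_column_name(label) for label in labels])
# ['A_a', 'B_b', '_2024_total']
```

With a dataframe `df`, something like `df.columns = [sanitize_column_name(c) for c in df.columns]` before calling `to_gbq` should avoid the problem. Note that two different labels can map to the same sanitized name, so duplicates would still need handling.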

Mintactus commented 3 months ago

Thanks for the feedback, really appreciated.
