spyder-ide / spyder

Official repository for Spyder - The Scientific Python Development Environment
https://www.spyder-ide.org
MIT License
8.35k stars 1.62k forks source link

Variable Explorer Problems with Google Big Query Dataframes #18020

Open RMSekulic opened 2 years ago

RMSekulic commented 2 years ago

Issue Report Checklist

Problem Description

The variable explorer appears to be unable to display dataframes that are read in from Google Big Query, using either the pandas_gbq package, or google-cloud-bigquery package.

This does not appear to be a problem for data frames read in from CSVs or the read_sql function of pandas, only anything brought in from Big Query.

There seems to be a problem with the BQ query object that creates the dataframe being non-pickleable? I'm not really sure. Is there a way for me to get around this problem? I like to be able to interactively explore dataframes as I work on them, so this is frustrating to the say the least.

Also apologies if I have filled out this report incorrectly. Its my first time making something like this.

What steps reproduce the problem?

  1. Read in data using df = pd.read_csv("/Users/name/Desktop/test.csv")
  2. Click the item in Variable Explorer to view. Works fine.
  3. Read in data using df = client.query("SELECT * FROM dataset.table").result().to_dataframe()
  4. Click the item in Variable explorer to view. Fails.

What is the expected output? What do you see instead?

I would expect to see the dataframe pop up. Instead I get this:

Spyder was unable to retrieve the value of this variable from the console.

The error message was:
An error occured, see the console..

But no output appears in the console.

Interestingly, I get a different result if I try to click the item generated by df = client.query("SELECT * FROM dataset.table").result() (removing the to_dataframe() above).

If I click that object I get:

Spyder was unable to retrieve the value of this variable from the console.

The error message was:
The variable is not picklable.

Console output pasted below.

Paste Traceback/Error Below (if applicable)


Exception in comms call get_value:

  File "/usr/local/lib/python3.9/site-packages/spyder_kernels/comms/commbase.py", line 347, in _handle_remote_call
    self._set_call_return_value(msg_dict, return_value)

  File "/usr/local/lib/python3.9/site-packages/spyder_kernels/comms/commbase.py", line 384, in _set_call_return_value
    self._send_message('remote_call_reply', content=content, data=data,

  File "/usr/local/lib/python3.9/site-packages/spyder_kernels/comms/frontendcomm.py", line 109, in _send_message
    return super(FrontendComm, self)._send_message(*args, **kwargs)

  File "/usr/local/lib/python3.9/site-packages/spyder_kernels/comms/commbase.py", line 247, in _send_message
    buffers = [cloudpickle.dumps(

  File "/usr/local/lib/python3.9/site-packages/cloudpickle/cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)

  File "/usr/local/lib/python3.9/site-packages/cloudpickle/cloudpickle_fast.py", line 602, in dump
    return Pickler.dump(self, obj)

  File "/usr/local/lib/python3.9/site-packages/google/cloud/client/__init__.py", line 194, in __getstate__
    raise PicklingError(

_pickle.PicklingError: Pickling client objects is explicitly not supported.
Clients have non-trivial state that is local and unpickleable.

Versions

Dependencies

PASTE DEPENDENCIES HERE
dalthviz commented 2 years ago

Hi @RMSekulic thank you for the feedback! Could it be possible for you to test using the max_results kwarg with the to_dataframe method? Maybe Spyder is not being able to retrieve the variable due its size.

About your second test, using the results without the to_dataframe(), as the error suggest the only variables that you can inspect with the variable explorer are pickable ones. I think there is not much we can do in that case, sorry :/

Let us know if something of the above helps!

RMSekulic commented 2 years ago

Hi @dalthviz thank you for your response!

I feel quite silly now, as I was troubleshooting this for over 24 hours before finally making a post here.... only to finally solve it right after posting.

There were several date related fields in the BQ tables, that for some reason python was reading in as dbdate type. Not really sure what that is. If there is any column of dbdate type in the df, Spyder can't retrieve it apparently.

If you drop those columns, or (better yet) convert them into datetime, the variable explorer seems to be able to show it.

I'm not sure if it would be possible to make the error say it has to do with the df contents, rather than the df itself? It might be too niche of a case though.

dalthviz commented 2 years ago

Thanks for the new info @RMSekulic ! Not totally sure if there is something we can do about this since no error appeared on the console when using the to_dataframe method, right? (maybe @ccordoba12 or @impact27 have more info about what can we do to improve the error message displayed).

However, checking a little bit seems like you can install a package that extends the data types from pandas to support BigQuery/SQL systems data types (which I guess adds support for the dbdate type you mention):

Checking the example above my guess is that if you install the package and import the db-dtypes package (import db_dtypes) no error should appear even without dropping/converting the columns šŸ¤” (not sure though šŸ˜…)

impact27 commented 2 years ago

If you can provide a minimal reproducing example I can try to understand why no error was shown.