googleapis / python-bigquery-dataframes

BigQuery DataFrames
https://cloud.google.com/python/docs/reference/bigframes/latest
Apache License 2.0

bigframes 1.5.0 example showing module 'numpy' has no attribute 'dtypes' #687

Closed wazi55 closed 1 week ago

wazi55 commented 3 months ago

Environment details

import sys
import bigframes
import google.cloud.bigquery
import ibis
import pandas
import pyarrow
import sqlglot

print(f"Python: {sys.version}")
print(f"bigframes=={bigframes.__version__}")
print(f"google-cloud-bigquery=={google.cloud.bigquery.__version__}")
print(f"ibis=={ibis.__version__}")
print(f"pandas=={pandas.__version__}")
print(f"pyarrow=={pyarrow.__version__}")
print(f"sqlglot=={sqlglot.__version__}")
Python: 3.10.9 (main, Mar  1 2023, 12:33:47) [Clang 14.0.6 ]
bigframes==1.5.0
google-cloud-bigquery==3.22.0
ibis==8.0.0
pandas==1.5.3
pyarrow==12.0.1
sqlglot==20.11.0

Steps to reproduce

  1. Run the notebook https://github.com/googleapis/python-bigquery-dataframes/blob/main/notebooks/generative_ai/large_language_models.ipynb
  2. Cell 5 fails with: module 'numpy' has no attribute 'dtypes'

Code example

import pandas as pd
import bigframes.pandas

df = pd.DataFrame(
    {
        "prompt": ["What is BigQuery?", "What is BQML?", "What is BigQuery DataFrame?"],
    }
)
# This call raises the AttributeError.
bf_df = bigframes.pandas.read_pandas(df)

wazi55 commented 3 months ago
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[5], line 5
      1 df = pd.DataFrame(
      2         {
      3             "prompt": ["What is BigQuery?", "What is BQML?", "What is BigQuery DataFrame?"],
      4         })
----> 5 bf_df = bigframes.pandas.read_pandas(df)

File /usr/local/anaconda3/lib/python3.10/site-packages/bigframes/pandas/__init__.py:604, in read_pandas(pandas_dataframe)
    603 def read_pandas(pandas_dataframe: Union[pandas.DataFrame, pandas.Series, pandas.Index]):
--> 604     return global_session.with_default_session(
    605         bigframes.session.Session.read_pandas,
    606         pandas_dataframe,
    607     )

File /usr/local/anaconda3/lib/python3.10/site-packages/bigframes/core/global_session.py:113, in with_default_session(func, *args, **kwargs)
    112 def with_default_session(func: Callable[..., _T], *args, **kwargs) -> _T:
--> 113     return func(get_global_session(), *args, **kwargs)

File /usr/local/anaconda3/lib/python3.10/site-packages/bigframes/session/__init__.py:974, in Session.read_pandas(self, pandas_dataframe)
    970     return self._read_pandas(
    971         pandas.DataFrame(index=pandas_dataframe), "read_pandas"
    972     ).index
    973 if isinstance(pandas_dataframe, pandas.DataFrame):
...
    309     return Tester
--> 311 raise AttributeError("module {!r} has no attribute "
    312                      "{!r}".format(__name__, attr))

AttributeError: module 'numpy' has no attribute 'dtypes'

Genesis929 commented 3 months ago

Hello,

It seems that the issue might be related to specific environmental factors since setting up an identical package version environment did not reproduce the issue. To better diagnose and potentially resolve the problem, could you please provide additional information?

  1. Complete Traceback: The provided traceback appears to be truncated, missing some crucial details. A full traceback would be very helpful as it contains the complete sequence of calls that led to the error, including all files and line numbers involved.

  2. NumPy Version: Knowing the exact version of NumPy you are using could be useful, since this error is related to NumPy.

These details will greatly aid in pinpointing the root cause of the issue. Thank you for your cooperation!
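
The environment snippet in the original report does not print the NumPy version, so a small check like the one below could supply it. This is only a sketch: `describe_module` is a hypothetical helper name, not part of bigframes or NumPy, and it uses the standard library alone to report the installed version and whether the module exposes a given attribute.

```python
import importlib
import importlib.metadata


def describe_module(module_name: str, attribute: str) -> dict:
    """Report a module's installed version and whether it exposes `attribute`."""
    try:
        version = importlib.metadata.version(module_name)
    except importlib.metadata.PackageNotFoundError:
        # Not installed as a distribution (for example, a standard-library module).
        version = None
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return {
            "module": module_name,
            "version": version,
            "importable": False,
            "has_attribute": False,
        }
    return {
        "module": module_name,
        "version": version,
        "importable": True,
        # hasattr() returns False when the module's __getattr__ raises AttributeError,
        # which is the failure shown in the traceback above.
        "has_attribute": hasattr(module, attribute),
    }


if __name__ == "__main__":
    print(describe_module("numpy", "dtypes"))
```

Running this in the same interpreter or notebook kernel that produced the error matters, because a different environment may resolve a different NumPy installation.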

shobsi commented 2 weeks ago

Hi @wazi55, are you able to provide these details? It would also help if you could share which operating system you are running on. Thanks.

wazi55 commented 2 weeks ago

Hey, that was resolved a while back! We can close this ticket.

shobsi commented 2 weeks ago

That's great to know, @wazi55! We would appreciate it if you could post the resolution steps here, in case others run into this issue in the future. Thank you!

tswast commented 1 week ago

I'm able to reproduce this issue after updating my local test environment on Python 3.9. It doesn't appear to affect fresh environments. My requirements.txt:

$ pip freeze
aiohttp==3.8.5
aiosignal==1.3.1
anyio==3.7.1
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
arrow==1.2.3
asttokens==2.2.1
async-lru==2.0.4
async-timeout==4.0.3
atpublic==3.1.1
attrs==22.2.0
Babel==2.12.1
backcall==0.2.0
beautifulsoup4==4.12.2
bidict==0.22.1
-e git+ssh://git@github.com/googleapis/python-bigquery-dataframes.git@adfaddcc0fd9f495368e56eead4c1983d1cdf434#egg=bigframes&subdirectory=../../../bigframes-2
bleach==6.0.0
cachetools==5.3.0
certifi==2022.12.7
cffi==1.15.1
charset-normalizer==2.0.12
click==8.1.3
click-plugins==1.1.1
cligj==0.7.2
cloudpickle==2.0.0
comm==0.1.4
contourpy==1.2.1
coverage==7.2.2
cycler==0.12.1
db-dtypes==1.1.1
debugpy==1.6.7.post1
decorator==5.1.1
defusedxml==0.7.1
entrypoints==0.4
et-xmlfile==1.1.0
exceptiongroup==1.1.1
execnet==1.9.0
executing==1.2.0
fastjsonschema==2.18.0
filelock==3.10.7
Fiona==1.9.4.post1
fonttools==4.53.1
fqdn==1.5.1
frozenlist==1.4.0
fsspec==2023.3.0
gcsfs==2023.3.0
geopandas==0.12.2
google-api-core==2.19.1
google-auth==2.15.0
google-auth-oauthlib==1.0.0
google-cloud-bigquery==3.16.0
google-cloud-bigquery-connection==1.12.0
google-cloud-bigquery-storage==2.19.1
google-cloud-bigtable==2.24.0
google-cloud-core==2.3.2
google-cloud-functions==1.12.0
google-cloud-iam==2.12.1
google-cloud-pubsub==2.21.4
google-cloud-resource-manager==1.10.3
google-cloud-storage==2.0.0
google-cloud-testutils==1.3.3
google-crc32c==1.5.0
google-resumable-media==2.4.1
googleapis-common-protos==1.59.0
greenlet==2.0.2
grpc-google-iam-v1==0.12.6
grpcio==1.53.0
grpcio-status==1.48.2
humanize==4.6.0
ibis-framework==8.0.0
idna==3.4
importlib-metadata==6.1.0
importlib_resources==6.4.4
iniconfig==2.0.0
ipykernel==6.25.1
ipython==8.14.0
ipython-genutils==0.2.0
ipywidgets==7.7.1
isoduration==20.11.0
jedi==0.19.0
jellyfish==0.8.9
Jinja2==3.1.2
joblib==1.3.2
json5==0.9.14
jsonpointer==2.4
jsonschema==4.19.0
jsonschema-specifications==2023.7.1
jupyter-events==0.7.0
jupyter-lsp==2.2.0
jupyter_client==7.4.9
jupyter_core==5.3.1
jupyter_server==2.7.0
jupyter_server_terminals==0.4.4
jupyterlab==4.0.4
jupyterlab-pygments==0.2.2
jupyterlab-widgets==3.0.8
jupyterlab_server==2.24.0
kiwisolver==1.4.5
markdown-it-py==2.2.0
MarkupSafe==2.1.2
matplotlib==3.7.1
matplotlib-inline==0.1.6
mdurl==0.1.2
mistune==3.0.1
mock==5.0.1
multidict==6.0.4
multipledispatch==0.6.0
nbclassic==1.1.0
nbclient==0.8.0
nbconvert==7.7.3
nbformat==5.9.2
nest-asyncio==1.5.7
notebook==6.5.7
notebook_shim==0.2.3
numpy==1.24.2
oauthlib==3.2.2
openpyxl==3.1.2
overrides==7.4.0
packaging==23.0
pandas==1.5.0
pandas-gbq==0.19.0
pandocfilters==1.5.0
parso==0.8.3
parsy==2.1
pexpect==4.8.0
pickleshare==0.7.5
pillow==10.4.0
platformdirs==3.2.0
pluggy==1.0.0
pooch==1.7.0
prometheus-client==0.17.1
prompt-toolkit==3.0.39
proto-plus==1.24.0
protobuf==3.20.3
psutil==5.9.5
ptyprocess==0.7.0
pure-eval==0.2.2
pyarrow==8.0.0
pyarrow-hotfix==0.6
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.21
pydata-google-auth==1.8.2
Pygments==2.14.0
pyparsing==3.1.4
pyproj==3.6.0
pytest==7.2.2
pytest-cov==4.0.0
pytest-retry==1.1.0
pytest-timeout==2.1.0
pytest-xdist==3.2.1
python-dateutil==2.8.2
python-json-logger==2.0.7
pytz==2023.3
PyYAML==6.0
pyzmq==25.1.1
referencing==0.30.2
requests==2.27.1
requests-oauthlib==1.3.1
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rich==13.3.3
rpds-py==0.9.2
rsa==4.9
scikit-learn==1.2.2
scipy==1.11.1
Send2Trash==1.8.2
shapely==2.0.1
six==1.16.0
sniffio==1.3.0
soupsieve==2.4.1
SQLAlchemy==1.4.0
sqlglot==20.8.0
stack-data==0.6.2
tabulate==0.9.0
terminado==0.17.1
threadpoolctl==3.2.0
tinycss2==1.2.1
tomli==2.0.1
toolz==0.12.0
tornado==6.3.3
tqdm==4.65.0
traitlets==5.9.0
typing_extensions==4.5.0
uri-template==1.3.0
urllib3==1.26.15
wcwidth==0.2.6
webcolors==1.13
webencodings==0.5.1
websocket-client==1.6.1
widgetsnbextension==3.6.5
xarray==2023.7.0
xxhash==3.2.0
yarl==1.9.2
zipp==3.15.0

Failure:

$ py.test --quiet -n=20 --timeout=900 --durations=20 --junitxml=system_3.9_sponge_log.xml --cov=bigframes --cov=tests/system/small --cov-append --cov-config=.coveragerc --cov-report=term-missing --cov-fail-under=0 tests/system/small -x
bringing up nodes...
======================================================================= ERRORS ========================================================================
________________________________________________ ERROR collecting tests/system/small/test_dataframe.py ________________________________________________
tests/system/small/test_dataframe.py:2169: in <module>
    (bf_indexes.Index([1000, 2000, 3000])),
bigframes/core/indexes/base.py:89: in __new__
    block = df.DataFrame(pd_df, session=session)._block
bigframes/core/log_adapter.py:56: in wrapper
    return method(*args, **kwargs)
bigframes/dataframe.py:120: in __init__
    if dtype in {numpy.dtypes.ObjectDType, "object"}:
.nox/system-3-9/lib/python3.9/site-packages/numpy/__init__.py:320: in __getattr__
    raise AttributeError("module {!r} has no attribute "
E   AttributeError: module 'numpy' has no attribute 'dtypes'
...
=============================================================== short test summary info ===============================================================
ERROR tests/system/small/test_dataframe.py - AttributeError: module 'numpy' has no attribute 'dtypes'
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! xdist.dsession.Interrupted: stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
300 warnings, 1 error in 10.89s

tswast commented 1 week ago

Since we use numpy directly, we need to declare it as a dependency in setup.py (https://github.com/googleapis/python-bigquery-dataframes/blob/main/setup.py) and also make sure that our minimum acceptable version is pinned in the Python 3.9 constraints file (https://github.com/googleapis/python-bigquery-dataframes/blob/main/testing/constraints-3.9.txt).
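
To illustrate the constraints-file half of that suggestion, here is a rough sketch of checking whether a package is pinned in pip-style constraints text. `pinned_version` is a hypothetical helper, and the sample text is made up for illustration (its versions are copied from the pip freeze output above); it is not the contents of the repository's actual constraints file.

```python
from typing import Optional


def pinned_version(constraints_text: str, package: str) -> Optional[str]:
    """Return the version pinned with `==` for `package`, or None if it is not pinned."""
    wanted = package.lower().replace("_", "-")
    for raw_line in constraints_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        if name.strip().lower().replace("_", "-") == wanted:
            return version.strip()
    return None


# Hypothetical constraints text: numpy is used directly but has no pin of its own.
sample = """
# illustrative only
pandas==1.5.0
pyarrow==8.0.0
"""

print(pinned_version(sample, "pandas"))
print(pinned_version(sample, "numpy"))
```

When a directly imported package has no pin of its own, the resolver can pick whatever version the other pins allow, so the minimum-version test run may never exercise the oldest release the library claims to support.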

tswast commented 1 week ago

Per NEP 29 (https://numpy.org/neps/nep-0029-deprecation_policy.html), we should support numpy 1.24.x and newer.
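
For context, the pip freeze output above shows numpy==1.24.2, and the `numpy.dtypes` namespace only exists from NumPy 1.25 onward, which would explain why an older, updated-in-place environment fails while a fresh one does not. The sketch below compares release numbers using only the standard library; `release_tuple` and `meets_minimum` are hypothetical helpers, and real code would more likely rely on `packaging.version`, which also handles pre-release suffixes.

```python
import re


def release_tuple(version: str) -> tuple:
    """Extract the leading numeric release segment, e.g. '1.24.2' -> (1, 24, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric release segment in {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if `installed` is at least `minimum`, ignoring pre-release suffixes."""
    installed_parts = release_tuple(installed)
    minimum_parts = release_tuple(minimum)
    width = max(len(installed_parts), len(minimum_parts))

    def pad(parts: tuple) -> tuple:
        # Pad with zeros so that '1.24' compares equal to '1.24.0'.
        return parts + (0,) * (width - len(parts))

    return pad(installed_parts) >= pad(minimum_parts)


# numpy 1.24.2 satisfies a 1.24 floor but predates the release that added numpy.dtypes.
print(meets_minimum("1.24.2", "1.24"))
print(meets_minimum("1.24.2", "1.25"))
```

Under these assumptions, supporting 1.24.x means the library code cannot rely on `numpy.dtypes` being present.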