tensorflow / io

Dataset, streaming, and file system extensions maintained by TensorFlow SIG-IO
Apache License 2.0
706 stars 288 forks source link

Bug: BigQueryClient.read_session inexplicably doesn't work in certain environments. #1407

Closed ivanmkc closed 3 years ago

ivanmkc commented 3 years ago

This code was taken from this notebook linked from https://www.tensorflow.org/io/tutorials/bigquery.

Running the below code on Colab works fine. Running the exact same code with a similar environment on my Macbook Pro, does not.

Using this code in AI Platform (Unified) for custom training also works fine.

Code to reproduce. Make sure you change PROJECT_ID

from google.cloud import bigquery
from tensorflow_io.bigquery import BigQueryClient
from tensorflow.python.framework import dtypes

PROJECT_ID = "python-docs-samples-tests"
DATASET_ID = 'census_dataset'
TRAINING_TABLE_ID = 'census_training_table'
EVAL_TABLE_ID = 'census_eval_table'

CSV_SCHEMA = [
      bigquery.SchemaField("age", "FLOAT64"),
      bigquery.SchemaField("workclass", "STRING"),
      bigquery.SchemaField("fnlwgt", "FLOAT64"),
      bigquery.SchemaField("education", "STRING"),
      bigquery.SchemaField("education_num", "FLOAT64"),
      bigquery.SchemaField("marital_status", "STRING"),
      bigquery.SchemaField("occupation", "STRING"),
      bigquery.SchemaField("relationship", "STRING"),
      bigquery.SchemaField("race", "STRING"),
      bigquery.SchemaField("gender", "STRING"),
      bigquery.SchemaField("capital_gain", "FLOAT64"),
      bigquery.SchemaField("capital_loss", "FLOAT64"),
      bigquery.SchemaField("hours_per_week", "FLOAT64"),
      bigquery.SchemaField("native_country", "STRING"),
      bigquery.SchemaField("income_bracket", "STRING"),
  ]

UNUSED_COLUMNS = ["fnlwgt", "education_num"]

tensorflow_io_bigquery_client = BigQueryClient()
read_session = tensorflow_io_bigquery_client.read_session(
      "projects/" + PROJECT_ID,
      PROJECT_ID, "census_training_table", DATASET_ID,
      list(field.name for field in CSV_SCHEMA 
           if not field.name in UNUSED_COLUMNS),
      list(dtypes.double if field.field_type == 'FLOAT64' 
           else dtypes.string for field in CSV_SCHEMA
           if not field.name in UNUSED_COLUMNS),
      requested_streams=2)

Error:

---------------------------------------------------------------------------
UnavailableError                          Traceback (most recent call last)
<ipython-input-13-060ae4e3f62f> in <module>
      1 BATCH_SIZE = 32
      2 
----> 3 training_ds = read_bigquery(TRAINING_TABLE_ID).shuffle(10000).batch(BATCH_SIZE)
      4 eval_ds = read_bigquery(EVAL_TABLE_ID).batch(BATCH_SIZE)

<ipython-input-12-87f7f9b3d704> in read_bigquery(table_name)
     20 def read_bigquery(table_name):
     21   tensorflow_io_bigquery_client = BigQueryClient()
---> 22   read_session = tensorflow_io_bigquery_client.read_session(
     23       "projects/" + PROJECT_ID,
     24       PROJECT_ID, table_name, DATASET_ID,

~/Documents/code/ai-platform-samples/env2/lib/python3.8/site-packages/tensorflow_io/bigquery/python/ops/bigquery_api.py in read_session(self, parent, project_id, table_id, dataset_id, selected_fields, output_types, row_restriction, requested_streams, data_format)
    169             raise ValueError("`selected_fields` must be a list or dict.")
    170 
--> 171         (streams, schema) = core_ops.io_big_query_read_session(
    172             client=self._client_resource,
    173             parent=parent,

<string> in io_big_query_read_session(client, parent, project_id, table_id, dataset_id, selected_fields, output_types, requested_streams, data_format, row_restriction, container, shared_name, name)

~/Documents/code/ai-platform-samples/env2/lib/python3.8/site-packages/tensorflow/python/framework/ops.py in raise_from_not_ok_status(e, name)
   6860   message = e.message + (" name: " + name if name is not None else "")
   6861   # pylint: disable=protected-access
-> 6862   six.raise_from(core._status_to_exception(e.code, message), None)
   6863   # pylint: enable=protected-access
   6864 

~/Documents/code/ai-platform-samples/env2/lib/python3.8/site-packages/six.py in raise_from(value, from_value)

UnavailableError: Error reading from Cloud BigQuery: Empty update [Op:IO>BigQueryReadSession]

Colab pip freeze

absl-py==0.12.0
alabaster==0.7.12
albumentations==0.1.12
altair==4.1.0
appdirs==1.4.4
argon2-cffi==20.1.0
astor==0.8.1
astropy==4.2.1
astunparse==1.6.3
async-generator==1.10
atari-py==0.2.6
atomicwrites==1.4.0
attrs==20.3.0
audioread==2.1.9
autograd==1.3
Babel==2.9.0
backcall==0.2.0
beautifulsoup4==4.6.3
bleach==3.3.0
blis==0.4.1
bokeh==2.3.1
Bottleneck==1.3.2
branca==0.4.2
bs4==0.0.1
CacheControl==0.12.6
cachetools==4.2.1
catalogue==1.0.0
certifi==2020.12.5
cffi==1.14.5
chainer==7.4.0
chardet==3.0.4
click==7.1.2
cloudpickle==1.3.0
cmake==3.12.0
cmdstanpy==0.9.5
colorcet==2.0.6
colorlover==0.3.0
community==1.0.0b1
contextlib2==0.5.5
convertdate==2.3.2
coverage==3.7.1
coveralls==0.5
crcmod==1.7
cufflinks==0.17.3
cvxopt==1.2.6
cvxpy==1.0.31
cycler==0.10.0
cymem==2.0.5
Cython==0.29.22
daft==0.0.4
dask==2.12.0
datascience==0.10.6
debugpy==1.0.0
decorator==4.4.2
defusedxml==0.7.1
descartes==1.1.0
dill==0.3.3
distributed==1.25.3
dlib==19.18.0
dm-tree==0.1.6
docopt==0.6.2
docutils==0.17
dopamine-rl==1.0.5
earthengine-api==0.1.260
easydict==1.9
ecos==2.0.7.post1
editdistance==0.5.3
en-core-web-sm==2.2.5
entrypoints==0.3
ephem==3.7.7.1
et-xmlfile==1.0.1
fa2==0.3.5
fancyimpute==0.4.3
fastai==1.0.61
fastavro==1.4.0
fastdtw==0.3.4
fastprogress==1.0.0
fastrlock==0.6
fbprophet==0.7.1
feather-format==0.4.1
filelock==3.0.12
firebase-admin==4.4.0
fix-yahoo-finance==0.0.22
Flask==1.1.2
flatbuffers==1.12
folium==0.8.3
future==0.16.0
gast==0.3.3
GDAL==2.2.2
gdown==3.6.4
gensim==3.6.0
geographiclib==1.50
geopy==1.17.0
gin-config==0.4.0
glob2==0.7
google==2.0.3
google-api-core==1.26.3
google-api-python-client==1.12.8
google-auth==1.28.1
google-auth-httplib2==0.0.4
google-auth-oauthlib==0.4.4
google-cloud-bigquery==1.21.0
google-cloud-bigquery-storage==1.1.0
google-cloud-core==1.0.3
google-cloud-datastore==1.8.0
google-cloud-firestore==1.7.0
google-cloud-language==1.2.0
google-cloud-storage==1.18.1
google-cloud-translate==1.5.0
google-colab==1.0.0
google-pasta==0.2.0
google-resumable-media==0.4.1
googleapis-common-protos==1.53.0
googledrivedownloader==0.4
graphviz==0.10.1
greenlet==1.0.0
grpcio==1.32.0
gspread==3.0.1
gspread-dataframe==3.0.8
gym==0.17.3
h5py==2.10.0
HeapDict==1.0.1
hijri-converter==2.1.1
holidays==0.10.5.2
holoviews==1.14.3
html5lib==1.0.1
httpimport==0.5.18
httplib2==0.17.4
httplib2shim==0.0.3
humanize==0.5.1
hyperopt==0.1.2
ideep4py==2.0.0.post3
idna==2.10
imageio==2.4.1
imagesize==1.2.0
imbalanced-learn==0.4.3
imblearn==0.0
imgaug==0.2.9
importlib-metadata==3.10.1
importlib-resources==5.1.2
imutils==0.5.4
inflect==2.1.0
iniconfig==1.1.1
intel-openmp==2021.2.0
intervaltree==2.1.0
ipykernel==4.10.1
ipython==5.5.0
ipython-genutils==0.2.0
ipython-sql==0.3.9
ipywidgets==7.6.3
itsdangerous==1.1.0
jax==0.2.12
jaxlib==0.1.65+cuda110
jdcal==1.4.1
jedi==0.18.0
jieba==0.42.1
Jinja2==2.11.3
joblib==1.0.1
jpeg4py==0.1.4
jsonschema==2.6.0
jupyter==1.0.0
jupyter-client==5.3.5
jupyter-console==5.2.0
jupyter-core==4.7.1
jupyterlab-pygments==0.1.2
jupyterlab-widgets==1.0.0
kaggle==1.5.12
kapre==0.1.3.1
Keras==2.4.3
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.2
keras-vis==0.4.1
kiwisolver==1.3.1
knnimpute==0.1.0
korean-lunar-calendar==0.2.1
librosa==0.8.0
lightgbm==2.2.3
llvmlite==0.34.0
lmdb==0.99
LunarCalendar==0.0.9
lxml==4.2.6
Markdown==3.3.4
MarkupSafe==1.1.1
matplotlib==3.2.2
matplotlib-inline==0.1.2
matplotlib-venn==0.11.6
missingno==0.4.2
mistune==0.8.4
mizani==0.6.0
mkl==2019.0
mlxtend==0.14.0
more-itertools==8.7.0
moviepy==0.2.3.5
mpmath==1.2.1
msgpack==1.0.2
multiprocess==0.70.11.1
multitasking==0.0.9
murmurhash==1.0.5
music21==5.5.0
natsort==5.5.0
nbclient==0.5.3
nbconvert==5.6.1
nbformat==5.1.3
nest-asyncio==1.5.1
networkx==2.5.1
nibabel==3.0.2
nltk==3.2.5
notebook==5.3.1
np-utils==0.5.12.1
numba==0.51.2
numexpr==2.7.3
numpy==1.19.5
nvidia-ml-py3==7.352.0
oauth2client==4.1.3
oauthlib==3.1.0
okgrade==0.4.3
opencv-contrib-python==4.1.2.30
opencv-python==4.1.2.30
openpyxl==2.5.9
opt-einsum==3.3.0
osqp==0.6.2.post0
packaging==20.9
palettable==3.3.0
pandas==1.1.5
pandas-datareader==0.9.0
pandas-gbq==0.13.3
pandas-profiling==1.4.1
pandocfilters==1.4.3
panel==0.11.2
param==1.10.1
parso==0.8.2
pathlib==1.0.1
patsy==0.5.1
pexpect==4.8.0
pickleshare==0.7.5
Pillow==7.1.2
pip-tools==4.5.1
plac==1.1.3
plotly==4.4.1
plotnine==0.6.0
pluggy==0.7.1
pooch==1.3.0
portpicker==1.3.1
prefetch-generator==1.0.1
preshed==3.0.5
prettytable==2.1.0
progressbar2==3.38.0
prometheus-client==0.10.1
promise==2.3
prompt-toolkit==1.0.18
protobuf==3.12.4
psutil==5.4.8
psycopg2==2.7.6.1
ptyprocess==0.7.0
py==1.10.0
pyarrow==3.0.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycocotools==2.0.2
pycparser==2.20
pyct==0.4.8
pydata-google-auth==1.1.0
pydot==1.3.0
pydot-ng==2.0.0
pydotplus==2.0.2
PyDrive==1.3.1
pyemd==0.5.1
pyerfa==1.7.2
pyglet==1.5.0
Pygments==2.6.1
pygobject==3.26.1
pymc3==3.7
PyMeeus==0.5.11
pymongo==3.11.3
pymystem3==0.2.0
PyOpenGL==3.1.5
pyparsing==2.4.7
pyrsistent==0.17.3
pysndfile==1.3.8
PySocks==1.7.1
pystan==2.19.1.1
pytest==3.6.4
python-apt==0.0.0
python-chess==0.23.11
python-dateutil==2.8.1
python-louvain==0.15
python-slugify==4.0.1
python-utils==2.5.6
pytz==2018.9
pyviz-comms==2.0.1
PyWavelets==1.1.1
PyYAML==3.13
pyzmq==22.0.3
qdldl==0.1.5.post0
qtconsole==5.1.0
QtPy==1.9.0
regex==2019.12.20
requests==2.23.0
requests-oauthlib==1.3.0
resampy==0.2.2
retrying==1.3.3
rpy2==3.4.3
rsa==4.7.2
scikit-image==0.16.2
scikit-learn==0.22.2.post1
scipy==1.4.1
screen-resolution-extra==0.0.0
scs==2.1.3
seaborn==0.11.1
Send2Trash==1.5.0
setuptools-git==1.2
Shapely==1.7.1
simplegeneric==0.8.1
six==1.15.0
sklearn==0.0
sklearn-pandas==1.8.0
smart-open==5.0.0
snowballstemmer==2.1.0
sortedcontainers==2.3.0
SoundFile==0.10.3.post1
spacy==2.2.4
Sphinx==1.8.5
sphinxcontrib-serializinghtml==1.1.4
sphinxcontrib-websupport==1.2.4
SQLAlchemy==1.4.7
sqlparse==0.4.1
srsly==1.0.5
statsmodels==0.10.2
sympy==1.7.1
tables==3.4.4
tabulate==0.8.9
tblib==1.7.0
tensorboard==2.5.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.0
tensorflow==2.4.1
tensorflow-datasets==4.0.1
tensorflow-estimator==2.4.0
tensorflow-gcs-config==2.4.0
tensorflow-hub==0.12.0
tensorflow-io==0.17.1
tensorflow-metadata==0.29.0
tensorflow-probability==0.12.1
termcolor==1.1.0
terminado==0.9.4
testpath==0.4.4
text-unidecode==1.3
textblob==0.15.3
textgenrnn==1.4.1
Theano==1.0.5
thinc==7.4.0
tifffile==2021.4.8
toml==0.10.2
toolz==0.11.1
torch==1.8.1+cu101
torchsummary==1.5.1
torchtext==0.9.1
torchvision==0.9.1+cu101
tornado==5.1.1
tqdm==4.41.1
traitlets==5.0.5
tweepy==3.10.0
typeguard==2.7.1
typing-extensions==3.7.4.3
tzlocal==1.5.1
uritemplate==3.0.1
urllib3==1.24.3
vega-datasets==0.9.0
wasabi==0.8.2
wcwidth==0.2.5
webencodings==0.5.1
Werkzeug==1.0.1
widgetsnbextension==3.5.1
wordcloud==1.5.0
wrapt==1.12.1
xarray==0.15.1
xgboost==0.90
xkit==0.0.0
xlrd==1.1.0
xlwt==1.3.0
yellowbrick==0.9.1
zict==2.0.0
zipp==3.4.1

My pip freeze

My Python version: Python 3.7.9

anyio==2.2.0
appnope==0.1.2
argon2-cffi==20.1.0
async-generator==1.10
attrs==21.2.0
Babel==2.9.1
backcall==0.2.0
bleach==3.3.0
cachetools==4.2.2
certifi==2020.12.5
cffi==1.14.5
chardet==4.0.0
decorator==5.0.7
defusedxml==0.7.1
deprecation==2.1.0
entrypoints==0.3
fastavro==1.4.0
google-api-core==1.26.3
google-auth==1.30.0
google-cloud-bigquery-storage==2.4.0
googleapis-common-protos==1.53.0
grpcio==1.37.1
idna==2.10
importlib-metadata==4.0.1
ipykernel==5.5.4
ipython==7.23.1
ipython-genutils==0.2.0
jedi==0.18.0
Jinja2==2.11.3
json5==0.9.5
jsonschema==3.2.0
jupyter-client==6.1.12
jupyter-core==4.7.1
jupyter-packaging==0.10.1
jupyter-server==1.6.4
jupyterlab==3.0.14
jupyterlab-pygments==0.1.2
jupyterlab-server==2.5.0
libcst==0.3.18
MarkupSafe==1.1.1
matplotlib-inline==0.1.2
mistune==0.8.4
mypy-extensions==0.4.3
nbclassic==0.2.7
nbclient==0.5.3
nbconvert==6.0.7
nbformat==5.1.3
nest-asyncio==1.5.1
notebook==6.3.0
packaging==20.9
pandocfilters==1.4.3
parso==0.8.2
pexpect==4.8.0
pickleshare==0.7.5
prometheus-client==0.10.1
prompt-toolkit==3.0.18
proto-plus==1.18.1
protobuf==3.16.0
ptyprocess==0.7.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
Pygments==2.9.0
pyparsing==2.4.7
pyrsistent==0.17.3
python-dateutil==2.8.1
pytz==2021.1
PyYAML==5.4.1
pyzmq==22.0.3
requests==2.25.1
rsa==4.7.2
Send2Trash==1.5.0
six==1.16.0
sniffio==1.2.0
terminado==0.9.4
testpath==0.4.4
tomlkit==0.7.0
tornado==6.1
traitlets==5.0.5
typing-extensions==3.10.0.0
typing-inspect==0.6.0
urllib3==1.26.4
wcwidth==0.2.5
webencodings==0.5.1
zipp==3.4.1
ivanmkc commented 3 years ago

Fixed by setting GRPC_DEFAULT_SSL_ROOTS_FILE_PATH envvar.

Reference: https://github.com/tensorflow/io/tree/master/tensorflow_io/bigquery

stfines-clgx commented 3 years ago

Provided link is 404, and I am running into this error

kvignesh1420 commented 3 years ago

@sfines-clgx try this link: https://github.com/tensorflow/io/blob/master/tensorflow_io/bigquery.md