nlp-with-transformers / notebooks

Jupyter notebooks for the Natural Language Processing with Transformers book
https://transformersbook.com/
Apache License 2.0

PEGASUS model conflicts with protobuf #105

Closed JamesCHub closed 1 year ago

JamesCHub commented 1 year ago

Information

The problem arises in chapter:

Describe the bug

There is an incompatibility between the google/pegasus-cnn_dailymail model and protobuf. When you try to load the model, the following error is raised:

Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates

Attempts to downgrade the protobuf package (as suggested in 1. above) predictably broke everything.

However, the second suggestion did work. Add the following at the top of the notebook and the model loads without error:

import os
# Select the pure-Python protobuf implementation; set this before anything imports protobuf.
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"

(Contrary to the warning in the error message, I did not notice any significant slowdown.)
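
The workaround only takes effect if the variable is set before protobuf chooses its backend, so cell order matters. Below is a minimal sketch of a guard for that. It assumes protobuf picks its backend when `google.protobuf.internal.api_implementation` is first imported; the helper name `force_pure_python_protobuf` is hypothetical and not part of any library.

```python
import os
import sys

ENV_VAR = "PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"
# Assumption: this is the module that reads the variable at import time.
SELECTOR_MODULE = "google.protobuf.internal.api_implementation"


def force_pure_python_protobuf() -> bool:
    """Set the env var; return True if it was set before protobuf chose a backend."""
    already_selected = SELECTOR_MODULE in sys.modules
    os.environ[ENV_VAR] = "python"
    return not already_selected


in_time = force_pure_python_protobuf()
if not in_time:
    # Setting the variable now has no effect on the already-loaded backend.
    print("protobuf was imported earlier; restart the kernel and run this cell first")
```

Running this as the first cell, before `transformers` is imported, should be equivalent to the two-line fix above, with the added warning if the cell was run too late.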

This is an environment I created myself on a local system. Relevant versions: grpcio-status 1.54.2, protobuf 4.23.2.
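
To check whether another environment is likely to hit the same error, the installed protobuf version can be compared against the 3.20.x ceiling named in the error message. This is a rough sketch rather than a definitive test: the helper names are hypothetical, and the parsing assumes a version string that starts with numeric `major.minor` components.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def newer_than_3_20(version_string: str) -> bool:
    """True if the version is above the 3.20.x line mentioned in the error message."""
    major, minor = (int(part) for part in version_string.split(".")[:2])
    return (major, minor) > (3, 20)


protobuf_version = installed_version("protobuf")
if protobuf_version is None:
    print("protobuf is not installed")
else:
    print(f"protobuf {protobuf_version}, newer than 3.20.x: {newer_than_3_20(protobuf_version)}")
```

For the environment reported here, `newer_than_3_20("4.23.2")` returns `True`, which matches the failure described above.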

Full freeze output: absl-py==1.4.0 accelerate==0.5.1 aiohttp==3.8.4 aiosignal==1.3.1 alembic==1.11.1 appdirs==1.4.4 asttokens==2.2.1 astunparse==1.6.3 async-timeout==4.0.2 attrs==23.1.0 audioread==3.0.0 backcall==0.2.0 bertviz==1.2.0 boto3==1.26.136 botocore==1.29.136 Brotli==1.0.9 cachetools==5.3.0 certifi==2023.5.7 cffi==1.15.1 charset-normalizer==3.1.0 click==8.1.3 cmaes==0.9.1 cmake==3.26.3 coloredlogs==15.0.1 colorlog==6.7.0 comm==0.1.3 contourpy==1.0.7 cycler==0.11.0 datasets==1.16.1 debugpy==1.6.7 decorator==5.1.1 dill==0.3.6 executing==1.2.0 filelock==3.12.0 fire==0.5.0 flatbuffers==23.5.9 fonttools==4.39.4 frozenlist==1.3.3 fsspec==2023.5.0 gast==0.4.0 google-auth==2.18.1 google-auth-oauthlib==1.0.0 google-pasta==0.2.0 googleapis-common-protos==1.59.0 GPUtil==1.4.0 greenlet==2.0.2 grpcio==1.54.2 grpcio-status==1.54.2 h5py==3.8.0 huggingface-hub==0.14.1 humanfriendly==10.0 idna==3.4 importlib-metadata==6.6.0 importlib-resources==5.12.0 inflate64==0.3.1 ipykernel==6.23.1 ipython==8.13.2 ipywidgets==8.0.6 jax==0.4.10 jedi==0.18.2 Jinja2==3.1.2 jmespath==1.0.1 joblib==1.2.0 jupyter_client==8.2.0 jupyter_core==5.3.0 jupyterlab-widgets==3.0.7 keras==2.12.0 keras2onnx==1.7.0 kiwisolver==1.4.4 lazy_loader==0.2 libclang==16.0.0 librosa==0.10.0.post2 lit==16.0.5 llvmlite==0.40.0 Mako==1.2.4 Markdown==3.4.3 MarkupSafe==2.1.2 matplotlib==3.7.1 matplotlib-inline==0.1.6 ml-dtypes==0.1.0 mpmath==1.3.0 msgpack==1.0.5 multidict==6.0.4 multiprocess==0.70.14 multivolumefile==0.2.3 nest-asyncio==1.5.6 networkx==3.1 nlpaug==1.1.7 nltk==3.6.6 numba==0.57.0 numpy==1.23.5 nvidia-cublas-cu11==11.10.3.66 nvidia-cublas-cu12==12.1.3.1 nvidia-cuda-cupti-cu11==11.7.101 nvidia-cuda-nvrtc-cu11==11.7.99 nvidia-cuda-runtime-cu11==11.7.99 nvidia-cuda-runtime-cu12==12.1.105 nvidia-cudnn-cu11==8.5.0.96 nvidia-cudnn-cu12==8.9.1.23 nvidia-cufft-cu11==10.9.0.58 nvidia-curand-cu11==10.2.10.91 nvidia-cusolver-cu11==11.4.0.1 nvidia-cusparse-cu11==11.7.4.91 nvidia-nccl-cu11==2.14.3 
nvidia-nvtx-cu11==11.7.91 nvidia-tensorrt==99.0.0 oauthlib==3.2.2 onnx==1.14.0 onnxconverter-common==1.13.0 onnxruntime==1.14.1 onnxruntime-tools==1.7.0 opt-einsum==3.3.0 optuna==3.1.1 packaging==23.1 pandas==2.0.1 parso==0.8.3 pexpect==4.8.0 pickleshare==0.7.5 Pillow==9.5.0 platformdirs==3.5.1 pooch==1.6.0 portalocker==2.0.0 prompt-toolkit==3.0.38 protobuf==4.23.2 psutil==5.9.5 ptyprocess==0.7.0 pure-eval==0.2.2 py-cpuinfo==9.0.0 py3nvml==0.2.7 py7zr==0.20.5 pyarrow==12.0.0 pyasn1==0.5.0 pyasn1-modules==0.3.0 pybcj==1.0.1 pycparser==2.21 pycryptodomex==3.18.0 Pygments==2.15.1 pynndescent==0.5.10 pynvml==11.5.0 pyparsing==3.0.9 pyppmd==1.0.0 python-dateutil==2.8.2 pytz==2023.3 PyYAML==6.0 pyzmq==25.0.2 pyzstd==0.15.7 regex==2023.5.5 requests==2.30.0 requests-oauthlib==1.3.1 rouge-score==0.0.4 rsa==4.9 s3transfer==0.6.1 sacrebleu==1.5.1 sacremoses==0.0.53 scikit-learn==1.2.2 scikit-multilearn==0.2.0 scipy==1.10.1 sentencepiece==0.1.99 seqeval==1.2.2 six==1.16.0 soundfile==0.12.1 soxr==0.3.5 SQLAlchemy==2.0.14 stack-data==0.6.2 sympy==1.12 tensorboard==2.12.3 tensorboard-data-server==0.7.0 tensorflow==2.12.0 tensorflow-estimator==2.12.0 tensorflow-io-gcs-filesystem==0.32.0 tensorrt==8.6.1 tensorrt-bindings==8.6.1 tensorrt-libs==8.6.1 termcolor==2.3.0 texttable==1.6.7 threadpoolctl==3.1.0 tokenizers==0.10.3 torch==2.0.1 torch-scatter==2.1.1+pt20cu117 tornado==6.3.2 tqdm==4.65.0 traitlets==5.9.0 transformers==4.11.3 triton==2.0.0 typing_extensions==4.5.0 tzdata==2023.3 umap-learn==0.5.1 urllib3==1.26.15 wcwidth==0.2.6 Werkzeug==2.3.4 widgetsnbextension==4.0.7 wrapt==1.14.1 xmltodict==0.13.0 xxhash==3.2.0 yarl==1.9.2 zipp==3.15.0

GPU: RTX2060 Super
