keras-team / keras-nlp

Modular Natural Language Processing workflows with Keras
Apache License 2.0
733 stars 216 forks

kerasNLP at Kaggle #726

Closed shivance closed 1 year ago

shivance commented 1 year ago

The KerasNLP library is very new and under heavy development. It differs from Hugging Face in its core philosophy, but I haven't found many people using it yet (maybe because it's still at 0.x.x?).

Describe the solution you'd like

Clearly, it needs more attention from the community. Better, more in-depth tutorials are the need of the hour. Apart from the keras-io tutorials, Kaggle could be a good place to start publicizing the library.

Is there a curated list of topics, or a wishlist, that you would like to see tutorials on?

I'm a Notebooks Master at Kaggle and would love to write kernels on KerasNLP.

shivance commented 1 year ago

cc: @mattdangerw @jbischof @chenmoneygithub @abheesht17

mattdangerw commented 1 year ago

Thank you! And yes! We would absolutely love to publicize more on Kaggle.

We can definitely help brainstorm topics of interest here, but it would help us to get your input on what would be a good fit for Kaggle (we are not experienced Kagglers).

Are there popular example NLP kernels you know of that we could look at? Are there certain topics or end-to-end tasks of particular interest to Kaggle users? Certain competitions we could target?

Any thoughts or pointers there are appreciated!

chenmoneygithub commented 1 year ago

@shivance Thanks for your willingness to help! Kaggle adoption will be extremely important to us.

I think we can kick off with a text classification competition, e.g., Natural Language Processing with Disaster Tweets.

shivance commented 1 year ago

I hope this helps! Kaggle sees huge traffic on notebooks, and most of those visitors are looking to learn about various types of applications.

@chenmoneygithub Kaggle is not just a competitive data science platform anymore; it has become a learning hub for everyone from beginners to advanced ML practitioners.

It would be best for KerasNLP to attract kernel writers first (optionally, we could do something similar to the W&B dev expert program, in which they sponsor Kaggle Masters and GMs to write kernels with wandb). Once we have a concise set of tutorials that helps any beginner understand how to use the library, KerasNLP can host its own NLP competition with an attractive prize pool, on the condition that models are built only with KerasNLP and not Hugging Face. (TensorFlow very often runs Kaggle competitions with the same rule.) This will attract more Kagglers to the library!

PS: This is the only notebook that appears when searching for kerasnlp on Kaggle; it was authored by @innat.

innat commented 1 year ago

[From a different viewpoint] Kaggle is indeed a good place, but IMO it has one big limitation: it doesn't provide the latest core packages. For example, see this ticket. So it might be difficult to run keras-cv/nlp in the Kaggle env at the moment. The Kaggle team is working on an upgrade (PR), but there are many complications.

Apart from that, if Kaggle could provide organization-level public profiles, I think it would be easy for keras-team to publish code examples from one. For example, Hugging Face provides organization-level profiles (google, microsoft, facebook) for publishing models, Spaces, etc.

shivance commented 1 year ago

Well, I tried installing keras-nlp in a Kaggle kernel and ran into the following warnings and package incompatibilities:

Click Collecting tensorflow Downloading tensorflow-2.11.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (588.3 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 588.3/588.3 MB 1.8 MB/s eta 0:00:0000:0100:01 Requirement already satisfied: tensorflow-hub>=0.8.0 in /opt/conda/lib/python3.7/site-packages (from tensorflow-text->keras-nlp) (0.12.0) INFO: pip is looking at multiple versions of tensorflow-text to determine which version is compatible with other requirements. This could take a while. Collecting tensorflow-text Downloading tensorflow_text-2.10.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.9 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.9/5.9 MB 29.8 MB/s eta 0:00:0000:0100:01m Collecting tensorflow Downloading tensorflow-2.10.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (578.1 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 578.1/578.1 MB 1.2 MB/s eta 0:00:0000:0100:01 Collecting tensorflow-io-gcs-filesystem>=0.23.1 Downloading tensorflow_io_gcs_filesystem-0.30.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (2.4 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.4/2.4 MB 62.8 MB/s eta 0:00:00:00:01 Collecting tensorflow Downloading tensorflow-2.10.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (578.0 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 578.0/578.0 MB 1.3 MB/s eta 0:00:0000:0100:01 Collecting tensorflow-text Downloading tensorflow_text-2.9.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.6 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.6/4.6 MB 1.9 MB/s eta 0:00:00:00:0100:01 Collecting tensorflow Downloading tensorflow-2.9.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (511.8 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 511.8/511.8 MB 853.0 kB/s eta 0:00:0000:0100:01 Downloading tensorflow-2.9.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (511.8 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 511.8/511.8 MB 1.6 MB/s eta 0:00:0000:0100:01 Downloading 
tensorflow-2.9.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (511.7 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 511.7/511.7 MB 1.5 MB/s eta 0:00:0000:0100:01 Downloading tensorflow-2.9.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (511.7 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 511.7/511.7 MB 1.5 MB/s eta 0:00:0000:0100:01 Collecting tensorflow-text Downloading tensorflow_text-2.8.2-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (4.9 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.9/4.9 MB 36.3 MB/s eta 0:00:0000:01:00:01 Requirement already satisfied: tensorboard-plugin-wit>=1.6.0 in /opt/conda/lib/python3.7/site-packages (from tensorboard<2.7,>=2.6.0->tensorflow->keras-nlp) (1.8.1) Requirement already satisfied: tensorboard-data-server<0.7.0,>=0.6.0 in /opt/conda/lib/python3.7/site-packages (from tensorboard<2.7,>=2.6.0->tensorflow->keras-nlp) (0.6.1) Requirement already satisfied: setuptools>=41.0.0 in /opt/conda/lib/python3.7/site-packages (from tensorboard<2.7,>=2.6.0->tensorflow->keras-nlp) (59.8.0) Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /opt/conda/lib/python3.7/site-packages (from tensorboard<2.7,>=2.6.0->tensorflow->keras-nlp) (0.4.6) Requirement already satisfied: markdown>=2.6.8 in /opt/conda/lib/python3.7/site-packages (from tensorboard<2.7,>=2.6.0->tensorflow->keras-nlp) (3.3.7) Requirement already satisfied: requests<3,>=2.21.0 in /opt/conda/lib/python3.7/site-packages (from tensorboard<2.7,>=2.6.0->tensorflow->keras-nlp) (2.28.1) Requirement already satisfied: werkzeug>=0.11.15 in /opt/conda/lib/python3.7/site-packages (from tensorboard<2.7,>=2.6.0->tensorflow->keras-nlp) (2.2.2) Requirement already satisfied: google-auth<2,>=1.6.3 in /opt/conda/lib/python3.7/site-packages (from tensorboard<2.7,>=2.6.0->tensorflow->keras-nlp) (1.35.0) Collecting tensorflow Downloading tensorflow-2.8.4-cp37-cp37m-manylinux2010_x86_64.whl (497.9 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 497.9/497.9 
MB 1.9 MB/s eta 0:00:0000:0100:01 Downloading tensorflow-2.8.3-cp37-cp37m-manylinux2010_x86_64.whl (497.9 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 497.9/497.9 MB 1.3 MB/s eta 0:00:0000:0100:01 Downloading tensorflow-2.8.2-cp37-cp37m-manylinux2010_x86_64.whl (497.9 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 497.9/497.9 MB 1.9 MB/s eta 0:00:0000:0100:01 Downloading tensorflow-2.8.1-cp37-cp37m-manylinux2010_x86_64.whl (497.9 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 497.9/497.9 MB 1.9 MB/s eta 0:00:0000:0100:01 Downloading tensorflow-2.8.0-cp37-cp37m-manylinux2010_x86_64.whl (497.5 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 497.5/497.5 MB 2.4 MB/s eta 0:00:0000:0100:01 Collecting tf-estimator-nightly==2.8.0.dev2021122109 Downloading tf_estimator_nightly-2.8.0.dev2021122109-py2.py3-none-any.whl (462 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 462.5/462.5 kB 29.2 MB/s eta 0:00:00 Collecting tensorflow-text Downloading tensorflow_text-2.8.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (4.9 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.9/4.9 MB 37.1 MB/s eta 0:00:0000:01:00:01 Downloading tensorflow_text-2.7.3-cp37-cp37m-manylinux2010_x86_64.whl (4.9 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.9/4.9 MB 38.4 MB/s eta 0:00:0000:01:00:01 Collecting tensorflow Downloading tensorflow-2.7.4-cp37-cp37m-manylinux2010_x86_64.whl (495.5 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 495.5/495.5 MB 1.3 MB/s eta 0:00:0000:0100:01 Downloading tensorflow-2.7.3-cp37-cp37m-manylinux2010_x86_64.whl (495.4 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 495.4/495.4 MB 1.9 MB/s eta 0:00:0000:0100:01 Downloading tensorflow-2.7.2-cp37-cp37m-manylinux2010_x86_64.whl (495.4 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 495.4/495.4 MB 1.9 MB/s eta 0:00:0000:0100:01 Downloading tensorflow-2.7.1-cp37-cp37m-manylinux2010_x86_64.whl (495.0 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 495.0/495.0 MB 1.6 MB/s eta 0:00:0000:0100:01 Downloading 
tensorflow-2.7.0-cp37-cp37m-manylinux2010_x86_64.whl (489.6 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 489.6/489.6 MB 2.1 MB/s eta 0:00:0000:0100:01 Collecting tensorflow-text Downloading tensorflow_text-2.7.0-cp37-cp37m-manylinux2010_x86_64.whl (4.9 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.9/4.9 MB 34.9 MB/s eta 0:00:0000:01:00:01 Downloading tensorflow_text-2.6.0-cp37-cp37m-manylinux1_x86_64.whl (4.4 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.4/4.4 MB 35.8 MB/s eta 0:00:0000:01:00:01 Requirement already satisfied: cached-property in /opt/conda/lib/python3.7/site-packages (from h5py~=3.1.0->tensorflow->keras-nlp) (1.5.2) Requirement already satisfied: pyasn1-modules>=0.2.1 in /opt/conda/lib/python3.7/site-packages (from google-auth<2,>=1.6.3->tensorboard<2.7,>=2.6.0->tensorflow->keras-nlp) (0.2.7) Requirement already satisfied: rsa<5,>=3.1.4 in /opt/conda/lib/python3.7/site-packages (from google-auth<2,>=1.6.3->tensorboard<2.7,>=2.6.0->tensorflow->keras-nlp) (4.8) Requirement already satisfied: cachetools<5.0,>=2.0.0 in /opt/conda/lib/python3.7/site-packages (from google-auth<2,>=1.6.3->tensorboard<2.7,>=2.6.0->tensorflow->keras-nlp) (4.2.4) Requirement already satisfied: requests-oauthlib>=0.7.0 in /opt/conda/lib/python3.7/site-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.7,>=2.6.0->tensorflow->keras-nlp) (1.3.1) Requirement already satisfied: importlib-metadata>=4.4 in /opt/conda/lib/python3.7/site-packages (from markdown>=2.6.8->tensorboard<2.7,>=2.6.0->tensorflow->keras-nlp) (6.0.0) Requirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.7/site-packages (from requests<3,>=2.21.0->tensorboard<2.7,>=2.6.0->tensorflow->keras-nlp) (1.26.14) Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.7/site-packages (from requests<3,>=2.21.0->tensorboard<2.7,>=2.6.0->tensorflow->keras-nlp) (2022.12.7) Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.7/site-packages (from 
requests<3,>=2.21.0->tensorboard<2.7,>=2.6.0->tensorflow->keras-nlp) (3.3) Requirement already satisfied: charset-normalizer<3,>=2 in /opt/conda/lib/python3.7/site-packages (from requests<3,>=2.21.0->tensorboard<2.7,>=2.6.0->tensorflow->keras-nlp) (2.1.0) Requirement already satisfied: MarkupSafe>=2.1.1 in /opt/conda/lib/python3.7/site-packages (from werkzeug>=0.11.15->tensorboard<2.7,>=2.6.0->tensorflow->keras-nlp) (2.1.2) Requirement already satisfied: zipp>=0.5 in /opt/conda/lib/python3.7/site-packages (from importlib-metadata>=4.4->markdown>=2.6.8->tensorboard<2.7,>=2.6.0->tensorflow->keras-nlp) (3.8.0) Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /opt/conda/lib/python3.7/site-packages (from pyasn1-modules>=0.2.1->google-auth<2,>=1.6.3->tensorboard<2.7,>=2.6.0->tensorflow->keras-nlp) (0.4.8) Requirement already satisfied: oauthlib>=3.0.0 in /opt/conda/lib/python3.7/site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.7,>=2.6.0->tensorflow->keras-nlp) (3.2.0) Installing collected packages: typing-extensions, numpy, h5py, tensorflow-text, keras-nlp Attempting uninstall: typing-extensions Found existing installation: typing_extensions 4.1.1 Uninstalling typing_extensions-4.1.1: Successfully uninstalled typing_extensions-4.1.1 Attempting uninstall: numpy Found existing installation: numpy 1.21.6 Uninstalling numpy-1.21.6: Successfully uninstalled numpy-1.21.6 Attempting uninstall: h5py Found existing installation: h5py 3.8.0 Uninstalling h5py-3.8.0: Successfully uninstalled h5py-3.8.0 ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. tensorflow-io 0.21.0 requires tensorflow-io-gcs-filesystem==0.21.0, which is not installed. beatrix-jupyterlab 3.1.7 requires google-cloud-bigquery-storage, which is not installed. 
xarray-einstats 0.2.2 requires numpy>=1.21, but you have numpy 1.19.5 which is incompatible. tfx-bsl 1.9.0 requires google-api-python-client<2,>=1.7.11, but you have google-api-python-client 2.52.0 which is incompatible. tfx-bsl 1.9.0 requires pyarrow<6,>=1, but you have pyarrow 8.0.0 which is incompatible. tfx-bsl 1.9.0 requires tensorflow!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,<3,>=1.15.5, but you have tensorflow 2.6.4 which is incompatible. tensorflow-transform 1.9.0 requires pyarrow<6,>=1, but you have pyarrow 8.0.0 which is incompatible. tensorflow-transform 1.9.0 requires tensorflow!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,<2.10,>=1.15.5, but you have tensorflow 2.6.4 which is incompatible. tensorflow-serving-api 2.9.0 requires tensorflow<3,>=2.9.0, but you have tensorflow 2.6.4 which is incompatible. rich 12.6.0 requires typing-extensions<5.0,>=4.0.0; python_version < "3.9", but you have typing-extensions 3.10.0.2 which is incompatible. pytorch-lightning 1.9.0 requires typing-extensions>=4.0.0, but you have typing-extensions 3.10.0.2 which is incompatible. pytoolconfig 1.2.5 requires typing-extensions>=4.4.0; python_version < "3.8", but you have typing-extensions 3.10.0.2 which is incompatible. pandas-profiling 3.1.0 requires markupsafe~=2.0.1, but you have markupsafe 2.1.2 which is incompatible. ortools 9.5.2237 requires protobuf>=4.21.5, but you have protobuf 3.19.4 which is incompatible. nnabla 1.33.0 requires numpy~=1.21.0, but you have numpy 1.19.5 which is incompatible. jaxlib 0.3.25 requires numpy>=1.20, but you have numpy 1.19.5 which is incompatible. jax 0.3.25 requires numpy>=1.20, but you have numpy 1.19.5 which is incompatible. imbalanced-learn 0.10.1 requires joblib>=1.1.1, but you have joblib 1.0.1 which is incompatible. ibis-framework 2.1.1 requires importlib-metadata<5,>=4; python_version < "3.8", but you have importlib-metadata 6.0.0 which is incompatible. 
flax 0.6.4 requires typing-extensions>=4.1.1, but you have typing-extensions 3.10.0.2 which is incompatible. flake8 5.0.4 requires importlib-metadata<4.3,>=1.1.0; python_version < "3.8", but you have importlib-metadata 6.0.0 which is incompatible. featuretools 1.11.1 requires numpy>=1.21.0, but you have numpy 1.19.5 which is incompatible. cmudict 1.0.13 requires importlib-metadata<6.0.0,>=5.1.0, but you have importlib-metadata 6.0.0 which is incompatible. cmdstanpy 1.1.0 requires numpy>=1.21, but you have numpy 1.19.5 which is incompatible. astroid 2.13.3 requires typing-extensions>=4.0.0; python_version < "3.11", but you have typing-extensions 3.10.0.2 which is incompatible. apache-beam 2.40.0 requires dill<0.3.2,>=0.3.1.1, but you have dill 0.3.6 which is incompatible. apache-beam 2.40.0 requires pyarrow<8.0.0,>=0.15.1, but you have pyarrow 8.0.0 which is incompatible. allennlp 2.10.1 requires h5py>=3.6.0, but you have h5py 3.1.0 which is incompatible. allennlp 2.10.1 requires numpy>=1.21.4, but you have numpy 1.19.5 which is incompatible. aioitertools 0.11.0 requires typing_extensions>=4.0; python_version < "3.10", but you have typing-extensions 3.10.0.2 which is incompatible. aiobotocore 2.4.2 requires botocore<1.27.60,>=1.27.59, but you have botocore 1.29.60 which is incompatible. Successfully installed h5py-3.1.0 keras-nlp-0.3.1 numpy-1.19.5 tensorflow-text-2.6.0 typing-extensions-3.10.0.2 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

In short, pip on Kaggle downloaded release after release of TensorFlow, from 2.11 down to 2.7, just to find a dependency match, before settling on the preinstalled 2.6.4. Even after keras-nlp installs successfully, pip still reports dependency errors in red.

abheesht17 commented 1 year ago

@shivance, try this:

!pip install git+https://github.com/keras-team/keras-nlp.git tensorflow tensorflow-text --upgrade --quiet

shivance commented 1 year ago

Installing from git raises a Python version error:

ERROR: Package 'keras-nlp' requires a different Python: 3.7.12 not in '>=3.8'

abheesht17 commented 1 year ago

@shivance, check this out: https://www.kaggle.com/code/penstrokes75/kerasnlp-installation/notebook. Upgrading Python solves the issue :)
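The error above comes from the package's `>=3.8` requirement. A minimal sketch of checking the running interpreter before attempting the install; the helper name `python_is_supported` is hypothetical, and the `(3, 8)` default is taken from the error message rather than from any KerasNLP API:

```python
import sys


def python_is_supported(minimum=(3, 8), version_info=None):
    """Return True if the interpreter version is at least `minimum`.

    `version_info` is injectable so the check can be exercised without
    switching interpreters; by default the running interpreter is used.
    """
    current = tuple((version_info or sys.version_info)[:2])
    return current >= tuple(minimum)


if not python_is_supported():
    print(
        f"Python {sys.version_info.major}.{sys.version_info.minor} is below 3.8; "
        "installing keras-nlp from git is expected to fail here."
    )
```

Running this in the first cell of a notebook makes the failure mode explicit before pip spends time resolving dependencies.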

shivance commented 1 year ago

Cool, thanks @abheesht17, I'll start working on a text classification example right away!

shivance commented 1 year ago

The ancient Python 3.7 is turning out to be a big issue. Installing in a conda environment gives keras-nlp 0.4, but importing it shows the version as 0.3.1. No luck installing with pip + git either...

The main issue is that keras_nlp has no attribute models when I import: from keras_nlp.models import BertBackbone
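When an install reports one version but the import shows another, it can help to see which interpreter is running the cell and which file a package would be loaded from. A minimal sketch using only the standard library; `describe_module` is a hypothetical helper name, and `json` is used as a stand-in because keras_nlp may not be installed where this runs:

```python
import importlib.util
import sys


def describe_module(name):
    """Report where a top-level module would be imported from, without importing it."""
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,  # the Python binary running this cell
        "found": spec is not None,
        "origin": spec.origin if spec else None,  # path of the file that would load
    }


# `json` is a stand-in; in the notebook you would pass "keras_nlp".
print(describe_module("json"))
```

If the reported origin sits under the Python 3.7 site-packages directory while the newer version was installed into a separate conda environment, the kernel is still picking up the old copy.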

Weirdest thing is: image

PS: This is with a conda env activated with Python 3.10.

PPS: Having looked at [this], I feel an official Python upgrade is the only solution.

abheesht17 commented 1 year ago

Lol, Kaggle is a real pain, eh?

Are you sure you are using the correct pip? Try passing the complete pip path, maybe?

!<pip-path> <libraries>

shivance commented 1 year ago

Through workarounds I'm able to install KerasNLP with the Python 3.8 pip, but the running kernel won't use Python 3.8 (lol); it still reports Python 3.7, which gives an import error.
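One common way to avoid a pip/interpreter mismatch is to invoke pip as a module of a specific interpreter instead of relying on whichever `pip` is first on PATH. A minimal sketch; `pip_install_command` is a hypothetical helper, and note that it only makes the target explicit, it does not get around the kernel's Python 3.7 limit described above:

```python
import sys


def pip_install_command(*packages, python=sys.executable):
    """Build a pip command bound to a specific interpreter."""
    # `<python> -m pip` installs into that interpreter's environment, so the
    # packages land where the same interpreter will look for them.
    return [python, "-m", "pip", "install", "--upgrade", *packages]


cmd = pip_install_command("keras-nlp")
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing a different `python` path (for example, the interpreter of a separately created environment) targets that environment instead of the kernel's.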

abheesht17 commented 1 year ago

Yeah. I tried a bunch of stuff - the best we can do is run scripts like this (after installing a newer version of Python):

python3.9 <script-name>

This works fine.
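The same idea can be driven from a notebook cell with `subprocess`. A minimal sketch, assuming the newer interpreter is already installed; `run_with_interpreter` is a hypothetical helper, and `sys.executable` stands in for the newer interpreter's path so the example runs anywhere:

```python
import os
import subprocess
import sys
import tempfile


def run_with_interpreter(interpreter, script_path):
    """Run a script under the given interpreter and return its stdout."""
    result = subprocess.run(
        [interpreter, script_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script fails
    )
    return result.stdout


# Write a tiny script that reports which Python version executed it.
with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "show_version.py")
    with open(script, "w") as f:
        f.write("import sys; print(sys.version_info[:2])\n")
    # Stand-in: replace sys.executable with the path to the newer Python.
    output = run_with_interpreter(sys.executable, script)

print(output.strip())
```

The script runs in a separate process, so nothing it imports becomes available to the notebook cells themselves.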

But the cells themselves end up using the original Python version (3.7.x). We could ask the Kaggle team to upgrade it (3.7.x is pretty old now), but I saw that many people have already asked them to do so, and they haven't done it yet. So, dunno about the ETA ;_;

shivance commented 1 year ago

Even if running a script from a Kaggle notebook did use Python 3.9, IMO that defeats the purpose of creating notebooks on Kaggle.

Beginners tend to gravitate towards Jupyter notebooks rather than scripts.

Given the ETA of the Python upgrade, as @innat said, it's difficult for us to target Kaggle now. How about contributing tutorials to keras-io, @mattdangerw?

abheesht17 commented 1 year ago

@shivance, this is an easy solution; we should have thought of it earlier, lol. Hacky, but it works: https://www.kaggle.com/code/penstrokes75/kerasnlp-hacky-installation/notebook

I was able to import KerasNLP, load BERT from a preset and load BertPreprocessor as well.

shivance commented 1 year ago

That's a clever one, @abheesht17. The weird thing is that I tried exporting the /opt/conda/kerasEnv/..keras-nlp package path to PATH as well, but it did not work for me.

abheesht17 commented 1 year ago

Yeah, exporting to path doesn't work, unfortunately. But sys.path.append does the job :D

shivance commented 1 year ago

[Additional info]: exporting the path with os.environ didn't work either.
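A likely explanation for the difference: PATH only controls where the shell looks for executables, and PYTHONPATH is read once when the interpreter starts, so changing either from inside a running kernel does not alter the import search path. `sys.path` is the live list that `import` consults. A minimal sketch with a temporary directory standing in for the other environment's site-packages; the directory and module name are hypothetical:

```python
import importlib
import os
import sys
import tempfile

# Stand-in for another environment's site-packages directory (hypothetical).
other_site_packages = tempfile.mkdtemp()
with open(os.path.join(other_site_packages, "demo_other_env_module.py"), "w") as f:
    f.write("VALUE = 42\n")

# Changing the environment of an already running process leaves sys.path alone.
os.environ["PYTHONPATH"] = other_site_packages
found_via_env = other_site_packages in sys.path

# Appending to sys.path takes effect for all subsequent imports.
sys.path.append(other_site_packages)
importlib.invalidate_caches()
module = importlib.import_module("demo_other_env_module")

print(found_via_env, module.VALUE)
```

One caveat: packages with compiled extensions built for a different Python version may still fail to import this way, so this is a stopgap rather than a replacement for a matching interpreter.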

abheesht17 commented 1 year ago

Just to check, ran this example on Kaggle: https://keras.io/examples/nlp/fnet_classification_with_keras_nlp/. It worked fine (accuracy, training time all look OK): https://www.kaggle.com/code/penstrokes75/text-classification-with-fnet/notebook

innat commented 1 year ago

@abheesht17, be aware that force-updating some packages (e.g., tf) in the Kaggle env behaves differently from the Colab env. Sometimes it may work and sometimes not (quote), which may lead to a bad UX.

Apart from that, IMO, to reach the Kaggle community, someone just needs to publish a high-scoring kernel for an ongoing competition (first condition). Second come the other factors (flexibility, reproducibility, etc.). The same goes for KerasCV.

jbischof commented 1 year ago

TF 2.11 doesn't even support 3.7. Ridiculous!

innat commented 1 year ago

What is the minimum TensorFlow version required for keras-nlp? Is it 2.11? If I remember correctly, for keras-cv it was 2.9. This PR is merged, so if keras-nlp/cv is ready, we can make a feature request to add them to the Kaggle env officially.

cc. @LukeWood @djherbis

mattdangerw commented 1 year ago

What is the minimum TensorFlow version required for keras-nlp?

We should work with 2.9 and later, quite possibly with earlier versions too, but I can't say we have tested with anything older than 2.9.
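A notebook can compare the installed TensorFlow version against such a minimum before importing the library. A minimal sketch in plain Python; `version_tuple` and `meets_minimum` are hypothetical helpers, and in a real notebook the first argument would come from `tf.__version__`. The values below are the 2.6.4 from the pip log earlier in this thread and the 2.9 mentioned above:

```python
def version_tuple(version):
    """Turn '2.11.0' into (2, 11, 0), ignoring any non-numeric suffix such as 'rc1'."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two dotted version strings, padding the shorter one with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


print(meets_minimum("2.6.4", "2.9"), meets_minimum("2.11.0", "2.9"))
```

This deliberately ignores pre-release ordering; a dedicated version-parsing library would be the more robust choice where one is available.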

innat commented 1 year ago

FYI, tf 2.11 is merged and will hopefully be available in the Kaggle env soon. https://github.com/Kaggle/docker-python/pull/1216

LukeWood commented 1 year ago

I believe we depend on 2.10, but I’m not 100% sure.

jbischof commented 1 year ago

@LukeWood the library throws without 2.11 (link).

shivance commented 1 year ago

Irrespective of which TensorFlow version we depend on, we still need python>=3.8 to run the library's latest version, 0.4, on the Kaggle image.

cc: @jbischof @LukeWood

LukeWood commented 1 year ago

@LukeWood the library throws without 2.11 (link).

Gah— that’s right. We bumped it to allow StableDiffusion to use the keras core group normalization

abheesht17 commented 1 year ago

@abheesht17, be aware that force-updating some packages (e.g., tf) in the Kaggle env behaves differently from the Colab env. Sometimes it may work and sometimes not (quote), which may lead to a bad UX.

Apart from that, IMO, to reach the Kaggle community, someone just needs to publish a high-scoring kernel for an ongoing competition (first condition). Second come the other factors (flexibility, reproducibility, etc.). The same goes for KerasCV.

Whoops, just saw the rest of this thread. Yep, I agree with you. The hacky solution was never meant to be a permanent/failsafe solution, just a quick solution so that we could start working on a few examples and publish them on Kaggle. "Text Classification with FNet" worked...and I think most of our initial examples will be sequence classification-related. We might run into issues with seq-to-seq tasks though.

P.S.: In the example, we compare FNet with a vanilla Transformer model. This implies that the MHA layer works fine with TF 2.<whatever-version-is-on-Kaggle-I-don't-remember-now>. Since MHA is the backbone on which almost all KerasNLP models rely, I am assuming they will work fine as well.

P.P.S: One important thing we might want to check is whether our preset models work fine.

P.P.P.S: Just saw your comment on TF 2.11 being merged. Woohoo :)

shivance commented 1 year ago

image :)

shivance commented 1 year ago

@mattdangerw @jbischof @abheesht17 @innat I'm working on Semantic Similarity with BERT, which is an official keras-io tutorial but built with Hugging Face.

I'm trying to reproduce it with KerasNLP. Here is the notebook. A couple of extra things I'm trying:

  1. Reduce the need for a custom dataset generator by replacing it with the TensorFlow data API
  2. Use KerasNLP's BertClassifier model directly, from a preset.

I hit this weird IndexError on calling model.fit(); could you please take a look?

image

PS: Detailed error

StagingError Traceback (most recent call last)
/tmp/ipykernel_21/4064585076.py in
----> 1 history = model.fit(train_data, validation_data=valid_data, epochs=2)

/kaggle/working/keras-nlp/keras_nlp/utils/pipeline_model.py in fit(self, x, y, batch_size, sample_weight, validation_data, validation_split, **kwargs)
    195 sample_weight=None,
    196 validation_data=validation_data,
--> 197 **kwargs,
    198 )
    199

/opt/conda/lib/python3.7/site-packages/keras/utils/traceback_utils.py in error_handler(*args, **kwargs)
     68 # To get the full stack trace, call:
     69 # `tf.debugging.disable_traceback_filtering()`
---> 70 raise e.with_traceback(filtered_tb) from None
     71 finally:
     72 del filtered_tb

/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py in autograph_handler(*args, **kwargs)
   1267 except Exception as e: # pylint:disable=broad-except
   1268 if hasattr(e, "ag_error_metadata"):
-> 1269 raise e.ag_error_metadata.to_exception(e)
   1270 else:
   1271 raise

StagingError: in user code:

    File "/opt/conda/lib/python3.7/site-packages/keras/engine/training.py", line 1249, in train_function *
        return step_function(self, iterator)
    File "/opt/conda/lib/python3.7/site-packages/keras/engine/training.py", line 1233, in step_function **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/opt/conda/lib/python3.7/site-packages/keras/engine/training.py", line 1222, in run_step **
        outputs = model.train_step(data)
    File "/opt/conda/lib/python3.7/site-packages/keras/engine/training.py", line 1023, in train_step
        y_pred = self(x, training=True)
    File "/opt/conda/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "/opt/conda/lib/python3.7/site-packages/keras/layers/normalization/layer_normalization.py", line 268, in call
        broadcast_shape[dim] = input_shape.dims[dim].value

    IndexError: Exception encountered when calling layer 'embeddings_layer_norm' (type LayerNormalization).

    list index out of range

    Call arguments received by layer 'embeddings_layer_norm' (type LayerNormalization):
      • inputs=tf.Tensor(shape=(512, 128), dtype=float32)

innat commented 1 year ago

Have you checked the versions of the relevant packages (tf, transformers, etc.)?

shivance commented 1 year ago

I'm not using transformers, and I don't think this is due to a version mismatch. Anyway, let me confirm by running it locally!

shivance commented 1 year ago

@innat I verified it in my dev environment; the same error is raised, so it's not a version-mismatch issue.

shivance commented 1 year ago

@mattdangerw can you help?

shivance commented 1 year ago

Hey, I fixed it! The notebook is ready; I'm thinking of opening a PR in keras-io.

https://www.kaggle.com/shivanshuman/semantic-similarity-with-bert

PS: Opened a PR on keras-io

innat commented 1 year ago

@shivance [FYI], tf 2.11 is now available in the Kaggle env. You may not need to install TensorFlow.

shivance commented 1 year ago

@shivance [FYI], tf 2.11 is now available in the Kaggle env. You may not need to install TensorFlow.

By default it showed tf 2.9 for me. As far as I remember from what we discussed, we are using tf>=2.9 for KerasNLP now.

innat commented 1 year ago

By default, for me, it showed tf 2.9

Make sure you set 'Always use latest environment'.

image

shivance commented 1 year ago

Thank you! And yes! We would absolutely love to publicize more on Kaggle.

image

@mattdangerw Looks like it has already started 😄

djherbis commented 1 year ago

We're working on an upgrade to py3.10: https://github.com/Kaggle/docker-python/pull/1231

May still take some time before this is publicly available on Kaggle but just a heads up.

shivance commented 1 year ago

Thanks @djherbis. The Python version has been bumped to 3.10 on Kaggle, with a public announcement on Discussions 7 hours ago. Now we can finally stop using the workaround for KerasNLP on Kaggle (thanks @abheesht17 for it while we needed keras-nlp to run on Kaggle with 3.7).

@mattdangerw do you think we should open a PR to Kaggle for pre-installed KerasNLP? (It would add a pip install keras-nlp to the Kaggle Docker image's requirements file.)

Cheers!