shivance closed this issue 1 year ago.
cc: @mattdangerw @jbischof @chenmoneygithub @abheesht17
Thank you! And yes! We would absolutely love to publicize more on Kaggle.
We can definitely help brainstorm topics of interest, but it would help us to get your input on what would be a good fit for Kaggle (we are not experienced Kagglers).
Are there popular NLP kernels you know of that we could look at? Are there certain topics or end-to-end tasks of particular interest to Kaggle users? Certain competitions we could target?
Any thoughts or pointers there appreciated!
@shivance Thanks for your willingness to help! Kaggle adoption will be extremely important to us.
I think we can start with a text classification competition, e.g., Natural Language Processing with Disaster Tweets, to kick off.
I hope this helps! Kaggle sees huge traffic on notebooks, and most visitors are looking to learn about various types of applications.
@chenmoneygithub Kaggle is not just a competitive data science platform anymore; it has become a learning hub for everyone from beginners to advanced ML practitioners.
It would be best for KerasNLP to attract kernel writers first (optionally, we could do something similar to the W&B dev expert program, where they sponsor Kaggle Masters and GMs to write kernels with wandb). Once we have a concise set of tutorials that helps any beginner understand how to use the library, KerasNLP can host its own NLP competition with an attractive prize pool, on the condition that models are built only with KerasNLP and not Hugging Face. (TensorFlow often runs Kaggle competitions with the same rule.) This will attract more Kagglers to the library!
PS: This is the only notebook that appears when searching "kerasnlp" on Kaggle, authored by @innat.
[From a different viewpoint] Kaggle is indeed a good place, but IMO it has one big limitation: it doesn't provide the latest core packages. For example, see this ticket. So it might be difficult to run keras-cv/nlp in the Kaggle env at the moment. The Kaggle team is working on an upgrade (PR), but there are many complications.
Apart from that, if Kaggle provided organization-level public profiles, I think it would be easy for keras-team to publish code examples from one. For example, Hugging Face provides organization-level profiles (Google, Microsoft, Facebook) for publishing models, Spaces, etc.
Well, I tried installing keras-nlp in a Kaggle kernel and ran into the following warnings and package incompatibilities:
In short, pip on Kaggle downloaded every TensorFlow version from 2.10 down to 2.6 just to resolve dependencies. Even after keras-nlp installed successfully, it still printed errors in red.
@shivance, try this:
!pip install git+https://github.com/keras-team/keras-nlp.git tensorflow tensorflow-text --upgrade --quiet
Installing from git raises a Python version error.
@shivance, check this out: https://www.kaggle.com/code/penstrokes75/kerasnlp-installation/notebook. Upgrading Python solves the issue :)
Cool, thanks @abheesht17! I'll start working on a text classification example right away!
Ancient Python 3.7 is turning out to be a big issue. Using the conda environment installs keras-nlp 0.4, but when we import it, the version shows as 0.3.1. No luck with installation from pip+git either...
The main issue is that keras_nlp has no attribute models when I import:
from keras_nlp.models import BertBackbone
Weirdest thing is:
PS: This is with the conda env activated with Python 3.10.
PPS: Having looked at [this], I feel an official Python upgrade is the only solution.
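One way to investigate a mismatch like this (conda reports keras-nlp 0.4, the import reports 0.3.1) is to ask the running kernel which interpreter it is and which file it would actually import. A minimal sketch using only the standard library; `describe_import` is a hypothetical helper, demonstrated with `json` so it runs anywhere. In the notebook you would pass `"keras_nlp"` instead.

```python
import importlib.util
import sys


def describe_import(module_name):
    """Report the running interpreter and where `module_name` would be loaded from.

    Hypothetical diagnostic helper: if `origin` points into a different
    site-packages than the one pip/conda installed into, the kernel is
    picking up another copy of the package.
    """
    spec = importlib.util.find_spec(module_name)  # None if the module is not importable
    return {
        "python": sys.version.split()[0],   # version of the interpreter running this code
        "executable": sys.executable,       # path of that interpreter
        "origin": spec.origin if spec is not None else None,  # file that would be imported
    }


info = describe_import("json")
print(info)
```

If `executable` is the old 3.7 interpreter while the package was installed for 3.10, the import will keep resolving to whatever copy the 3.7 site-packages holds.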
Lol, Kaggle is a real pain, eh?
Are you sure you are using the correct pip? Try passing the complete pip path, maybe?
!<pip-path> <libraries>
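A common way to make sure pip installs into the same interpreter the kernel runs on is to invoke pip as a module of `sys.executable` instead of whichever `pip` comes first on PATH; in a notebook cell this is usually written `!{sys.executable} -m pip install ...`. The sketch below only builds that command; `pip_install_command` is a hypothetical helper, and passing its result to `subprocess.run` would perform a real install.

```python
import sys


def pip_install_command(*packages, upgrade=False):
    """Build a pip command bound to the interpreter running this kernel.

    Hypothetical helper: `python -m pip` installs into the site-packages of
    that exact interpreter, avoiding the "installed by one pip, imported by
    another Python" mismatch discussed in this thread.
    """
    cmd = [sys.executable, "-m", "pip", "install", "--quiet"]
    if upgrade:
        cmd.append("--upgrade")
    cmd.extend(packages)
    return cmd


cmd = pip_install_command("keras-nlp", upgrade=True)
print(" ".join(cmd))
# To actually install: subprocess.run(cmd, check=True)
```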
Through workarounds, I'm able to install KerasNLP with the Python 3.8 pip, but the running kernel just won't use Python 3.8 (lol); it still reports Python 3.7. This gives an import error.
Yeah. I tried a bunch of stuff; the best we can do is run scripts like this (after installing a newer version of Python):
python3.9 <script-name>
This works fine.
But the cells themselves end up using the original Python version (3.7.x). We can possibly ask the Kaggle team to upgrade it (3.7.x is pretty old now), but I saw that many people have already asked them to, and they haven't done it yet. So, dunno about the ETA ;_;
Even if running a script through a Kaggle notebook uses Python 3.9, IMO that defeats the purpose of creating notebooks on Kaggle.
Beginners tend to gravitate toward Jupyter notebooks rather than scripts.
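From inside a notebook, the script workaround amounts to launching a separate interpreter process. A minimal sketch with `subprocess`, assuming the newer interpreter is reachable under a name such as `python3.9` (hypothetical here; the demo uses `sys.executable` so it runs anywhere). Only the child process sees the newer Python; the notebook's own cells keep running on the kernel's interpreter, which is the limitation noted above.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_script(interpreter, script_path):
    """Run `script_path` with `interpreter` in a child process and return its stdout."""
    result = subprocess.run(
        [interpreter, str(script_path)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script exits non-zero
    )
    return result.stdout


with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "report_version.py"
    script.write_text("import sys\nprint(sys.version_info[:2])\n")
    # On Kaggle this would be something like run_script("python3.9", script).
    output = run_script(sys.executable, script)
    print(output.strip())
```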
Given the uncertain ETA of the Python upgrade, as @innat said, it's difficult for us to target Kaggle now. How about contributing tutorials to keras-io, @mattdangerw?
@shivance, this is an easy solution; we should have thought of it earlier, lol. Hacky, but it works: https://www.kaggle.com/code/penstrokes75/kerasnlp-hacky-installation/notebook
I was able to import KerasNLP, load BERT from a preset, and load BertPreprocessor as well.
That's a clever one, @abheesht17. The weird thing is that I tried exporting the /opt/conda/kerasEnv/..keras-nlp package path to the path as well, but it did not work for me.
Yeah, exporting to the path doesn't work, unfortunately. But sys.path.append does the job :D
[Additional info]: exporting to the path with os.environ didn't work either.
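The reason exporting fails while `sys.path.append` works: `PYTHONPATH` is read only once, when the interpreter starts, so changing `os.environ` inside an already-running kernel leaves the current `sys.path` untouched, whereas appending to `sys.path` changes the import search path of the running process directly. The sketch below demonstrates this with a throwaway directory; the module name `hacky_pkg_demo` is hypothetical and stands in for the keras-nlp package.

```python
import importlib
import os
import sys
import tempfile
from pathlib import Path

previous_pythonpath = os.environ.get("PYTHONPATH")

with tempfile.TemporaryDirectory() as site_dir:
    # A throwaway directory standing in for the extra site-packages path.
    Path(site_dir, "hacky_pkg_demo.py").write_text("VALUE = 42\n")

    # 1) Exporting via os.environ: PYTHONPATH is read only at interpreter startup,
    #    so the running kernel's sys.path does not change.
    os.environ["PYTHONPATH"] = site_dir
    env_export_worked = site_dir in sys.path

    # 2) sys.path.append: the running process searches the new directory right away.
    sys.path.append(site_dir)
    importlib.invalidate_caches()
    module = importlib.import_module("hacky_pkg_demo")
    print(env_export_worked, module.VALUE)  # → False 42

    # Undo the demo's changes to the interpreter state.
    sys.path.remove(site_dir)
    del sys.modules["hacky_pkg_demo"]

if previous_pythonpath is None:
    os.environ.pop("PYTHONPATH", None)
else:
    os.environ["PYTHONPATH"] = previous_pythonpath
```

On Kaggle the appended directory would be the conda env's site-packages; the exact path is elided in this thread, so it is not filled in here.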
Just to check, ran this example on Kaggle: https://keras.io/examples/nlp/fnet_classification_with_keras_nlp/. It worked fine (accuracy, training time all look OK): https://www.kaggle.com/code/penstrokes75/text-classification-with-fnet/notebook
@abheesht17, be aware of force-updating some packages (e.g., tf) in the Kaggle env, unlike in the Colab env: sometimes it may work and sometimes not (quote), which may lead to bad UX.
Apart from that, IMO, to reach the Kaggle community, someone just needs to publish a high-scoring kernel for an ongoing competition (1st condition). 2nd, everything else (flexibility, reproducibility, etc.). The same goes for KerasCV.
TF 2.11 doesn't even support 3.7. Ridiculous!
What is the minimum TensorFlow version required for keras-nlp? Is it 2.11? If I remember correctly, for keras-cv it was 2.9. This PR is merged, so if keras-nlp/cv are ready, we can make a feature request to add them to the Kaggle env officially.
cc. @LukeWood @djherbis
We should work with 2.9 and later. Quite possibly with earlier versions too, but I can't say we have tested with TF older than 2.9.
FYI, tf 2.11 is merged and will hopefully be available in the Kaggle env soon. https://github.com/Kaggle/docker-python/pull/1216
I believe we depend on 2.10, but I’m not 100% sure.
Irrespective of which TensorFlow version we depend on, we still need python>=3.8 to run the library's latest version (0.4) on Kaggle.
cc: @jbischof @LukeWood
@LukeWood the library throws without 2.11 (link).
Gah, that's right. We bumped it to allow StableDiffusion to use the Keras core group normalization.
Whoops, just saw the rest of this thread. Yep, I agree with you. The hacky solution was never meant to be a permanent/failsafe solution, just a quick solution so that we could start working on a few examples and publish them on Kaggle. "Text Classification with FNet" worked...and I think most of our initial examples will be sequence classification-related. We might run into issues with seq-to-seq tasks though.
P.S.: In the example, we compare FNet with a vanilla Transformer model. This implies that the MHA layer works fine with TF 2.<whatever-version-is-on-Kaggle-I-don't-remember-now>. Since MHA is the backbone on which almost all KerasNLP models rely, I am assuming they will work fine as well.
P.P.S: One important thing we might want to check is whether our preset models work fine.
P.P.P.S: Just saw your comment on TF 2.11 being merged. Woohoo :)
:)
@mattdangerw @jbischof @abheesht17 @innat I'm working on Semantic Similarity with BERT, an official keras-io tutorial that is built with Hugging Face.
I'm trying to reproduce it with KerasNLP. Here is the notebook. A couple of extra things I am trying:
I ran into this weird IndexError when calling model.fit(). Could you please take a look?
Have you checked the versions of the relevant packages (tf, transformers, etc.)?
I'm not using transformers. And I don't think this is due to a version mismatch, but let me confirm by running it locally!
@innat I verified it in my dev environment and the same error is raised, so it's not a version mismatch issue.
@mattdangerw can you help ?
Hey, I fixed it! The notebook is ready; I'm thinking of opening a PR in keras-io.
https://www.kaggle.com/shivanshuman/semantic-similarity-with-bert
PS: Opened a PR on keras-io
@shivance [to inform], TF 2.11 is now available in the Kaggle env. You may not need to install TensorFlow.
By default, it showed TF 2.9 for me. As far as I remember from what we discussed, we now use tf>=2.9 for keras-nlp.
Make sure you set 'Always use latest environment'.
@mattdangerw Looks like it has already started 😄
We're working on an upgrade to py3.10: https://github.com/Kaggle/docker-python/pull/1231
May still take some time before this is publicly available on Kaggle but just a heads up.
Thanks @djherbis, the Python version has been bumped to 3.10 on Kaggle, with a public announcement in Discussions 7 hours ago. Now we can finally stop using the KerasNLP workaround on Kaggle (thanks @abheesht17 for providing it while we needed keras-nlp to run on Kaggle with Python 3.7).
@mattdangerw, do you think we should open a PR to Kaggle to pre-install KerasNLP? (It would add pip install keras-nlp to the Kaggle Docker image's requirements file.)
Cheers !
The KerasNLP library is very new and under heavy development. It differs from Hugging Face in its core philosophy, but I haven't found many people using it yet (maybe because it's still at 0.x.x?).
Describe the solution you'd like
Clearly it needs more attention from the community. Better, in-depth tutorials are the need of the hour. Apart from keras-io tutorials, Kaggle can be a good place to start publicizing the library.
Is there a curated list or wishlist of topics you would like to see tutorials on?
I'm a Notebooks Master on Kaggle and would love to write kernels on KerasNLP.