Issue Type
Feature Request
Keras Version
Keras 3
Current Behavior?
Token classification, best known through Named Entity Recognition (NER), is a highly popular and intriguing problem in the field of Natural Language Processing (NLP). It would greatly benefit the community to have an example on keras.io that uses KerasNLP.
While there is an existing example titled Named Entity Recognition with Transformers, it takes a simplistic approach to NER. For instance, instead of using a tokenizer, it uses StringLookup, and rather than using a pre-trained model, it implements a basic Transformer model. While this tutorial is informative, it may not yield competitive performance.
Also, there was a discussion about this on https://github.com/keras-team/keras-io/pull/1291 and https://github.com/keras-team/keras-nlp/issues/927 almost a year ago, which suggested some changes to KerasNLP to include a token classification example, but both were closed.
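To make the tokenizer point concrete, here is a toy sketch in plain Python. It is not taken from the notebook, the existing example, or KerasNLP: `word_level_ids` only mimics a whole-word lookup, `toy_subword_split` is a hypothetical stand-in for a real subword tokenizer, and the `B-EMAIL` label and `IGNORE` marker are illustrative assumptions. It shows that a whole-word lookup collapses unseen words into a single out-of-vocabulary id, while a subword split keeps their text but requires word-level labels to be aligned to pieces.

```python
# Toy illustration only: none of these functions is a KerasNLP or Keras API.

def word_level_ids(words, vocab, oov_id=1):
    """Mimic a whole-word lookup: every unknown word collapses to one OOV id."""
    return [vocab.get(word, oov_id) for word in words]


def toy_subword_split(word, max_len=4):
    """Hypothetical stand-in for a subword tokenizer: fixed-size character chunks."""
    return [word[i:i + max_len] for i in range(0, len(word), max_len)]


def align_labels(words, labels, ignore_label="IGNORE"):
    """Give each word's label to its first piece and mark the remaining pieces."""
    pieces, piece_labels = [], []
    for word, label in zip(words, labels):
        parts = toy_subword_split(word)
        pieces.extend(parts)
        piece_labels.extend([label] + [ignore_label] * (len(parts) - 1))
    return pieces, piece_labels


words = ["Email", "johndoe@mail.com", "today"]
labels = ["O", "B-EMAIL", "O"]  # illustrative word-level tags
vocab = {"[PAD]": 0, "[UNK]": 1, "Email": 2, "today": 3}

ids = word_level_ids(words, vocab)                  # the email address becomes OOV
pieces, piece_labels = align_labels(words, labels)  # one label slot per piece

print(ids)
print(list(zip(pieces, piece_labels)))
```

Labelling only the first piece of each word is one common convention; propagating the label to every piece is another, and which one the notebook uses is not stated here.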
Tutorial
I've recently published a notebook for Kaggle's "PII Data Detection" competition that demonstrates the token classification task from scratch with KerasNLP and achieves very competitive performance. I would love to add it to keras.io. As this notebook was created for a Kaggle competition, I'm open to suggestions to make it more suitable for keras.io.
I think this notebook can also serve as a potential solution to https://github.com/keras-team/keras-nlp/issues/927, as I implemented the components for token classification with KerasNLP that were discussed there.
Relevant log output
No response