Bump spark-nlp from 3.1.1 to 3.3.0 in /server

Bumps spark-nlp from 3.1.1 to 3.3.0.

Release notes

John Snow Labs Spark-NLP 3.3.0: New ALBERT, XLNet, RoBERTa, XLM-RoBERTa, and Longformer for Token Classification, 50x faster saving of models, new ways to discover pretrained models and pipelines, new state-of-the-art models, and lots more!

Overview

We are very excited to release Spark NLP 🚀 3.3.0! This release comes with support for existing or fine-tuned ALBERT, XLNet, RoBERTa, XLM-RoBERTa, and Longformer models for Token Classification from HuggingFace 🤗, up to 50x faster saving of Spark NLP models & pipelines, no more 2G limitation for the size of imported TensorFlow models, lots of new functions to filter and display pretrained models & pipelines inside Spark NLP, bug fixes, and more!

We are proud to say Spark NLP 3.3.0 is still compatible across all major releases of Apache Spark used locally, by all Cloud providers such as EMR, and all managed services such as Databricks. The major releases of Apache Spark include Apache Spark 3.0.x/3.1.x (spark-nlp), Apache Spark 2.4.x (spark-nlp-spark24), and Apache Spark 2.3.x (spark-nlp-spark23).
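The package mapping in the paragraph above can be written as a small lookup. This is a minimal sketch: the helper name `artifact_for_spark` and the `ARTIFACTS` table are hypothetical, not part of Spark NLP, and they encode only the three release lines named in the notes.

```python
# Hypothetical helper (not part of Spark NLP): map an Apache Spark version
# string to the Spark NLP 3.3.0 package name listed in the release notes.
ARTIFACTS = {
    ("3", "0"): "spark-nlp",
    ("3", "1"): "spark-nlp",
    ("2", "4"): "spark-nlp-spark24",
    ("2", "3"): "spark-nlp-spark23",
}


def artifact_for_spark(spark_version: str) -> str:
    """Return the package name for a Spark version such as '3.1.2'."""
    parts = spark_version.split(".")
    if len(parts) < 2:
        raise ValueError(f"expected a version like '3.1.2', got {spark_version!r}")
    key = (parts[0], parts[1])
    if key not in ARTIFACTS:
        raise ValueError(f"Spark {spark_version} is not listed in the 3.3.0 notes")
    return ARTIFACTS[key]
```

Raising on an unlisted version, rather than guessing a default, keeps the sketch from implying compatibility the notes do not state.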

As always, we would like to thank our community for their feedback, questions, and feature requests.

Major features and improvements

NEW: Starting with the Spark NLP 3.3.0 release, there is no size limit when you import TensorFlow models! You can now import TF Hub & HuggingFace models larger than 2 gigabytes.

NEW: Up to 50x faster saving Spark NLP models and pipelines! We have improved the way we package TensorFlow SavedModel while saving Spark NLP models & pipelines. For instance, it used to take up to 10 minutes to save the xlm_roberta_base model before Spark NLP 3.3.0, and now it only takes up to 15 seconds!

NEW: Introducing AlbertForTokenClassification annotator in Spark NLP 🚀. AlbertForTokenClassification can load ALBERT Models with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks. This annotator is compatible with all the models trained/fine-tuned by using AlbertForTokenClassification or TFAlbertForTokenClassification in HuggingFace 🤗

NEW: Introducing XlnetForTokenClassification annotator in Spark NLP 🚀. XlnetForTokenClassification can load XLNet Models with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks. This annotator is compatible with all the models trained/fine-tuned by using XLNetForTokenClassification or TFXLNetForTokenClassification in HuggingFace 🤗

NEW: Introducing RoBertaForTokenClassification annotator in Spark NLP 🚀. RoBertaForTokenClassification can load RoBERTa Models with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks. This annotator is compatible with all the models trained/fine-tuned by using RobertaForTokenClassification or TFRobertaForTokenClassification in HuggingFace 🤗

NEW: Introducing XlmRoBertaForTokenClassification annotator in Spark NLP 🚀. XlmRoBertaForTokenClassification can load XLM-RoBERTa Models with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks. This annotator is compatible with all the models trained/fine-tuned by using XLMRobertaForTokenClassification or TFXLMRobertaForTokenClassification in HuggingFace 🤗

NEW: Introducing LongformerForTokenClassification annotator in Spark NLP 🚀. LongformerForTokenClassification can load Longformer Models with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks. This annotator is compatible with all the models trained/fine-tuned by using LongformerForTokenClassification or TFLongformerForTokenClassification in HuggingFace 🤗
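The phrase "a linear layer on top of the hidden-states output" used for each of these annotators can be illustrated without any model at all. The sketch below is plain Python, not Spark NLP or HuggingFace code; `classify_tokens` and all numbers are made up for illustration.

```python
# Illustrative only: a token classification head is one linear layer applied
# to each token's hidden state, followed by an argmax over the label scores.
from typing import List

Vector = List[float]


def classify_tokens(hidden_states: List[Vector],
                    weights: List[Vector],
                    bias: Vector,
                    labels: List[str]) -> List[str]:
    """Compute logits = W @ h + b for every token and return the best label."""
    predictions = []
    for h in hidden_states:
        logits = [
            sum(w_i * h_i for w_i, h_i in zip(row, h)) + b
            for row, b in zip(weights, bias)
        ]
        best = max(range(len(logits)), key=logits.__getitem__)
        predictions.append(labels[best])
    return predictions


# Toy input: three tokens with 2-dimensional hidden states and three labels.
labels = ["O", "B-PER", "I-PER"]
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # one row per label
bias = [0.0, 0.0, 0.0]
hidden_states = [[0.9, 0.1], [0.2, 0.8], [-0.7, -0.6]]
predicted = classify_tokens(hidden_states, weights, bias, labels)
```

In a real model the hidden states come from the transformer encoder and the weights are learned during fine-tuning; only the final per-token projection is shown here.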

NEW: Introducing new ResourceDownloader functions to easily look for pretrained models & pipelines inside Spark NLP (Python and Scala). You can filter models or pipelines via language, version, or the name of the annotator:
from sparknlp.pretrained import *
# display and filter all available pretrained pipelines
ResourceDownloader.showPublicPipelines()
ResourceDownloader.showPublicPipelines(lang="en")
ResourceDownloader.showPublicPipelines(lang="en", version="3.2.0")
# display and filter all available pretrained models
ResourceDownloader.showPublicModels()
ResourceDownloader.showPublicModels("NerDLModel", "3.2.0")
ResourceDownloader.showPublicModels("NerDLModel", "en")
ResourceDownloader.showPublicModels("XlmRoBertaEmbeddings", "xx")
+--------------------------+------+---------+
| Model                    | lang | version |
+--------------------------+------+---------+
| xlm_roberta_base         |  xx  | 3.1.0   |
| twitter_xlm_roberta_base |  xx  | 3.1.0   |
| xlm_roberta_xtreme_base  |  xx  | 3.1.3   |
| xlm_roberta_large        |  xx  | 3.3.0   |
+--------------------------+------+---------+
# remove all the downloaded models & pipelines to free up storage
ResourceDownloader.clearCache()
# display all available annotators that can be saved as a Model

... (truncated)
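The kind of filtering the new ResourceDownloader functions offer can be pictured with a plain in-memory catalog. The sketch below does not call Spark NLP: `Entry`, `CATALOG`, and `filter_models` are hypothetical names, the rows are copied from the table in the release notes, and the exact-match treatment of `version` is an assumption (the real functions may interpret it differently).

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    name: str
    annotator: str
    lang: str
    version: str


# Rows copied from the XlmRoBertaEmbeddings table in the release notes.
CATALOG = [
    Entry("xlm_roberta_base", "XlmRoBertaEmbeddings", "xx", "3.1.0"),
    Entry("twitter_xlm_roberta_base", "XlmRoBertaEmbeddings", "xx", "3.1.0"),
    Entry("xlm_roberta_xtreme_base", "XlmRoBertaEmbeddings", "xx", "3.1.3"),
    Entry("xlm_roberta_large", "XlmRoBertaEmbeddings", "xx", "3.3.0"),
]


def filter_models(catalog: List[Entry],
                  annotator: Optional[str] = None,
                  lang: Optional[str] = None,
                  version: Optional[str] = None) -> List[Entry]:
    """Keep entries matching every filter that was supplied (None = no filter)."""
    return [
        e for e in catalog
        if (annotator is None or e.annotator == annotator)
        and (lang is None or e.lang == lang)
        and (version is None or e.version == version)  # assumption: exact match
    ]


multilingual = filter_models(CATALOG, annotator="XlmRoBertaEmbeddings", lang="xx")
released_in_330 = filter_models(CATALOG, version="3.3.0")
```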

Changelog

Sourced from spark-nlp's changelog.

3.3.0

Major features and improvements

NEW: Starting with the Spark NLP 3.3.0 release, there is no size limit when you import TensorFlow models! You can now import TF Hub & HuggingFace models larger than 2G.

NEW: Up to 50x faster when saving Spark NLP models and pipelines! 🚀 We have improved the way we package TensorFlow SavedModel while saving Spark NLP models & pipelines. For instance, it used to take up to 10 minutes to save the xlm_roberta_base model prior to Spark NLP 3.3.0, and now it only takes up to 15 seconds!

NEW: Introducing AlbertForTokenClassification annotator in Spark NLP 🚀. AlbertForTokenClassification can load ALBERT Models with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks. This annotator is compatible with all the models trained/fine-tuned by using AlbertForTokenClassification or TFAlbertForTokenClassification in HuggingFace 🤗

NEW: Introducing XlnetForTokenClassification annotator in Spark NLP 🚀. XlnetForTokenClassification can load XLNet Models with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks. This annotator is compatible with all the models trained/fine-tuned by using XLNetForTokenClassification or TFXLNetForTokenClassification in HuggingFace 🤗

NEW: Introducing RoBertaForTokenClassification annotator in Spark NLP 🚀. RoBertaForTokenClassification can load RoBERTa Models with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks. This annotator is compatible with all the models trained/fine-tuned by using RobertaForTokenClassification or TFRobertaForTokenClassification in HuggingFace 🤗

NEW: Introducing XlmRoBertaForTokenClassification annotator in Spark NLP 🚀. XlmRoBertaForTokenClassification can load XLM-RoBERTa Models with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks. This annotator is compatible with all the models trained/fine-tuned by using XLMRobertaForTokenClassification or TFXLMRobertaForTokenClassification in HuggingFace 🤗

NEW: Introducing LongformerForTokenClassification annotator in Spark NLP 🚀. LongformerForTokenClassification can load Longformer Models with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks. This annotator is compatible with all the models trained/fine-tuned by using LongformerForTokenClassification or TFLongformerForTokenClassification in HuggingFace 🤗

NEW: Introducing new ResourceDownloader functions to easily look for pretrained models & pipelines inside Spark NLP (Python and Scala). You can filter models or pipelines via language, version, or the name of the annotator.

Welcoming Databricks Runtime 9.1 LTS, 9.1 ML, and 9.1 ML with GPU

Fix the wrong version being returned by sparknlp.version()

Bug Fixes

Fix a bug in RoBertaEmbeddings when all special tokens were identical

Fix a bug in RoBertaEmbeddings when special token contained valid regex

Fix a bug that led to a memory leak inside the NorvigSweeting spell checker. This caused problems with pretrained pipelines such as explain_document_ml and explain_document_dl for some inputs

Fix the wrong types being assigned to minCount and classCount in Python for ContextSpellCheckerApproach annotator

Fix explain_document_ml pretrained pipeline for Spark NLP 3.x on Apache Spark 2.x
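The RoBertaEmbeddings entry about special tokens that are also valid regex describes a general class of bug that is easy to reproduce in isolation. The sketch below is an illustration of that class only, not the Spark NLP fix; `find_special_token` and the token `[MASK]` are hypothetical choices made for the example.

```python
import re
from typing import List


def find_special_token(text: str, token: str, escape: bool = True) -> List[str]:
    """Find occurrences of a special token, optionally escaping it first."""
    # re.escape makes every character in the token match literally.
    pattern = re.escape(token) if escape else token
    return re.findall(pattern, text)


text = "SAM likes [MASK]"
# Escaped: the token is matched as the literal six-character string.
literal_hits = find_special_token(text, "[MASK]")
# Unescaped: "[MASK]" is read as a character class matching M, A, S, or K.
regex_hits = find_special_token(text, "[MASK]", escape=False)
```

Without escaping, the pattern never matches the token itself and instead picks out single uppercase letters, which is why tokens used inside patterns are usually escaped first.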

3.2.3

Bug Fixes & Enhancements

Add delimiter feature to CoNLL() class to support other delimiters in CoNLL files JohnSnowLabs/spark-nlp#5934

Add support for IOB in addition to IOB2 format in GraphExtraction JohnSnowLabs/spark-nlp#6101

Change YakeModel output type from KEYWORD to CHUNK to have more available features after the YakeModel annotator such as Chunk2Doc or ChunkEmbeddings JohnSnowLabs/spark-nlp#6065

Fix the default language for XlmRoBertaSentenceEmbeddings pretrained model in Python JohnSnowLabs/spark-nlp#6057

Fix a SentenceEmbeddings issue that concatenated sentences instead of handling each corresponding sentence JohnSnowLabs/spark-nlp#6060

Fix GraphExtraction usage in LightPipeline JohnSnowLabs/spark-nlp#6101

Fix compatibility issue in explain_document_ml pipeline

Better import process for corrupted merges file in Longformer tokenizer JohnSnowLabs/spark-nlp#6083
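For readers unfamiliar with the difference between the IOB and IOB2 tagging formats mentioned in the list above: in IOB2 every entity begins with a B- tag, while in IOB a B- tag appears only when an entity directly follows another entity of the same type. The conversion below is a minimal sketch of that rule; `iob_to_iob2` is a hypothetical helper and says nothing about how Spark NLP handles the formats internally.

```python
from typing import List


def iob_to_iob2(tags: List[str]) -> List[str]:
    """Rewrite IOB tags so that every entity starts with a B- tag."""
    converted = []
    prev = "O"
    for tag in tags:
        if tag.startswith("I-"):
            entity = tag[2:]
            # An I- tag that does not continue an entity of the same type
            # is the first token of a new entity, so IOB2 marks it B-.
            if prev == "O" or prev[2:] != entity:
                tag = "B-" + entity
        converted.append(tag)
        prev = tag
    return converted


iob_tags = ["I-PER", "I-PER", "O", "I-LOC", "B-LOC", "I-LOC"]
iob2_tags = iob_to_iob2(iob_tags)
```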

3.2.2

New Features

... (truncated)

Commits

b1d43f0 Merge pull request #6193 from JohnSnowLabs/models_hub
17f9e61 Merge pull request #6128 from JohnSnowLabs/330-release-candidate
4b84c8d Bump Conda to 3.3.0 [skip test]
045e44b Update CHANGELOG [skip test]
385230c Add model 2021-09-29-sbiobertresolve_hcpcs_en
0e1e93c Update Scala and Python APIs
6cbfbd2 Bump version to 3.3.0 [run doc]
e127b0f Merge pull request #6180 from JohnSnowLabs/2021-09-28-classifierdl_bert_senti...
6f77964 Merge pull request #6191 from JohnSnowLabs/2021-09-29-xlm_roberta_base_finetu...
821ed5e Merge pull request #6190 from JohnSnowLabs/2021-09-29-xlm_roberta_base_finetu...
Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

nlpsandbox / phi-annotator-spark-nlp

Bump spark-nlp from 3.1.1 to 3.3.0 in /server #40
