nlpie / biomedicus

BioMedICUS: A biomedical and clinical NLP engine.
https://nlpie.github.io/biomedicus/
Apache License 2.0

Bump stanza from 1.7.0 to 1.8.1 #305

Closed · dependabot[bot] closed this 8 months ago

dependabot[bot] commented 8 months ago

Bumps stanza from 1.7.0 to 1.8.1.
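For projects that pin this dependency directly, the bump amounts to a one-line requirements change (illustrative; biomedicus may instead declare the dependency in `setup.py` or `pyproject.toml`):

```
# requirements.txt
stanza==1.8.1
```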

Release notes

Sourced from stanza's releases.

PEFT Integration (with bugfixes)

Integrating PEFT into several different annotators

We integrate PEFT into our training pipeline for several different models. This greatly reduces the size of models with finetuned transformers, letting us make the finetuned versions of those models the `default_accurate` models.

The biggest gains observed are with the constituency parser and the sentiment classifier.

Previously, the default_accurate package used transformers where the head was trained but the transformer itself was not finetuned.

Model improvements

  • POS trained with a split optimizer for the transformer & non-transformer parameters; unfortunately, no settings were found that consistently improved results stanfordnlp/stanza#1320
  • Sentiment trained with PEFT on the transformer: noticeably improves results for each model. SST scores go from 68 F1 w/ charlm, to 70 F1 w/ transformer, to 74-75 F1 with a fully finetuned or PEFT-finetuned transformer. stanfordnlp/stanza#1335
  • NER also trained with PEFT: unfortunately, no consistent improvement in scores stanfordnlp/stanza#1336
  • depparse includes PEFT: no consistent improvements yet stanfordnlp/stanza#1337 stanfordnlp/stanza#1344
  • Dynamic oracle for the top-down constituency parsing scheme: noticeable improvement in the top-down parser's scores stanfordnlp/stanza#1341
  • Constituency parser uses PEFT: this produces significant improvements, close to the full benefit of finetuning the entire transformer when training constituency parsers. Example improvement: 87.01 to 88.11 on the ID_ICON dataset. stanfordnlp/stanza#1347
  • Scripts to build a silver dataset for the constituency parser, filtering sentences by model agreement among the sub-models of the ensembles used. Preliminary work indicates improved benefit from the silver trees, with more work needed to find the optimal parameters for building the silver dataset. stanfordnlp/stanza#1348
  • Lemmatizer ignores goeswith words when training: eliminates words which are a single word labeled with a single lemma but split into two words in the UD training data. A typical example is split email addresses in the EWT training set. stanfordnlp/stanza#1346 stanfordnlp/stanza#1345

Features

Bugfixes

Additional 1.8.1 Bugfixes


... (truncated)

Commits
  • c2d72bd Will update Stanza version to quickly fix a few bugs
  • 13ee3d5 Use a get() to avoid crashing if an older model with no bert_funetune set is ...
  • 6e2520f Add some debug logging when building a retag_pipeline - goal is to make sure ...
  • 44058a0 Allow for an individual pipeline to override which device it is placed on. F...
  • 17eb6fc Quieter logging when building a peft wrapper
  • e89a7d4 Fix TOP_DOWN parser for da_arboretum, which needs to look at the actual root ...
  • 5f18a61 Add the ability to process a few languages with prepare_resources.py
  • 363bbec Update sentmient to also have charlm & transformer versions. The transformer...
  • 4a7052b Minor logging / typo improvements to conparser
  • 3e21404 Initial attempt to chop up long inputs to a transformer into pieces that the ...
  • Additional commits viewable in compare view


Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
  • `@dependabot rebase` will rebase this PR
  • `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
  • `@dependabot merge` will merge this PR after your CI passes on it
  • `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
  • `@dependabot cancel merge` will cancel a previously requested merge and block automerging
  • `@dependabot reopen` will reopen this PR if it is closed
  • `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency
  • `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)