marco-c / crashsimilarity

Similarity between crash reports
Mozilla Public License 2.0
9 stars, 12 forks

Bump gensim from 3.8.3 to 4.0.0 #261

Closed: dependabot[bot] closed this PR 3 years ago

dependabot[bot] commented 3 years ago

Bumps gensim from 3.8.3 to 4.0.0.

Release notes

Sourced from gensim's releases.

4.0.0

Changes

4.0.0, 2021-03-24

⚠️ Gensim 4.0 contains breaking API changes! See the Migration guide to update your existing Gensim 3.x code and models.

Gensim 4.0 is a major release with lots of performance & robustness improvements, and a new website.

Main highlights

  • Massively optimized popular algorithms the community has grown to love: fastText, word2vec, doc2vec, phrases:

    a. Efficiency

    model    | 3.8.3: wall time / peak RAM / throughput | 4.0.0: wall time / peak RAM / throughput
    fastText | 2.9h / 4.11 GB / 822k words/s            | 2.3h / 1.26 GB / 914k words/s
    word2vec | 1.7h / 0.36 GB / 1685k words/s           | 1.2h / 0.33 GB / 1762k words/s

    In other words, fastText now needs roughly a third of the RAM (and is faster); word2vec initializes 2x faster (and needs less RAM, and is faster); collocation phrase detection is 2x faster. (4.0 benchmarks)

    b. Robustness. We fixed a bunch of long-standing bugs by refactoring the internal code structure (see 🔴 Bug fixes below)

    c. Simplified OOP model for easier model exports and integration with TensorFlow, PyTorch & co.

    These improvements come to you transparently, aka "for free", but see the Migration guide for changes that break the old Gensim 3.x API, and update your code accordingly.

  • Dropped a bunch of externally contributed modules and wrappers: summarization, pivoted TFIDF, Mallet…

    • Code quality was not up to our standards. Also, there was no one to maintain these modules, answer user questions, or support them.

      So rather than let them rot, we made the hard decision to remove these contributed modules from Gensim. If anyone is interested in maintaining them, please fork them and publish them in your own repo. They can live happily outside of Gensim.

  • Dropped Python 2. Gensim 4.0 is Py3.6+. Read our Python version support policy.

    • If you still need Python 2 for some reason, stay at Gensim 3.8.3.
  • A new Gensim website – finally! 🙃
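The headline claims in the efficiency table can be checked with a little arithmetic. This minimal sketch copies the table's figures into a dict (`BENCHMARKS` and `ratios` are illustrative names, not part of Gensim) and derives the 3.8.3 to 4.0.0 ratios. fastText's peak RAM drops by a factor of about 3.3 (4.11 GB / 1.26 GB), which is where "roughly a third of the RAM" comes from. The 2x word2vec init speedup is not in the table, so it cannot be checked this way.

```python
# Figures copied from the 3.8.3 vs 4.0.0 benchmark table in the release notes.
# Each entry: ((wall_hours, peak_ram_gb, kwords_per_s) for 3.8.3, same for 4.0.0)
BENCHMARKS = {
    "fastText": ((2.9, 4.11, 822), (2.3, 1.26, 914)),
    "word2vec": ((1.7, 0.36, 1685), (1.2, 0.33, 1762)),
}


def ratios(old, new):
    """Return (wall-time speedup, RAM reduction factor, throughput gain)."""
    old_time, old_ram, old_tp = old
    new_time, new_ram, new_tp = new
    # Values > 1 mean 4.0.0 is better on that metric.
    return (old_time / new_time, old_ram / new_ram, new_tp / old_tp)


for model, (old, new) in BENCHMARKS.items():
    speedup, ram_factor, tp_gain = ratios(old, new)
    print(f"{model}: {speedup:.2f}x faster wall time, "
          f"{ram_factor:.2f}x less peak RAM, {tp_gain:.2f}x throughput")
```

For word2vec the same computation gives only a modest peak-RAM reduction (0.36 GB / 0.33 GB), consistent with the notes' "needs less RAM" rather than a large factor.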

So, a major clean-up release overall. We're happy with this tighter, leaner and faster Gensim.
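Because 4.0.0 requires Python 3.6+ and the notes advise anyone still on Python 2 to stay at 3.8.3, a project that supports several interpreters has to choose its gensim pin per Python version. A minimal sketch follows; `gensim_requirement` is a hypothetical helper, and treating every version below 3.6 (not only Python 2) as needing 3.8.3 is an assumption drawn from the "Py3.6+" requirement.

```python
import sys
from typing import Tuple


def gensim_requirement(python_version: Tuple[int, int]) -> str:
    """Pick a gensim pin for a (major, minor) Python version.

    Assumption: 4.0.0 needs Python 3.6+, so anything older falls back
    to 3.8.3, as the release notes recommend for Python 2.
    """
    if python_version >= (3, 6):
        return "gensim==4.0.0"
    return "gensim==3.8.3"


# Report the pin for the interpreter running this script.
print(gensim_requirement(tuple(sys.version_info[:2])))
```

In a requirements file the same choice can be expressed declaratively with PEP 508 environment markers, e.g. `gensim==4.0.0; python_version >= "3.6"`.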

This is the direction we'll keep going forward: less of a kitchen sink of "latest academic algorithms", more focus on robust engineering, targeting concrete NLP & document similarity use-cases.
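The breaking API changes flagged at the top of these notes mean existing Gensim 3.x training code may need edits before this bump can be merged. As a minimal sketch, assuming model parameters are kept in a dict of keyword arguments (a hypothetical setup, not necessarily how crashsimilarity is structured), the helper below renames the two constructor parameters that, as we understand the Migration guide, changed across the 4.0 models: `size` becomes `vector_size` and `iter` becomes `epochs`. `migrate_kwargs` and `RENAMED_KWARGS` are illustrative names, not part of Gensim.

```python
from typing import Any, Dict

# Gensim 3.x constructor parameter -> Gensim 4.0 name (per the Migration guide).
RENAMED_KWARGS: Dict[str, str] = {
    "size": "vector_size",
    "iter": "epochs",
}


def migrate_kwargs(old_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of old_kwargs with 3.x parameter names replaced by 4.0 names.

    Raises ValueError if both the old and the new name of one parameter are
    present, since silently picking one would hide a configuration mistake.
    """
    new_kwargs: Dict[str, Any] = {}
    for key, value in old_kwargs.items():
        new_key = RENAMED_KWARGS.get(key, key)  # unchanged names pass through
        if new_key in new_kwargs:
            raise ValueError(f"parameter given twice: {key!r} and {new_key!r}")
        new_kwargs[new_key] = value
    return new_kwargs


if __name__ == "__main__":
    print(migrate_kwargs({"size": 100, "iter": 5, "min_count": 1}))
```

Attribute renames, such as `model.wv.index2word` to `model.wv.index_to_key` and `model.wv.vocab` to `model.wv.key_to_index`, are not covered by this helper and need their own code changes.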

👍 New features

... (truncated)

Changelog

Sourced from gensim's changelog.


Commits
  • f46d72a Merge branch 'release-4.0.0'
  • bae3359 bumped version to 4.0.0
  • f9914a6 Changelog between 3.8.3 and 4.0.0 (#3088)
  • 5b37014 Update CHANGELOG.md
  • 6851524 Read NIPS data on the fly (#3082)
  • 04f3414 Fix some of the warnings/deprecated functions (#3080)
  • 83b8821 bump version to proper 4.0.0 after RC1 release
  • 54c1aea Merge pull request #3084 from RaRe-Technologies/fix_downloads
  • 61bb2b1 Merge remote-tracking branch 'origin/develop' into fix_downloads
  • 3ea0963 fix code style
  • Additional commits viewable in compare view



Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:
  • `@dependabot rebase` will rebase this PR
  • `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
  • `@dependabot merge` will merge this PR after your CI passes on it
  • `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
  • `@dependabot cancel merge` will cancel a previously requested merge and block automerging
  • `@dependabot reopen` will reopen this PR if it is closed
  • `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
dependabot[bot] commented 3 years ago

Superseded by #265.