Ousret/charset_normalizer
### [`v3.0.0`](https://togithub.com/Ousret/charset_normalizer/blob/HEAD/CHANGELOG.md#300-httpsgithubcomOusretcharsetnormalizercompare211300-2022-10-20)
[Compare Source](https://togithub.com/Ousret/charset_normalizer/compare/2.1.1...3.0.0)
##### Added
- Extend the capability of `explain=True`: when `cp_isolation` contains one or two entries, the Mess-detector results are logged in detail
- Support for alternative language frequency set in `charset_normalizer.assets.FREQUENCIES`
- Add parameter `language_threshold` in `from_bytes`, `from_path` and `from_fp` to adjust the minimum expected coherence ratio
- `normalizer --version` now specifies whether the current version provides the extra speedup (that is, a mypyc-compiled wheel)
##### Changed
- Build with static metadata using 'build' frontend
- Make the language detection stricter
- Optional: Module `md.py` can be compiled with Mypyc for an extra speedup, up to 4x faster than v2.1
##### Fixed
- CLI with the `--normalize` option failed when given a full path to a file
- `TooManyAccentuatedPlugin` induced a false positive in the mess detection when too few alphabetic characters had been fed to it
- Sphinx warnings when generating the documentation
##### Removed
- Coherence detector no longer returns 'Simple English'; it returns 'English' instead
- Coherence detector no longer returns 'Classical Chinese'; it returns 'Chinese' instead
- Breaking: Method `first()` and `best()` from CharsetMatch
- UTF-7 will no longer appear as "detected" without a recognized SIG/mark (it is unreliable and conflicts with ASCII)
- Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
- Breaking: Top-level function `normalize`
- Breaking: Properties `chaos_secondary_pass`, `coherence_non_latin` and `w_counter` from CharsetMatch
- Support for the backport `unicodedata2`
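Code that relied on the removed top-level `normalize` function or on the removed class aliases needs to move to the module-level functions. The sketch below shows one possible replacement built on `from_path`; the helper name `normalize_to_utf8` and its choice of writing to an explicit target path are assumptions for illustration and do not reproduce the removed function's exact behavior.

```python
from pathlib import Path


def normalize_to_utf8(source: Path, target: Path) -> str:
    """Rewrite source as UTF-8 into target and return the detected encoding.

    normalize_to_utf8 is a hypothetical helper, not part of charset_normalizer.
    """
    # Imported lazily so this module still loads where the library is absent.
    from charset_normalizer import from_path

    # from_path replaces calls made through the removed class aliases.
    best_guess = from_path(source).best()
    if best_guess is None:
        raise ValueError(f"could not detect the encoding of {source}")

    # output() re-encodes the decoded text; utf_8 is the chosen target here.
    target.write_bytes(best_guess.output("utf_8"))
    return best_guess.encoding
```

Raising `ValueError` when nothing is detected is a design choice of this sketch; callers that prefer a silent fallback could return `None` instead.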
Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
[ ] If you want to rebase/retry this PR, click this checkbox.
This PR has been generated by Mend Renovate. View repository job log here.
This PR contains the following updates:
charset_normalizer: `==2.1.1` -> `==3.0.0`