Closed pombredanne closed 2 months ago
I reckon that on the surface this issue seems to be related to https://github.com/jawah/charset_normalizer/issues/391 ... but IMHO this is still a bug, as a single character or a small minority of characters should not dictate the whole encoding of the larger string that contains them.
And I would NOT expect the behavior to change so drastically without a version bump, as this package is a dependency of pip, requests and other popular packages. The behavior of 3.3.0 or 3.2.0 is OK; 3.3.1 should become a 4.0.0 if you do not consider these changes a regression.
(I am assuming, maybe incorrectly, that you use some kind of semver'ish versioning scheme)
> but IMHO this is still a bug, as a single character or a small minority of characters should not dictate the whole encoding of the larger string that contains them.

Indeed, this behavior is not ideal across minor versions.
> And I would NOT expect that the behavior would change so drastically without a version bump...IMHO an API breaking major version bump

Don't forget that we maintain a heuristic algorithm, and covering every case found across all other projects can be next to impossible.
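As a rough illustration of why detection has to stay heuristic, here is a stdlib-only sketch. It is not charset_normalizer's actual algorithm, and the sample bytes and candidate list are made up for the example. A single non-ASCII byte in otherwise plain ASCII text decodes cleanly under several legacy encodings, each giving a different character, so something has to guess:

```python
# Hypothetical payload: plain ASCII text containing one non-ASCII byte (0xE9).
payload = b"cafe menu: caf\xe9 au lait"

# Arbitrary candidate encodings chosen for the illustration.
CANDIDATES = ("utf_8", "latin_1", "cp1252", "cp1251", "cp437")


def decode_all(data: bytes, encodings=CANDIDATES) -> dict:
    """Return {encoding: decoded text} for every encoding that accepts the bytes."""
    decoded = {}
    for name in encodings:
        try:
            decoded[name] = data.decode(name)
        except UnicodeDecodeError:
            # The bytes are not valid in this encoding, so it is ruled out.
            continue
    return decoded


results = decode_all(payload)
for name, text in results.items():
    print(f"{name:8} -> {text!r}")
```

Only UTF-8 is ruled out outright here; the remaining candidates all succeed and disagree on the one character, which is where a detector's scoring comes in.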
> kind of semver'ish versioning scheme)

We follow semver as best we can.
Nevertheless, we fixed the presented case, and the fix will be available in the next minor release.
@Ousret Thanks! :heart:
Describe the bug
The encoding detection changed recently and, IMHO, regressed. I found this through a CI failure ( https://dev.azure.com/nexB/commoncode/_build/results?buildId=14502&view=logs&jobId=ba20146e-138e-5341-c558-bc25972fe2bd&j=ba20146e-138e-5341-c558-bc25972fe2bd&t=18eddfd8-abe5-5f8c-405c-5d0e0bd4c25d ) in a project where we use beautifulsoup4, which in turn uses charset_normalizer.
To Reproduce
Note that I am using bs4 UnicodeDammit to show the side effects. I added the encoding detection to show the charset_normalizer side:
Up to 3.2.0 the behavior is stable:
Note the small change in 3.3.0
Note the big change in 3.3.1
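A minimal sketch of how this kind of comparison could be run under each installed version. This is not the original reproduction script: the sample bytes are hypothetical, and `report` and `installed_version` are helper names invented for the example. It relies on `charset_normalizer.from_bytes(...).best()` and on `UnicodeDammit(...).original_encoding`, and reports `None` for a library that is not installed:

```python
from importlib import metadata

# Hypothetical sample: mostly ASCII text with a few non-ASCII characters.
SAMPLE = "Mostly plain ASCII text with a few accents: déjà vu, naïve.".encode("utf-8")


def installed_version(dist: str):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def report(payload: bytes) -> dict:
    """Collect what each library detects for the given bytes."""
    result = {"version": installed_version("charset-normalizer")}
    try:
        from charset_normalizer import from_bytes
        best = from_bytes(payload).best()  # best match, or None if nothing fits
        result["detected"] = best.encoding if best is not None else None
    except ImportError:
        result["detected"] = None
    try:
        from bs4 import UnicodeDammit
        result["bs4_original_encoding"] = UnicodeDammit(payload).original_encoding
    except ImportError:
        result["bs4_original_encoding"] = None
    return result


print(report(SAMPLE))
```

Running the same script in separate environments pinned to 3.2.0, 3.3.0 and 3.3.1 would show whether the detected encoding differs between releases for a given input.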
Expected behavior
I would expect the behavior of 3.2.0 or 3.3.0 to be treated as correct. The 3.3.1 behavior is not correct or, if it is, then this should IMHO be an API-breaking major version bump.
Desktop (please complete the following information):