BLKSerene closed this 1 year ago
The title is a bit misleading; in the end, I understood that "Passing CRLF to LF", i.e. converting the line endings, made the detector return something else. I took the time to try to reproduce your issue and could not. I initially tested on Python 3.11, then, out of pure curiosity, set up 3.8.10, on both Windows 11 and Ubuntu. Nothing seems wrong: I got UTF-16-BE every time.
If your reproduction script was inaccurate and you have re-verified it, re-open this issue with additional information.
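For context on why line endings can matter for a UTF-16-BE payload: every character below, including `\r`, takes two bytes, so converting CRLF to LF both shortens the payload and shifts every byte offset after the first line ending. The sample text is made up for illustration; it is not the reporter's file.

```python
# Made-up sample text; it stands in for the reporter's file, which is not shown here.
text_crlf = "line one\r\nline two\r\nline three\r\n"
text_lf = text_crlf.replace("\r\n", "\n")  # CRLF -> LF normalisation

payload_crlf = text_crlf.encode("utf_16_be")
payload_lf = text_lf.encode("utf_16_be")

# All characters here are in the BMP, so each one is exactly 2 bytes in UTF-16-BE.
print(len(payload_crlf), len(payload_lf))  # 64 58
# Each dropped "\r" removes 2 bytes, so byte offsets past the first line ending
# point at different characters after normalisation.
print(len(payload_crlf) - len(payload_lf))  # 6
```

Any logic that samples the payload at fixed byte offsets therefore sees different content depending on which line endings the file was checked out with.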
@Ousret Sorry for the confusion; the text was missing some sentences. I've modified the code (the return value of open should now be exactly 3409).
I can't reopen this issue (or should I open a new one?). If you can re-verify this, please re-open it.
OK. The reproducer script now outputs what you encountered. I have narrowed it down to `utils.cut_sequence_chunks`, which did not cut chunks correctly.
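The thread does not show the fix from #233, so the following is only a sketch of one plausible pitfall, assuming the problem is a chunk boundary that ignores UTF-16's 2-byte code units. `cut_chunks` is a hypothetical helper written for this illustration; it is not charset_normalizer's `cut_sequence_chunks` and does not mirror its signature.

```python
from typing import Iterable, List


def cut_chunks(payload: bytes, offsets: Iterable[int], chunk_size: int, unit: int = 1) -> List[bytes]:
    """Hypothetical helper for illustration only (not the library's code).

    Takes a chunk of `chunk_size` bytes at each offset. Start offsets and the
    chunk size are rounded down to a multiple of `unit` (2 for UTF-16) so that
    no chunk starts or ends in the middle of a code unit.
    """
    size = chunk_size - chunk_size % unit  # keep whole code units only
    chunks = []
    for offset in offsets:
        start = offset - offset % unit  # snap start to a code-unit boundary
        chunk = payload[start:start + size]
        if chunk:
            chunks.append(chunk)
    return chunks


payload = "hello world\r\n".encode("utf_16_be")

naive = cut_chunks(payload, [1], 10)            # unit=1: cut exactly where asked
aligned = cut_chunks(payload, [1], 10, unit=2)  # unit=2: respect UTF-16 code units

print(aligned[0].decode("utf_16_be"))  # hello
# The misaligned chunk decodes WITHOUT raising, but every byte pair is shifted,
# so "hello" turns into five CJK ideographs (U+6800, U+6500, U+6C00, U+6C00, U+6F00).
print(naive[0].decode("utf_16_be"))
```

The misaligned case is the dangerous one because it fails silently: a detector scoring that chunk would see CJK text instead of Latin letters. Rounding down to a code-unit boundary is one simple fix; it still ignores surrogate pairs, which non-BMP text would also need kept intact.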
See #233
Describe the bug
The issue was found when testing Charset Normalizer on CI running different OSes.

To Reproduce

Expected behavior
Always return 'utf_16_be' on different OSes.