Closed: Tseing closed this issue 1 year ago
Hi Leo. Thank you for reporting the error. Could you perhaps start by submitting a PR that contains a failing test for this problem?
@lioman: What do you think about the error and the solution proposed here?
Sounds like the correct solution. We should definitely add a test for that.
Hi @justinmayer, I am glad to help fix this bug. But it seems a little tricky to detect file encoding in a unit test.
@Tseing: You tried with chardet and it did not work as you expected?
@justinmayer Yes, I found the problem. How Python encodes a file object opened without encoding="utf-8" depends on the language and OS environment. Many charsets share the same encoding for basic characters, so chardet cannot detect the correct encoding of a file that contains only digits, Latin letters, and other basic characters like these. I manually made a UTF-8 test string and chardet worked well.
Testing the file encoding requires creating a temporary file, which is deleted right afterwards. Is that OK?
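The ambiguity described here can be seen without chardet at all. The following is a minimal sketch using only Python's built-in codecs; the sample strings are made up for illustration. It compares the bytes produced by UTF-8 and GB18030 for ASCII-only text and for text containing Chinese characters.

```python
# ASCII-only text: both encodings produce identical bytes,
# so no detector can tell from the bytes which one was used.
ascii_text = 'title = "Hello 123"'
ascii_utf8 = ascii_text.encode("utf-8")
ascii_gb = ascii_text.encode("gb18030")
print(ascii_utf8 == ascii_gb)  # True

# Text with non-ASCII characters: the byte sequences diverge,
# which is what gives a detector something to work with.
cjk_text = 'title = "中文"'
cjk_utf8 = cjk_text.encode("utf-8")
cjk_gb = cjk_text.encode("gb18030")
print(cjk_utf8 == cjk_gb)  # False
```

This suggests that a test fixture should contain at least a few non-ASCII characters, otherwise the test cannot distinguish a correct UTF-8 file from one written in a legacy charset.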
@Tseing: Yes, I think it's okay to temporarily create and delete a file in the context of a unit test.
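A test along these lines might look like the sketch below. It is not the test from the actual PR: `write_search_config` is a hypothetical stand-in for the plugin code that writes the file, and the sample text is invented. Instead of guessing the encoding with a detector, the test reads the raw bytes back and decodes them strictly as UTF-8.

```python
import tempfile
import unittest
from pathlib import Path


def write_search_config(path, text):
    """Hypothetical stand-in for the plugin code that writes search.toml."""
    with open(path, "w", encoding="utf-8") as fd:
        fd.write(text)


class TestConfigEncoding(unittest.TestCase):
    def test_config_is_written_as_utf8(self):
        # Non-ASCII characters make the encodings distinguishable.
        text = 'title = "中文 café"'
        # The directory and its contents are removed when the block exits.
        with tempfile.TemporaryDirectory() as tmp_dir:
            path = Path(tmp_dir) / "search.toml"
            write_search_config(path, text)
            raw = path.read_bytes()
        # Strict UTF-8 decoding raises if another encoding was used.
        self.assertEqual(raw.decode("utf-8"), text)
```

Saved as a test module, this can be run with `python -m unittest`. Because the file lives in a `TemporaryDirectory`, nothing is left behind after the test finishes.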
Issue
Hi,
I got a Unicode error when I was trying pelican-search 1.1.0.
I checked output\search.toml and it is GB18030-encoded. Obviously, the pelican-search plugin generated output\search.toml with the default system encoding. I suspect this error will be raised on Windows in non-English environments. I checked the code, line 97:
The open method should be modified as
And this error is solved. It is a simple fix. Should I create a PR?
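The failure mode and the effect of an explicit encoding can be reproduced on any system with the sketch below. It is an illustration under stated assumptions, not the plugin's code: `write_text`, `is_valid_utf8`, and `check` are hypothetical helpers, and passing "gb18030" explicitly stands in for a system whose default encoding is GB18030 (when `encoding` is omitted, `open` falls back to a locale-dependent default).

```python
import tempfile
from pathlib import Path


def write_text(path, text, encoding):
    """Write text with an explicit encoding (hypothetical helper)."""
    with open(path, "w", encoding=encoding) as fd:
        fd.write(text)


def is_valid_utf8(path):
    """Return True if the file's raw bytes decode cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def check(encoding):
    """Write a sample config in the given encoding and test it as UTF-8."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = Path(tmp_dir) / "search.toml"
        write_text(path, 'title = "中文"', encoding)
        return is_valid_utf8(path)


# Stand-in for a system whose default encoding is GB18030.
legacy_ok = check("gb18030")
# Passing encoding="utf-8" makes the output independent of the locale.
fixed_ok = check("utf-8")
print(legacy_ok, fixed_ok)  # False True
```

A consumer that expects UTF-8 rejects the first file and accepts the second, which matches the behavior reported in this issue.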