Closed xgdgsc closed 9 years ago
What "system default encoding" are you referring to?
That is the encoding that windows defaults to when you choose your locale.
What does sys.getdefaultencoding()
say?
'utf-8'
Can you post the full filename, provided it doesn't leak personal information? i.e. print(repr(filename))
just before the line that fails? I am wondering if byte 553 in that is causing the problem, because that's the only thing that should be getting implicitly decoded as gbk.
The other thing to try for getting more information would be to take that execfile() function and try it on its own. Rewrite it to break each step down independently onto its own line: first f.read(), then compile(), then exec(). The line that is throwing the exception is doing several things at once, and it's hard to know which one is actually raising the error. Double-check which bytes object actually has the 0xaa byte.
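The suggested decomposition can be sketched like this; the temp file and its contents are stand-ins for the real script, assuming execfile() is roughly exec(compile(open(...).read(), ...)):

```python
import os
import tempfile

# Write a tiny script to disk so each stage of execfile() can be run separately.
fd, filename = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write("x = 1  # 中文注释 to exercise non-ASCII decoding\n")

# Step 1: reading -- this is where a UnicodeDecodeError would surface.
with open(filename, encoding="utf-8") as f:
    source = f.read()

# Step 2: compiling the source -- a SyntaxError would appear here.
code = compile(source, filename, "exec")

# Step 3: executing the compiled code.
namespace = {}
exec(code, namespace)
print(namespace["x"])  # → 1

os.remove(filename)
```

Whichever step raises the exception tells you whether the problem is in decoding the file, parsing it, or running it.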
Actually, if I remove all Chinese characters from the comments and keep only English characters in the Python script, the error doesn't occur. So the filename doesn't matter; the content matters.
Okay. I really don't know why Python would try to decode the file content as gbk unless it declares itself so at the top.
According to Processing Text Files in Python 3 — Nick Coghlan's Python Notes 1.0 documentation, Python uses the result of locale.getpreferredencoding() to read files, which is 'cp936' in my case.
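That default is easy to inspect; on the reporter's machine this would print cp936, while on a typical Linux box it prints UTF-8 (note that enabling Python's UTF-8 mode, or Python versions that default to UTF-8, changes what open() actually uses):

```python
import locale

# The encoding open() falls back to when none is passed explicitly,
# assuming UTF-8 mode is not enabled.
print(locale.getpreferredencoding(False))
```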
chardet seems to be able to recognize the file encoding correctly.
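chardet is a third-party package; as a rough stdlib-only alternative (a trial-decode heuristic, not chardet's actual statistical algorithm), one can try candidate encodings in order:

```python
def guess_encoding(data: bytes, candidates=("utf-8", "gb18030", "latin-1")):
    """Return the first candidate encoding that decodes `data` cleanly.

    latin-1 decodes any byte sequence, so with it as the last candidate
    this function always returns something; drop it to allow None.
    """
    for enc in candidates:
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None

print(guess_encoding("中文".encode("utf-8")))  # → utf-8
print(guess_encoding("中文".encode("gbk")))   # → gb18030
```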
If you change open(filename) to open(filename, 'rb'), does that help?
I'm still not sure where the UnicodeDecodeError referring to gbk would come from.
with open(filename, 'rb') as f:
works!
OK
According to the Python 3 Unicode documentation, you can specify the encoding when opening the file:
with open(filename, encoding='utf-8', mode='r') as f:
    for line in f:
        print(repr(line))
This way, each original character is still treated as a single character, and you can operate on part of the file without worrying about splitting the bytes of a single character.
GBK is not the latest standard; some special characters can't be decoded with it. You can try GB18030.
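GB18030 is a superset of GBK that covers all of Unicode, so characters outside GBK's repertoire (emoji, for instance) still round-trip; a quick check:

```python
ch = "😀"  # a character outside GBK's repertoire

# GB18030 can encode any Unicode character...
encoded = ch.encode("gb18030")
assert encoded.decode("gb18030") == ch

# ...while GBK cannot.
try:
    ch.encode("gbk")
except UnicodeEncodeError:
    print("gbk cannot encode", ch)
```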
The following method fixed it for me; it may work for you:
open(path, 'r', encoding='utf-8')
It definitely works!
It works!
input_stream = io.TextIOWrapper(sys.stdin.buffer, encoding='gb18030', errors="ignore")  # decode stdin as GB18030, ignoring undecodable bytes
for line in input_stream:
We can try this: with open("filename", encoding='ascii', errors='ignore') as f:
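Note that errors='ignore' silently drops every byte ASCII cannot decode, so all non-English text disappears from the contents; a quick demonstration on an in-memory bytes object:

```python
data = "value = 42  # 中文注释".encode("utf-8")

# errors='ignore' drops every non-ASCII byte instead of raising,
# so the Chinese comment vanishes from the decoded text.
text = data.decode("ascii", errors="ignore")
print(repr(text))  # → 'value = 42  # '
```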
On Windows the default is to parse files as GBK; after explicitly specifying the encoding it indeed works without problems.
with open(file_path, 'rb') as file works for me, while with open(file_path, 'r', encoding='utf-8') doesn't. However, using rb gives my contents the b"" prefix (bytes instead of strings), so sad! Anyone have a good idea? Thanks!
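The b"" prefix just means the data came back as bytes rather than str; decoding after the binary read recovers normal text. A sketch, with a temp file standing in for the real one and assuming its contents are actually UTF-8:

```python
import os
import tempfile

# Create a small UTF-8 file standing in for the real one.
fd, file_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write("第一行\nsecond line\n".encode("utf-8"))

# Read raw bytes, then decode explicitly -- no more b"" prefix.
with open(file_path, "rb") as f:
    raw = f.read()
text = raw.decode("utf-8")
print(text.splitlines())  # → ['第一行', 'second line']

os.remove(file_path)
```

This is equivalent to open(file_path, 'r', encoding='utf-8'), but it lets you catch the UnicodeDecodeError yourself or fall back to another encoding.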
On Windows, if the Python script is encoded in UTF-8 while the system default encoding is GBK, running kernprof -l throws an encoding error. I currently work around this by converting the Python file to GBK first and then running kernprof. If I then try to view the FILE.lprof file with python -m line_profiler FILE.lprof, it also gives an encoding error, and I have to convert the Python script back to UTF-8 and run python -m line_profiler FILE.lprof again to view the results. Is there a better way?