polyvertex closed 7 years ago
Good idea, but chardet does not support CP 437, which is the encoding used when I call wmic with subprocess. So sadly this doesn't work.
Well, there are numerous other encodings that chardet doesn't support, so I was seeing it not as a replacement but as a helper to be put next to your hard-coded encoding testing loop. Just a tip anyway.
The problem is that I can't really detect whether the output was decoded correctly. I can only detect whether the bytes were valid in a certain codepage (no exception was thrown), but that doesn't necessarily mean it is the right codepage. I'm really looking forward to beta testing pywin32 in Keypirinha, to avoid these encoding problems :)
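To illustrate the point about "no exception" not being proof of the right codepage, here is a minimal sketch. The byte value and the three codepages are picked only for illustration; they are not taken from the plugin's actual loop.

```python
# A single non-ASCII byte, chosen only for illustration.
raw = b"\x9b"

decoded = {}
for codepage in ("cp437", "cp850", "cp1252"):
    # None of these raise UnicodeDecodeError, yet each codepage
    # maps the byte to a different character.
    decoded[codepage] = raw.decode(codepage)

for codepage, text in decoded.items():
    print(codepage, repr(text))
```

All three calls succeed, so a try/except loop alone would simply accept whichever codepage it happens to test first.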
That's the point of it! chardet's statistical analysis might help to filter out (instead of filtering in) encodings that are supported by chardet, using the confidence value, so I thought it might be worth a try. I haven't tested it myself, but one would expect that a low confidence value probably indicates a more "exotic" encoding.
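One possible reading of this suggestion, as an untested sketch: trust chardet only when its confidence is high, and otherwise fall through to the hard-coded loop. The names decode_output, FALLBACK_ENCODINGS and MIN_CONFIDENCE, the 0.8 threshold and the fallback order are all assumptions made for illustration, not values from the plugin.

```python
try:
    import chardet  # shipped with Keypirinha per this thread; may be absent elsewhere
except ImportError:
    chardet = None

# Hypothetical values: stand-ins for the plugin's own hard-coded loop.
FALLBACK_ENCODINGS = ("utf-8", "cp437")
MIN_CONFIDENCE = 0.8


def decode_output(raw, detect=None):
    """Decode raw bytes, using the detector only when it is confident."""
    if detect is None and chardet is not None:
        detect = chardet.detect
    if detect is not None:
        guess = detect(raw)  # dict with "encoding" and "confidence" keys
        encoding = guess.get("encoding")
        if encoding and guess.get("confidence", 0.0) >= MIN_CONFIDENCE:
            try:
                return raw.decode(encoding)
            except (UnicodeDecodeError, LookupError):
                pass  # bad guess or unknown codec name: use the loop below
    # Low confidence is treated as a hint that the encoding is "exotic",
    # so the hard-coded candidates are tried in order.
    for encoding in FALLBACK_ENCODINGS:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: never raise, substitute undecodable bytes instead.
    return raw.decode(FALLBACK_ENCODINGS[-1], errors="replace")
```

The detect parameter exists only so the function can be exercised without chardet installed; in the plugin it could be dropped in favor of calling chardet.detect() directly.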
I really do not like pywin32's non-standard packaging (plus it's quite heavy), I remember now why I tried to avoid it :)
(forgot to close this)
It occurred to me that you could use the chardet module to auto-detect the encoding of the output you get from your sub-processes. It's not ideal since you first have to record the whole output (which is what you do anyway) instead of being able to parse it on-the-fly, but your plugin would feel way more stable and bulletproof; plus the chardet.detect() function is straightforward.
One recommendation if I may: give the function the whole output. Do not try to optimize the speed of the call by limiting the amount of bytes to analyze, or you may end up with an incorrect result.
chardet will always be redistributed with KP so you can rely on its presence.
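A minimal sketch of that suggestion, assuming the sub-process output is captured in one go. The helper name run_and_guess and its detect parameter are hypothetical; only subprocess.run() and chardet.detect() are real APIs.

```python
import subprocess

try:
    import chardet  # shipped with Keypirinha per this thread; may be absent elsewhere
except ImportError:
    chardet = None


def run_and_guess(cmd, detect=None):
    """Run cmd, record its whole stdout, and ask the detector for a guess."""
    if detect is None:
        if chardet is None:
            raise RuntimeError("chardet is not installed")
        detect = chardet.detect
    # Record the complete output first; the detection is statistical,
    # so it gets the full buffer rather than a truncated prefix.
    raw = subprocess.run(cmd, stdout=subprocess.PIPE, check=True).stdout
    guess = detect(raw)
    return raw, guess.get("encoding"), guess.get("confidence", 0.0)
```

The returned encoding can be None when the detector has no guess, so the caller still needs a fallback before calling raw.decode().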