I have a repo here with a minimal example that exhibits the problem.
The file is UTF-8 and has some emojis in it. Searching for `foobar` with `pt foobar example.txt` will not show a match.
This is a minimal example that shows the problem; other files also seem to have the incorrect encoding detected.
The bytes for the lines are interpreted in UTF-8 as:
and in Shift-JIS as:
I've got two suggestions on how to solve this, though I don't know much about encoding schemes.
The first suggestion is to simply add some override options that let us specify the encoding: `--utf8` and `--shiftJIS` would do what you'd expect.
The second suggestion is to try decoding a portion of the file as UTF-8 or Shift-JIS (e.g. with https://godoc.org/golang.org/x/text/encoding ) and then see whether that produces an error. I don't know much about Shift-JIS, so I'm not sure whether this would be a good approach.
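To make the second suggestion a bit more concrete, here is a rough sketch of the UTF-8 half of the check. It only uses the standard library's `unicode/utf8` rather than `golang.org/x/text`, and `looksLikeUTF8` is a name I made up for illustration, not something that exists in pt. It assumes the sample is a prefix of the file, so a multi-byte character cut off at the end of the sample is ignored instead of being counted as an error. It does not attempt to confirm Shift-JIS; that would still need a separate decoder.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// looksLikeUTF8 reports whether sample decodes cleanly as UTF-8.
// Hypothetical helper for illustration; not part of pt.
// The sample is assumed to be a prefix of a larger file, so an
// incomplete rune at the very end is dropped before validating.
func looksLikeUTF8(sample []byte) bool {
	// Walk back at most utf8.UTFMax bytes to find the start of the last rune.
	for i := len(sample) - 1; i >= 0 && i >= len(sample)-utf8.UTFMax; i-- {
		if utf8.RuneStart(sample[i]) {
			if !utf8.FullRune(sample[i:]) {
				sample = sample[:i] // rune was cut off by sampling
			}
			break
		}
	}
	return utf8.Valid(sample)
}

func main() {
	utf8Line := []byte("foobar \xf0\x9f\x98\x80") // "foobar" plus an emoji
	sjisBytes := []byte{0x82, 0xA0}               // a Shift-JIS hiragana, invalid as UTF-8

	fmt.Println(looksLikeUTF8(utf8Line))
	fmt.Println(looksLikeUTF8(utf8Line[:len(utf8Line)-1])) // emoji truncated mid-rune
	fmt.Println(looksLikeUTF8(sjisBytes))
}
```

If the sample passes this check, the file could be treated as UTF-8 and only otherwise handed to the existing detection, which might be enough to avoid the misdetection in the example file.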
Have you got any thoughts?