Closed klarkc closed 9 months ago
It is worthwhile to mention that • from the previous example also does not work, throwing the same error.
I think this is an encoding / system configuration error. If I copy the above to test.md, I get:
jasper@taiyaki ~/P/patat (master)> file test.md
test.md: UTF-8 Unicode text
jasper@taiyaki ~/P/patat (master)> patat --dump test.md
(works)
jasper@taiyaki ~/P/patat (master)> echo $LANG
en_US.UTF-8
However, if I set LANG to something else like C, or unset it, I get:
jasper@taiyaki ~/P/patat (master)> LANG=C patat --dump test.md
patat: test.md: hGetContents: invalid argument (invalid byte sequence)
jasper@taiyaki ~/P/patat (master)> LANG= patat --dump test.md
patat: test.md: hGetContents: invalid argument (invalid byte sequence)
If you are using UTF-8 in files, you should update your system locale to support this (or call patat with a compatible locale set).
That configuration error aside, in 2023 we can probably assume .md files are encoded in UTF-8, so I can make that the default.
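One way to make UTF-8 the default in a Haskell program is to override the encoding that GHC picks up from the locale. This is only a minimal sketch using `setLocaleEncoding` from base, not necessarily what patat does; the helper name `forceUtf8Default` is made up for illustration.

```haskell
import GHC.IO.Encoding (getLocaleEncoding, setLocaleEncoding)
import System.IO (utf8)

-- Hypothetical helper: make UTF-8 the default text encoding for handles
-- opened from now on, regardless of what LANG says.
forceUtf8Default :: IO ()
forceUtf8Default = setLocaleEncoding utf8

main :: IO ()
main = do
  before <- getLocaleEncoding
  forceUtf8Default
  after <- getLocaleEncoding
  -- Print the encoding names so the effect of the override is visible.
  putStrLn ("before: " ++ show before)
  putStrLn ("after:  " ++ show after)
```

Running this with LANG=C and with LANG=en_US.UTF-8 should show a different "before" value but the same "after" value.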
Hmm, weird: file reports my file as ASCII text, and my $LANG is en_US.utf-8. I wonder what this means. I am using vim to create the files. It is set to utf-8, but it still creates an ASCII text file. I even tried to use iconv to convert from ASCII to UTF-8, but there were no changes.
Actually, if I copy the example characters, file shows a different encoding. I believe it reports the closest encoding for the given characters. I tried with both nano and vim.
file reports: Unicode text, UTF-8 text
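The two results are consistent with ASCII being a subset of UTF-8: a file that contains only ASCII characters has exactly the same bytes in both encodings, which would also explain why iconv changed nothing. A small sketch of that distinction follows; the function name `isAsciiOnly` is made up, and the assumption that `file` classifies text along these lines is mine.

```haskell
import Data.Char (isAscii)

-- True when every character fits in 7-bit ASCII. Such text is byte-for-byte
-- identical whether it is saved as ASCII or as UTF-8.
isAsciiOnly :: String -> Bool
isAsciiOnly = all isAscii

main :: IO ()
main = do
  print (isAsciiOnly "plain markdown text")  -- only ASCII characters
  print (isAsciiOnly "quote: \8216")         -- contains U+2018
```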
I also tried to change my LANG to en_US.UTF-8 (uppercase), just in case, no changes.
These are all my available locales:
$ localectl list-locales
C.UTF-8
en_GB.UTF-8
en_US.UTF-8
pt_BR.UTF-8
pt_PT.UTF-8
My desktop locale differs from my terminal locale, because I prefer to use en_US on terminal apps:
$ localectl status
System Locale: LANG=pt_BR.UTF-8
VC Keymap: br-abnt2
X11 Layout: br
X11 Model: abnt2
$ echo $LANG
en_US.UTF-8
I added a fallback to UTF-8 if file decoding fails in the latest release, v0.9.1.0. That should fix this issue, feel free to re-open if it doesn't.
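For readers curious what such a fallback can look like, here is a rough sketch using only base. It is not the code from the release; the names `readFileWith` and `readFileUtf8Fallback` and the file `sample.md` are made up for illustration.

```haskell
import Control.Exception (IOException, evaluate, try)
import System.IO

-- Read a whole file, decoding it with the given encoding.
readFileWith :: TextEncoding -> FilePath -> IO String
readFileWith enc path = withFile path ReadMode $ \h -> do
  hSetEncoding h enc
  contents <- hGetContents h
  -- hGetContents is lazy: force the string so that decoding errors
  -- surface here, before withFile closes the handle.
  _ <- evaluate (length contents)
  return contents

-- Try the locale encoding first; on any IOException (including
-- "invalid byte sequence"), retry once with UTF-8.
readFileUtf8Fallback :: FilePath -> IO String
readFileUtf8Fallback path = do
  firstTry <- try (readFileWith localeEncoding path) :: IO (Either IOException String)
  case firstTry of
    Right contents -> return contents
    Left _ -> readFileWith utf8 path

main :: IO ()
main = do
  -- Hypothetical sample file: the UTF-8 bytes of U+2018, then a newline.
  withBinaryFile "sample.md" WriteMode $ \h -> hPutStr h "\xE2\x80\x98\n"
  contents <- readFileUtf8Fallback "sample.md"
  print contents
```

The sketch retries on every IOException for brevity, so a missing file simply fails a second time with the same error. Note that a locale with a single-byte encoding that accepts every byte would never reach the fallback.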
When opening a markdown file with the ‘ character it throws:
This character is pretty common; for example, it is used in GHC errors: