Closed wion closed 6 years ago
I wonder. #4 was a similar problem that was fixed using a different "locale". You can change the locale using an environment variable. You can set an environment variable for a single program call on the command line:
$ LC_CTYPE=UTF-8 python3 mastodon-backup-to-text.py wion@mastodon.social
Some background: "locale" is supposed to control how users want to see dates, characters, and the like. Your terminal's locale could be C, which is the dumbest variant of them all, and unfortunately it is sometimes the default. Check your locale settings! Here's mine:
$ locale
LANG=
LC_COLLATE="C"
LC_CTYPE="UTF-8"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=
Thus, if you can't print UTF-8 characters, you probably have LC_CTYPE set to C. When I set it to C on my system, I get a similar error:
$ LC_CTYPE=C mastodon-archive text kensanata@octodon.social
Traceback (most recent call last):
  File "/usr/local/bin/mastodon-archive", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/site-packages/mastodon_archive/__init__.py", line 65, in main
    args.command(args)
  File "/usr/local/lib/python3.6/site-packages/mastodon_archive/text.py", line 70, in text
    print("%s boosted" % status["account"]["display_name"])
UnicodeEncodeError: 'ascii' codec can't encode character '\U0001f41d' in position 15: ordinal not in range(128)
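The same error can be reproduced without touching the locale at all. This is a minimal sketch, assuming Python 3; the sample string and the try_encode helper are made up for illustration and only reuse the character U+1F41D named in the traceback. With LC_CTYPE set to C, Python 3.6 picks an ASCII encoding for standard output, so print() ends up attempting the equivalent of the first call below.

```python
# Hypothetical display name containing U+1F41D, the character from the traceback.
display_name = "Example \U0001f41d"


def try_encode(text, encoding):
    """Return the encoded bytes, or the error message if the codec cannot represent the text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as error:
        return str(error)


# ASCII cannot represent the character, so this prints the error message.
print(try_encode(display_name, "ascii"))
# UTF-8 can, so this prints the encoded bytes.
print(try_encode(display_name, "utf-8"))
```

Newer Python versions may treat the C locale differently, so the exact behaviour of print() depends on the interpreter in use.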
I will add this to the README.
New troubleshooting section with macOS specific information. Let me know if this helps.
Interesting.
My "International" settings look the same as yours in the Terminal profile...
Text encoding: Unicode (UTF-8)
But I still have this:
locale
LANG="en_US.utf8"
LC_COLLATE="C"
LC_CTYPE="C"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL="C"
What does it look like in your Terminal > Preferences > Encoding tab? I have Unicode (UTF-8) checked there. Should I turn off all others?
No, I have many selected. The ones that are selected here simply appear in "text encoding menus" whatever these are. The important part might be the checkbox below: Set locale environment variables on startup. You have it checked? When I uncheck it and open a new Terminal window:
$ locale
LANG=
LC_COLLATE="C"
LC_CTYPE="C"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=
(I got the idea from here, but perhaps there's more involved?)
The ones that are selected here simply appear in "text encoding menus"
Ah, right. I see that now.
The important part might be the checkbox below: Set locale environment variables on startup. You have it checked?
Yes.
But curiously, if I uncheck the box, and open a new Terminal window, I still have the same category settings:
locale
LANG="en_US.utf8"
LC_COLLATE="C"
LC_CTYPE="C"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL="C"
Even if I change the order of my system language choices...
I still get the same Terminal values as above. Maybe I need to reboot the machine?
I'm not sure why my LC_ALL value is set to "C", but I suspect that might be a problem here because it overrides all the others, according to the man locale docs. I can't figure out how to edit that to empty, LC_ALL="".
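The override order mentioned above can be sketched in a few lines of Python. effective_ctype is a hypothetical helper, not part of mastodon-archive, and it only models the usual lookup order (LC_ALL, then LC_CTYPE, then LANG). It does not check whether the system actually accepts the locale name it finds.

```python
import os


def effective_ctype(env=None):
    """Return the locale name that applies to LC_CTYPE.

    Lookup order: LC_ALL, then LC_CTYPE, then LANG, then the "C" fallback.
    Empty values are treated as unset.
    """
    env = os.environ if env is None else env
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"


# LC_ALL wins even though LC_CTYPE asks for UTF-8.
print(effective_ctype({"LC_ALL": "C", "LC_CTYPE": "en_US.UTF-8", "LANG": "en_US.UTF-8"}))
# With LC_ALL empty, LC_CTYPE is used.
print(effective_ctype({"LC_ALL": "", "LC_CTYPE": "UTF-8"}))
```

Calling effective_ctype() with no argument reads the variables of the current process.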
Crud. Not working. To summarize my status...
Sys Prefs > Language and Regions > Preferred languages: EN-US, FR-FR, EN
Terminal > Preferences > Profile > Advanced > International > Text encoding: Unicode (UTF-8) and "Set locale environment variables" is checked.
On command-line...
Because I used the export command to create these:
env | grep '^LC_'
LC_ALL=en_US.utf8
LC_CTYPE=en_US.utf8
But still just outputs this:
locale
LANG="en_US.utf8"
LC_COLLATE="C"
LC_CTYPE="C"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL="C"
Finally, this does not work, apparently:
LC_CTYPE=UTF-8 python3 mastodon-backup-to-text.py wion@mastodon.social
Traceback (most recent call last):
  File "mastodon-backup-to-text.py", line 87, in <module>
    status["created_at"]))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-2: ordinal not in range(128)
Nor does it work if I use LC_CTYPE=en_US.utf8 instead.
It's a can of encod-a-lingo worms.
The only thing I haven't tried is putting the following at the end of my .bash_profile, which I think was suggested somewhere I read:
LC_ALL=en_US.utf8
LC_CTYPE=en_US.utf8
And this seems to suggest a reboot is indeed needed on system language changes, so I'll explore that later.
A system reboot should not be necessary, as long as you are fiddling with environment variables. If you put it in your .bash_profile (or your .bashrc) I think simply opening a new Terminal window should do it. My intuition tells me that capitalization and spelling might be more important, though. You use LC_CTYPE=en_US.utf8, but what if you used LC_CTYPE=en_US.UTF-8 instead in your experiments?
Too bad you already verified that LC_CTYPE=UTF-8 python3 mastodon-backup-to-text.py wion@mastodon.social doesn't work for you. This was my only hope.
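When experimenting with these variables, it can help to ask Python directly which encodings it derived from the environment. This is a small diagnostic sketch, assuming Python 3.6 or later; encoding_report is a hypothetical helper name, and the values it prints depend entirely on the machine and the variables set for that run.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python derived from the current environment."""
    return {
        "stdout": sys.stdout.encoding,
        "preferred": locale.getpreferredencoding(False),
        "filesystem": sys.getfilesystemencoding(),
    }


for name, value in encoding_report().items():
    print(f"{name}: {value}")
```

Saved to a file, it can be run with different settings in front of the command, for example with LC_CTYPE=en_US.UTF-8 and then with LC_CTYPE=C, to compare what changes.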
I don't think setting up the Mac System Language will make a difference.
I decided to get rid of string printing once and for all. 5c2c21f introduces a different solution which will always force UTF-8 output, no matter what the system says about your terminal. If you want to give it a try, you need to install version 0.0.3. Please be aware that the installation instructions changed, and the calling conventions changed! I think you'll need to do the following:
# delete all the mastodon-backup*.py files you previously installed
pip3 install mastodon-archive
# in the correct directory
mastodon-archive text wion@mastodon.social
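For readers curious how output can be made independent of the locale: one common approach, which is not necessarily what 5c2c21f does, is to encode the text yourself and write the bytes to the underlying binary stream. The sketch below assumes Python 3; write_utf8 is a hypothetical helper, not a function from mastodon-archive.

```python
import sys


def write_utf8(text, stream=None):
    """Write text plus a newline as UTF-8 bytes, ignoring the locale's encoding.

    stream must accept bytes; it defaults to the binary layer under sys.stdout.
    """
    stream = sys.stdout.buffer if stream is None else stream
    stream.write(text.encode("utf-8"))
    stream.write(b"\n")
    stream.flush()


if __name__ == "__main__":
    # Made-up sample line containing the character from the earlier traceback.
    write_utf8("Example \U0001f41d boosted")
```

Whether the terminal then displays the character correctly still depends on the terminal's own text encoding setting, but Python no longer raises UnicodeEncodeError.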
That worked. Thanks! Nice changes.
Just to report it.