kensanata / mastodon-archive

Archive your statuses, favorites and media using the Mastodon API (i.e. login required)
https://alexschroeder.ch/software/Mastodon_Archive
GNU General Public License v3.0

UnicodeEncodeError when generating text or HTML output #20

Closed wion closed 6 years ago

wion commented 6 years ago

Just to report it.

$ python3 mastodon-backup-to-text.py wion@mastodon.social
Traceback (most recent call last):
  File "mastodon-backup-to-text.py", line 87, in <module>
    status["created_at"]))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-2: ordinal not in range(128)

kensanata commented 6 years ago

I wonder. #4 was a similar problem that was fixed using a different "locale". You can change the locale using an environment variable. You can set an environment variable for a single program call on the command line:

$ LC_CTYPE=UTF-8 python3 mastodon-backup-to-text.py wion@mastodon.social

Some background: "locale" is supposed to control how users want to see dates, characters, and the like. The default for your terminal could be C, which is the dumbest variant of them all. Unfortunately, it is sometimes the default. Check your locale settings! Here's mine:

$ locale
LANG=
LC_COLLATE="C"
LC_CTYPE="UTF-8"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=

Thus, if you can't print UTF-8 characters, you probably have LC_CTYPE set to C. When I set it to C on my system, I get a similar error:

$ LC_CTYPE=C mastodon-archive text kensanata@octodon.social
Traceback (most recent call last):
  File "/usr/local/bin/mastodon-archive", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/site-packages/mastodon_archive/__init__.py", line 65, in main
    args.command(args)
  File "/usr/local/lib/python3.6/site-packages/mastodon_archive/text.py", line 70, in text
    print("%s boosted" % status["account"]["display_name"])
UnicodeEncodeError: 'ascii' codec can't encode character '\U0001f41d' in position 15: ordinal not in range(128)

I will add this to the README.
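
To see which encoding Python has picked up from the locale, a small check like the following might help. It is only a sketch: the function name `describe_output_encoding` is made up for this example and is not part of mastodon-archive.

```python
import locale
import sys


def describe_output_encoding():
    """Return the encodings Python derives from the current environment."""
    return {
        # Encoding print() uses when writing to standard output.
        # Some replacement streams have no encoding, hence getattr().
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding inferred from the locale variables (LC_ALL, LC_CTYPE, LANG).
        "preferred": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for name, value in describe_output_encoding().items():
        print(f"{name}: {value}")
```

If this reports an ASCII encoding for stdout, printing an emoji is likely to fail in the way the traceback above shows.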

kensanata commented 6 years ago

New troubleshooting section with macOS specific information. Let me know if this helps.

wion commented 6 years ago

Interesting.

My "International" settings look the same as yours in the Terminal profile...

Text encoding: Unicode (UTF-8)

But I still have this:

locale
LANG="en_US.utf8"
LC_COLLATE="C"
LC_CTYPE="C"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL="C"

What does it look like in your Terminal > Preferences > Encoding tab? I have Unicode (UTF-8) checked there. Should I turn off all others?

kensanata commented 6 years ago

No, I have many selected. The ones that are selected here simply appear in "text encoding menus" whatever these are. The important part might be the checkbox below: Set locale environment variables on startup. You have it checked? When I uncheck it and open a new Terminal window:

$ locale
LANG=
LC_COLLATE="C"
LC_CTYPE="C"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=

(I got the idea from here, but perhaps there's more involved?)

wion commented 6 years ago

The ones that are selected here simply appear in "text encoding menus"

Ah, right. I see that now.

The important part might be the checkbox below: Set locale environment variables on startup. You have it checked?

Yes.

But curiously, if I uncheck the box, and open a new Terminal window, I still have the same category settings:

locale
LANG="en_US.utf8"
LC_COLLATE="C"
LC_CTYPE="C"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL="C"

Even if I change the order of my system language choices...

I still get the same Terminal values as above. Maybe I need to reboot the machine?

I'm not sure why my LC_ALL value is set to "C", but I suspect that might be the problem here because, according to the man locale docs, it overrides all the others. I can't figure out how to reset it to empty (LC_ALL="").
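
The override order can be sketched in a few lines of Python. This assumes the usual POSIX precedence (LC_ALL first, then LC_CTYPE, then LANG); the helper name `effective_ctype` is hypothetical and only illustrates the lookup, it does not call the system's locale machinery.

```python
import os


def effective_ctype(env=None):
    """Return the locale name that decides character encoding.

    Assumed POSIX precedence: LC_ALL wins over LC_CTYPE, which wins
    over LANG. Unset or empty variables are skipped; "C" is the fallback.
    """
    env = os.environ if env is None else env
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"


if __name__ == "__main__":
    # What the current shell environment resolves to.
    print(effective_ctype())
    # LC_ALL set to C hides a perfectly good LC_CTYPE.
    print(effective_ctype({"LC_ALL": "C", "LC_CTYPE": "en_US.UTF-8"}))
```

So as long as LC_ALL is "C", changing LC_CTYPE alone would not be expected to have any effect.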

wion commented 6 years ago

This might be answering my question.

wion commented 6 years ago

Crud. Not working. To summarize my status...

Sys Prefs > Language and Regions > Preferred languages: EN-US, FR-FR, EN

Terminal > Preferences > Profile > Advanced > International > Text encoding: Unicode (UTF-8) and "Set locale environment variables on startup" is checked.

On command-line...

Because I used export command to create these:

env | grep '^LC_'
LC_ALL=en_US.utf8
LC_CTYPE=en_US.utf8

But still just outputs this:

locale
LANG="en_US.utf8"
LC_COLLATE="C"
LC_CTYPE="C"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL="C"

Finally, this does not work, apparently:

LC_CTYPE=UTF-8 python3 mastodon-backup-to-text.py wion@mastodon.social
Traceback (most recent call last):
  File "mastodon-backup-to-text.py", line 87, in <module>
    status["created_at"]))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-2: ordinal not in range(128)

It doesn't work if I use LC_CTYPE=en_US.utf8 instead, either.

It's a can of encod-a-lingo worms.

The only thing I haven't tried is putting the following at the end of my .bash_profile, which I think was suggested somewhere I read:

LC_ALL=en_US.utf8
LC_CTYPE=en_US.utf8

And this seems to suggest a reboot is indeed needed on system language changes, so I'll explore that later.

kensanata commented 6 years ago

A system reboot should not be necessary, as long as you are fiddling with environment variables. If you put it in your .bash_profile (or your .bashrc) I think simply opening a new Terminal window should do it. My intuition tells me that capitalization and spelling might be more important, though. You use LC_CTYPE=en_US.utf8 but what if you used LC_CTYPE=en_US.UTF-8 instead, in your experiments?

Too bad you already verified that LC_CTYPE=UTF-8 python3 mastodon-backup-to-text.py wion@mastodon.social doesn't work for you. This was my only hope.

I don't think setting up the Mac System Language will make a difference.
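
One way to test the spelling question without guessing is to ask the C library directly whether it accepts a given locale name. The following is a rough sketch; `locale_is_accepted` is a name invented for this example, and which spellings are accepted depends on the system.

```python
import locale


def locale_is_accepted(name):
    """Return True if the C library accepts `name` for LC_CTYPE."""
    # Calling setlocale() without a value only queries the current setting.
    previous = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        # Raised when the system does not know this locale name.
        return False
    finally:
        # Put the process back the way we found it.
        locale.setlocale(locale.LC_CTYPE, previous)


if __name__ == "__main__":
    for candidate in ("en_US.utf8", "en_US.UTF-8", "UTF-8"):
        print(candidate, locale_is_accepted(candidate))
```

A name that is rejected here is unlikely to do anything useful when assigned to LC_CTYPE or LC_ALL.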

kensanata commented 6 years ago

I decided to get rid of string printing once and for all. 5c2c21f introduces a different solution which will always force UTF-8 output, no matter what the system says about your terminal. If you want to give it a try, you need to install version 0.0.3. Please be aware that the installation instructions changed, and the calling conventions changed! I think you'll need to do the following:

# delete all the mastodon-backup*.py files you previously installed
pip3 install mastodon-archive
# in the correct directory
mastodon-archive text wion@mastodon.social
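
For the curious, forcing UTF-8 output regardless of locale can look roughly like this. It is a minimal sketch of the general idea and not necessarily what 5c2c21f does; `print_utf8` is a hypothetical helper.

```python
import sys


def print_utf8(text, stream=None):
    """Write text plus a newline as UTF-8 bytes, ignoring the locale."""
    if stream is None:
        # The binary layer underneath sys.stdout; it does no encoding itself.
        stream = sys.stdout.buffer
    stream.write(text.encode("utf-8") + b"\n")
    stream.flush()


if __name__ == "__main__":
    # The bee emoji from the traceback above.
    print_utf8("\U0001f41d boosted")
```

Because the text is encoded by hand before it reaches the stream, the 'ascii' codec never gets involved, even with LC_CTYPE=C.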

wion commented 6 years ago

That worked. Thanks! Nice changes.