kensanata / mastodon-archive

Archive your statuses, favorites and media using the Mastodon API (i.e. login required)
https://alexschroeder.ch/software/Mastodon_Archive
GNU General Public License v3.0

`mastodon-backup-to-html.py` dies with `UnicodeError` #8

Closed emanchado closed 6 years ago

emanchado commented 6 years ago

When I try to turn the backup (which works fine) into HTML, mastodon-backup-to-html.py dies with a UnicodeError :-( I have three accounts, and all of them have the same problem. Full error:

Traceback (most recent call last):
  File "./mastodon-backup-to-html.py", line 294, in <module>
    print(html)
UnicodeEncodeError: 'ascii' codec can't encode character '\u2663' in position 561: ordinal not in range(128)

I have the impression that my main account (on mastodon.social) chokes right away because of my second surname ("Velázquez"). The others seem to choke on specific toots.

I'm happy to provide the backup JSON file if it helps. (It doesn't contain any private information, I take it? I haven't sent anything private, at least not from the most recent account.)

kensanata commented 6 years ago

Yeah, this sounds plausible. I’ll give this a try next week.

kensanata commented 6 years ago

I created an account with the name "Alex Schroeder ⚠" and the text "Testing äöü" and was able to make a backup and export it to HTML without an error. Thus, I need to know more. I'm using Python 3.6.3, I'm on macOS, $LANG is not set, and locale says:

alex@Megabombus:~/src/mastodon-backup (master %=)$ locale
LANG=
LC_COLLATE="C"
LC_CTYPE="UTF-8"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=

Any other info you can provide?
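For anyone hitting the same error, a small diagnostic along these lines might help gather that info. This is only a sketch: `encoding_report` is a hypothetical helper, not part of mastodon-archive, and it assumes the relevant settings are the standard output encoding, the locale's preferred encoding, and the `LANG`, `LC_ALL` and `LC_CTYPE` environment variables.

```python
import locale
import os
import sys


def encoding_report(environ=os.environ):
    """Collect the settings that influence how print() encodes text.

    Hypothetical helper for troubleshooting; not part of mastodon-archive.
    """
    return {
        "python": sys.version.split()[0],
        # Encoding Python picked for standard output (may be None if redirected
        # to an object without an encoding).
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        # Encoding derived from the current locale settings.
        "preferred_encoding": locale.getpreferredencoding(False),
        "LANG": environ.get("LANG", ""),
        "LC_ALL": environ.get("LC_ALL", ""),
        "LC_CTYPE": environ.get("LC_CTYPE", ""),
    }


if __name__ == "__main__":
    for key, value in encoding_report().items():
        print(f"{key}: {value}")
```

If `stdout_encoding` reports an ASCII variant rather than UTF-8, printing any non-ASCII character is likely to fail in the way the traceback shows.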

kensanata commented 6 years ago

I'm a bit concerned that you're not using the latest version. I have print(html) on line 276 of mastodon-backup-to-html.py.

emanchado commented 6 years ago

It was an issue with locales. I wonder why I never had any problems with it before. Maybe Python 3 is stricter or something? :-S

In any case, thanks. PEBKAC.

BTW, I had the latest version. Pulling now brought several commits, all of them <10h old. About the location of print(html), hm, it's still on line 294. I have no idea what's going on. I only see one branch in the repo, I'm up-to-date, and I don't have local changes :-/

kensanata commented 6 years ago

> It was an issue with locales. I wonder why I never had any problems with it before.

I wonder. Care to describe the problem? Perhaps I can write some sort of troubleshooting guide.

emanchado commented 6 years ago

Ah, sorry, forgot to mention it for reference. The output of locale used to be:

LANG=
LANGUAGE=
LC_CTYPE=POSIX
LC_NUMERIC=nb_NO.UTF-8
LC_TIME=en_GB.UTF-8
LC_COLLATE="POSIX"
LC_MONETARY=nb_NO.UTF-8
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT=nb_NO.UTF-8
LC_IDENTIFICATION="POSIX"
LC_ALL=

The only change needed to make it work was setting LC_CTYPE to (in this case) nb_NO.UTF-8. Thanks for the tip, and sorry again for the noise.
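For reference, the failure can be reproduced without a backup file. With `LC_CTYPE=POSIX`, Python 3.6 selects an ASCII encoding for standard output, so `print(html)` fails on the first non-ASCII character. The sketch below shows the same encode step in isolation, plus one possible way to write UTF-8 regardless of locale. `can_encode` and `utf8_writer` are hypothetical names used for illustration; this is not how mastodon-backup-to-html.py is written.

```python
import io


def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def utf8_writer(binary_stream):
    """Wrap a binary stream so text written to it is always UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8")


if __name__ == "__main__":
    import sys

    html = "<p>\u2663 Velázquez</p>"
    print(can_encode(html, "ascii"))   # False: same failure as the traceback
    print(can_encode(html, "utf-8"))   # True

    # Write through a UTF-8 wrapper instead of relying on the locale.
    out = utf8_writer(sys.stdout.buffer)
    out.write(html + "\n")
    out.flush()
    out.detach()  # leave sys.stdout.buffer open for any later output
```

Setting the environment variable `PYTHONIOENCODING=utf-8` before running the script is another way to get UTF-8 output without changing the locale or the code.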