pbeck / whatsbook

Create books from WhatsApp group chats with Python and LaTeX
MIT License
154 stars 21 forks

Can't encode character u'\xf6' #1

Open habi opened 8 years ago

habi commented 8 years ago

Whenever I run wa2latex.py (Ubuntu 16.04, Python 2.7.12 (Anaconda custom (64-bit))), I get the error

UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 13: ordinal not in range(128)

from line 169 of the script.

I suppose this could be because I'm using a German chat log with lots of Umlauts, and u'\xf6' corresponds to an ö... Is there any way I can make a book from my chat log (other than replacing all the Umlauts)?

habi commented 8 years ago

PS: If I do change all the Umlauts to their written out expression (ä > ae, ö > oe, etc.), then the script works :) But the text is then not really nicely readable in German...

pbeck commented 8 years ago

Hey David,

Thanks for reporting, I’ll look into it! I’ve used lots of umlauts as well (chats in Swedish), but I don’t remember having issues with them.

Any chance you could try running wa2latex with Python 3? And could you upload a sample snippet that causes the error?

habi commented 8 years ago

Running the command below with Python 3.5.2 (Thanks to Anaconda)

python wa2latex.py _chat_with_umlauts.txt > whatsbook-folio.tex

gives me

Traceback (most recent call last):
  File "wa2latex.py", line 148, in <module>
    line = emojis.replace_emoji(line)
  File "wa2latex.py", line 85, in replace_emoji
    text = text.replace(emoji, "\\emoji{" + emoji.encode('unicode-escape').encode('utf-8') + "}")
AttributeError: 'bytes' object has no attribute 'encode'

That's why I tried with Python2 :) Might there be a problem with the encoding of the exported chat TXT file?
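The AttributeError in the traceback above arises because, in Python 3, `str.encode()` returns `bytes`, and `bytes` has no `.encode()` method, so the chained call on line 85 fails. A minimal sketch of one way around it on Python 3, assuming the intent is to embed the escaped code point as text. `escape_emoji` and this `replace_emoji` are hypothetical stand-ins, not the script's actual code, and the argument format the `\emoji` macro expects is not confirmed in this thread:

```python
def escape_emoji(emoji):
    """Return the unicode-escape form of an emoji as text, e.g. \\U0001f604."""
    # encode() yields bytes on Python 3; decode back to text instead of
    # calling encode() a second time.
    return emoji.encode("unicode-escape").decode("ascii")


def replace_emoji(text, emojis):
    """Wrap each known emoji found in text in a LaTeX \\emoji{...} command."""
    for emoji in emojis:
        text = text.replace(emoji, "\\emoji{" + escape_emoji(emoji) + "}")
    return text


print(replace_emoji(u"It works \U0001F604", [u"\U0001F604"]))
```

The emoji list would come from wherever the script already keeps its known emojis; that part is left out here.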

pbeck commented 8 years ago

What’s the encoding of your txt file?

habi commented 8 years ago

I’ve tried under Linux (at work), where I don’t have access to the file now. At home (on OS X 10.11.6), the chat.txt file is UTF-8 encoded, and I get this error with Python 2.7.11

anomalocaris:whatsbook habi$ python wa2latex.py _chat.txt > whatsbook-folio.tex
Traceback (most recent call last):
  File "wa2latex.py", line 26, in <module>
    import pandas as pd
  File "/usr/local/lib/python2.7/site-packages/pandas/__init__.py", line 44, in <module>
    from pandas.core.api import *
  File "/usr/local/lib/python2.7/site-packages/pandas/core/api.py", line 9, in <module>
    from pandas.core.groupby import Grouper
  File "/usr/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 17, in <module>
    from pandas.core.frame import DataFrame
  File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 41, in <module>
    from pandas.core.series import Series
  File "/usr/local/lib/python2.7/site-packages/pandas/core/series.py", line 2909, in <module>
    import pandas.tools.plotting as _gfx
  File "/usr/local/lib/python2.7/site-packages/pandas/tools/plotting.py", line 28, in <module>
    import pandas.tseries.converter as conv
  File "/usr/local/lib/python2.7/site-packages/pandas/tseries/converter.py", line 7, in <module>
    import matplotlib.units as units
  File "/usr/local/lib/python2.7/site-packages/matplotlib/__init__.py", line 1131, in <module>
    rcParams = rc_params()
  File "/usr/local/lib/python2.7/site-packages/matplotlib/__init__.py", line 975, in rc_params
    return rc_params_from_file(fname, fail_on_error)
  File "/usr/local/lib/python2.7/site-packages/matplotlib/__init__.py", line 1100, in rc_params_from_file
    config_from_file = _rc_params_in_file(fname, fail_on_error)
  File "/usr/local/lib/python2.7/site-packages/matplotlib/__init__.py", line 1018, in _rc_params_in_file
    with _open_file_or_url(fname) as fd:
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/local/lib/python2.7/site-packages/matplotlib/__init__.py", line 1000, in _open_file_or_url
    encoding = locale.getdefaultlocale()[1]
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/locale.py", line 543, in getdefaultlocale
    return _parse_localename(localename)
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/locale.py", line 475, in _parse_localename
    raise ValueError, 'unknown locale: %s' % localename
ValueError: unknown locale: UTF-8

pbeck commented 8 years ago

What happens if you do

export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8

or perhaps even better

export LC_ALL=de_CH.UTF-8
export LANG=de_CH.UTF-8

(if you’re in German speaking Switzerland)

and then run wa2latex.py with python2?
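The "unknown locale: UTF-8" error indicates that one of the locale environment variables holds a bare codeset (`UTF-8`) rather than a full name such as `en_US.UTF-8`. A rough sketch for spotting such a variable before exporting new values; `suspicious_locale_vars` is a hypothetical helper and the check is only a heuristic:

```python
import os

# Variables consulted by Python's locale.getdefaultlocale(), in lookup order.
LOCALE_VARS = ("LC_ALL", "LC_CTYPE", "LANG", "LANGUAGE")


def suspicious_locale_vars(environ):
    """Return locale variables whose value lacks a language_TERRITORY part."""
    flagged = {}
    for name in LOCALE_VARS:
        value = environ.get(name)
        if not value:
            continue
        lang_part = value.split(".")[0]
        if lang_part in ("C", "POSIX"):
            continue  # valid locale names without a territory
        # Heuristic: 'en_US.UTF-8' passes, a bare 'UTF-8' is flagged.
        # Language-only values such as 'de' are flagged too.
        if "_" not in lang_part:
            flagged[name] = value
    return flagged


print(suspicious_locale_vars(os.environ))
```

Any variable it reports is a candidate for the `export` lines suggested above.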

habi commented 8 years ago

If I export the Swiss German variables on OS X, then I get the same error as on Linux

Traceback (most recent call last):
  File "wa2latex.py", line 168, in <module>
    print(u"\section*{%s}" % date)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 13: ordinal not in range(128)
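One possible explanation, not confirmed in this thread: when Python 2 output is redirected to a file (as with `> whatsbook-folio.tex`), `sys.stdout.encoding` is `None`, and `print` falls back to the ASCII codec for unicode strings. Setting the `PYTHONIOENCODING=utf-8` environment variable before running the script is one workaround; encoding explicitly is another. The sketch below illustrates the second approach. `write_utf8` and the date string are hypothetical:

```python
import io


def write_utf8(text, binary_stream):
    """Write one line of text to a binary stream, always encoded as UTF-8."""
    binary_stream.write((text + u"\n").encode("utf-8"))


# Hypothetical section heading containing an umlaut.
section = u"\\section*{%s}" % u"12. M\xe4rz 2016"

# Encoding with the ASCII codec raises the same error class as the traceback.
try:
    section.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding explicitly as UTF-8 does not depend on locale settings.
buffer = io.BytesIO()
write_utf8(section, buffer)
print(ascii_ok, buffer.getvalue())
```

In a script, the binary stream would be `sys.stdout.buffer` on Python 3, or `sys.stdout` itself on Python 2.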

pbeck commented 8 years ago

Running wa2latex.py on a file containing åäöÅÄÖ works with Python 2.7.10 on macOS 10.11.5. My locales are set to en_US.UTF8.

Any chance you could send me a sample of your chatlog? It’s hard for me to debug without proper (non-working 😄) data.

habi commented 8 years ago

I just sent the file to the email address in your GitHub profile.

pbeck commented 8 years ago

I tried running wa2latex with your chat log, and it worked without any issues on macOS with Python 2.7. I’ll be traveling abroad next week, but I can hopefully figure something out when I’m back.

laserjay commented 7 years ago

Guys, I have the same problem on Ubuntu 16.04 with Python 2.7.12, LANG and LC_ALL both set to en_US.UTF-8 and the chat history being in UTF-8 (text contains Swiss German characters as well). I get the following error:

Traceback (most recent call last):
  File "wa2latex.py", line 168, in <module>
    print(u"\section*{%s}" % date)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 15: ordinal not in range(128)

Also, the section markers are sometimes created correctly, e.g. \section{01.01.2000}, but at other times the first word of a new line is used when no date is present: \section{word}

I checked the file with a hex editor and found that if a 0x200A precedes the new line/word, it gets a section marker containing the word; the other new lines are preceded by a 0x0D0A, and there the date is parsed correctly.
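The hex-editor check can be approximated in a few lines. This sketch assumes the quoted values are the last two bytes of each line (space + LF versus CR + LF), which is only one reading of the description; `line_ending_counts` and the sample data are hypothetical:

```python
from collections import Counter


def line_ending_counts(data, width=2):
    """Count the last `width` bytes of each line (line ending included), as hex."""
    counts = Counter()
    for line in data.splitlines(True):  # True keeps the line endings
        counts[line[-width:].hex()] += 1
    return counts


# Hypothetical excerpt: one message wraps onto a second line after " \n".
sample = b"01.01.2000 Alice: hello \nworld\r\n02.01.2000 Bob: hi\r\n"
print(line_ending_counts(sample))
```

For a real export, `data` would be the result of `open(path, "rb").read()`.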

pbeck commented 7 years ago

I haven’t been able to reproduce @habi’s issues, but I’m sure they’re valid – even more so if you also have issues, @laserjay. My ambition is to rewrite parts of wa2latex for Python 3 as soon as possible; I’m hoping this will, if not solve your issues, at least make them easier to debug.

laserjay commented 7 years ago

Hey @pbeck

Similar to @habi, I was able to make it work by

However, the issue with new lines not starting with dates, and therefore creating arbitrary section markers remained even after that.

If you're rewriting it anyway, could you also add a function to optionally include the timestamp as well?
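A minimal sketch of how both points could be handled: lines that do not start with a date are appended to the previous message instead of opening a new section, and the timestamp is included on request. The line format, `parse_chat`, and `to_latex` are assumptions for illustration, not the script's actual code, and LaTeX escaping of the message text is left out:

```python
import re

# Assumed line format: "DD.MM.YYYY, HH:MM - Name: message" (hypothetical;
# real exports differ by platform and locale).
MESSAGE_START = re.compile(
    r"^(?P<date>\d{2}\.\d{2}\.\d{4}), (?P<time>\d{2}:\d{2}) - "
    r"(?P<name>[^:]+): (?P<text>.*)$"
)


def parse_chat(lines):
    """Group raw lines into messages; undated lines continue the previous one."""
    messages = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        match = MESSAGE_START.match(line)
        if match:
            messages.append(match.groupdict())
        elif messages:
            # No leading date: a continuation line, not a new section.
            messages[-1]["text"] += "\n" + line
    return messages


def to_latex(messages, with_time=False):
    """Emit one \\section* per date, optionally prefixing messages with the time."""
    out, current_date = [], None
    for msg in messages:
        if msg["date"] != current_date:
            current_date = msg["date"]
            out.append("\\section*{%s}" % current_date)
        prefix = "[%s] " % msg["time"] if with_time else ""
        out.append("%s%s: %s" % (prefix, msg["name"], msg["text"]))
    return out


chat = [
    "01.01.2000, 10:15 - Alice: hello \n",
    "world\r\n",
    "02.01.2000, 09:00 - Bob: hi\r\n",
]
print("\n".join(to_latex(parse_chat(chat), with_time=True)))
```

Matching the whole message header, rather than taking the first word of each line as the date, is what keeps wrapped lines from becoming section titles.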

Thank you very much, really looking forward to it and let me know if I can provide further help, i.e. by testing it! :)

Cheers! laserjay

bakshi-varun commented 7 years ago

hi

I am facing an issue similar to the one laserjay described: "However, the issue with new lines not starting with dates, and therefore creating arbitrary section markers remained even after that." Any leads on how to solve that?

pbeck commented 7 years ago

@bakshi-varun Maybe laserjay's latest comment will help? Unfortunately, I haven’t had time to update the script yet, and there is no ETA for when that will happen.