habi opened 8 years ago
PS: If I change all the umlauts to their spelled-out form (ä > ae, ö > oe, etc.), the script works :) But then the text isn't very readable in German...
Hey David,
Thanks for reporting, I’ll look into it! I’ve used lots of umlauts as well (chats in Swedish), but I don’t remember having any issues with them.
Any chance you could try running wa2latex with Python 3? And could you upload a sample snippet that causes the error?
Running the command below with Python 3.5.2 (thanks to Anaconda)
python wa2latex.py _chat_with_umlauts.txt > whatsbook-folio.tex
gives me:
Traceback (most recent call last):
File "wa2latex.py", line 148, in <module>
line = emojis.replace_emoji(line)
File "wa2latex.py", line 85, in replace_emoji
text = text.replace(emoji, "\\emoji{" + emoji.encode('unicode-escape').encode('utf-8') + "}")
AttributeError: 'bytes' object has no attribute 'encode'
That's why I tried with Python 2 :) Could there be a problem with the encoding of the exported chat TXT file?
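For what it's worth, the Python 3 traceback above comes from line 85 of the script: in Python 3, str.encode('unicode-escape') already returns bytes, and bytes objects have no encode() method. A minimal sketch of a Python 3 version, assuming replace_emoji receives the text plus a list of emoji strings (the real function's signature in wa2latex may differ), decodes the escaped bytes back to str instead:

```python
def replace_emoji(text, emojis):
    """Wrap each emoji in \\emoji{...} using its unicode-escape form.

    Hypothetical standalone version of the helper from the traceback;
    `emojis` is an assumed iterable of emoji strings.
    """
    for emoji in emojis:
        # In Python 3, encode('unicode-escape') returns bytes, so decode them
        # back to str instead of calling .encode() a second time.
        escaped = emoji.encode("unicode-escape").decode("ascii")
        text = text.replace(emoji, "\\emoji{" + escaped + "}")
    return text


print(replace_emoji(u"Hallo 😀", [u"😀"]))  # Hallo \emoji{\U0001f600}
```

Umlauts are left untouched here because only the listed emoji strings are replaced.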
What’s the encoding of your txt file?
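One way to check this from Python itself is a quick test like the following (describe_encoding is a hypothetical helper, not part of wa2latex): it reports whether the file starts with a UTF-8 byte order mark and whether its bytes decode as UTF-8 at all.

```python
import codecs


def describe_encoding(path):
    """Return a rough guess at whether a text file is UTF-8 (sketch only)."""
    with open(path, "rb") as fh:
        data = fh.read()
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 with BOM"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid utf-8"
    # Note: plain ASCII is also valid UTF-8 and is reported as "utf-8".
    return "utf-8"
```

This only tells UTF-8 apart from "something else"; it does not identify other encodings such as Latin-1.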
I tried it on Linux (at work), but I don’t have access to that file right now.
At home (on OS X 10.11.6), the chat.txt file is UTF-8 encoded, and I get this error with Python 2.7.11:
anomalocaris:whatsbook habi$ python wa2latex.py _chat.txt > whatsbook-folio.tex
Traceback (most recent call last):
File "wa2latex.py", line 26, in <module>
import pandas as pd
File "/usr/local/lib/python2.7/site-packages/pandas/__init__.py", line 44, in <module>
from pandas.core.api import *
File "/usr/local/lib/python2.7/site-packages/pandas/core/api.py", line 9, in <module>
from pandas.core.groupby import Grouper
File "/usr/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 17, in <module>
from pandas.core.frame import DataFrame
File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 41, in <module>
from pandas.core.series import Series
File "/usr/local/lib/python2.7/site-packages/pandas/core/series.py", line 2909, in <module>
import pandas.tools.plotting as _gfx
File "/usr/local/lib/python2.7/site-packages/pandas/tools/plotting.py", line 28, in <module>
import pandas.tseries.converter as conv
File "/usr/local/lib/python2.7/site-packages/pandas/tseries/converter.py", line 7, in <module>
import matplotlib.units as units
File "/usr/local/lib/python2.7/site-packages/matplotlib/__init__.py", line 1131, in <module>
rcParams = rc_params()
File "/usr/local/lib/python2.7/site-packages/matplotlib/__init__.py", line 975, in rc_params
return rc_params_from_file(fname, fail_on_error)
File "/usr/local/lib/python2.7/site-packages/matplotlib/__init__.py", line 1100, in rc_params_from_file
config_from_file = _rc_params_in_file(fname, fail_on_error)
File "/usr/local/lib/python2.7/site-packages/matplotlib/__init__.py", line 1018, in _rc_params_in_file
with _open_file_or_url(fname) as fd:
File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
File "/usr/local/lib/python2.7/site-packages/matplotlib/__init__.py", line 1000, in _open_file_or_url
encoding = locale.getdefaultlocale()[1]
File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/locale.py", line 543, in getdefaultlocale
return _parse_localename(localename)
File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/locale.py", line 475, in _parse_localename
raise ValueError, 'unknown locale: %s' % localename
ValueError: unknown locale: UTF-8
What happens if you run
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
or, perhaps even better,
export LC_ALL=de_CH.UTF-8
export LANG=de_CH.UTF-8
(if you’re in German-speaking Switzerland)
and then run wa2latex.py with Python 2?
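To see what Python actually picks up from those variables, a small diagnostic like this (a sketch, not part of wa2latex) can be run in the same shell. It is worth running it once directly and once with its output redirected to a file: under Python 2, sys.stdout.encoding is usually None when stdout is redirected, and printing non-ASCII unicode then falls back to the ASCII codec.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings Python will use for text I/O (diagnostic sketch)."""
    return {
        # Encoding derived from LANG / LC_ALL / LC_CTYPE.
        "preferred": locale.getpreferredencoding(False),
        # May be None (Python 2) when stdout is redirected to a file.
        "stdout": getattr(sys.stdout, "encoding", None),
        "filesystem": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for name, value in report_encodings().items():
        print("%s: %s" % (name, value))
```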
If I export the Swiss German variables on OS X, I get the same error as on Linux:
Traceback (most recent call last):
File "wa2latex.py", line 168, in <module>
print(u"\section*{%s}" % date)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 13: ordinal not in range(128)
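The UnicodeEncodeError above is raised while printing a unicode string, so it depends on the output encoding rather than on the input file. One possible workaround (a sketch, not something the script currently does; write_book and write_section are hypothetical names) is to open the .tex file with an explicit UTF-8 encoding and write to it directly instead of redirecting stdout:

```python
import io


def write_section(fh, date):
    """Write a LaTeX \\section* heading for `date` (hypothetical helper)."""
    # "\\section*" is the literal text \section* in the output.
    fh.write(u"\\section*{%s}\n" % date)


def write_book(path, dates):
    """Write one section heading per date to `path` as UTF-8."""
    # io.open encodes explicitly, so the result no longer depends on the
    # locale or on whether stdout is redirected. Works on Python 2 and 3.
    with io.open(path, "w", encoding="utf-8", newline="\n") as fh:
        for date in dates:
            write_section(fh, date)
```

Alternatively, without changing the code, setting PYTHONIOENCODING=utf-8 in the environment before running the script tells Python which encoding to use for stdout, including when it is redirected to a file.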
Running wa2latex.py on a file containing åäöÅÄÖ works with Python 2.7.10 on macOS 10.11.5. My locales are set to en_US.UTF8.
Any chance you could send me a sample of your chatlog? It’s hard for me to debug without proper (non-working 😄) data.
I just sent the file to the email address in your GitHub profile.
I tried running wa2latex with your chat log, and it worked without any issues on macOS with Python 2.7. I’ll be traveling abroad next week, but I can hopefully figure something out when I’m back.
Guys, I have the same problem on Ubuntu 16.04 with Python 2.7.12, with LANG and LC_ALL both set to en_US.UTF-8 and the chat history encoded in UTF-8 (the text contains Swiss German characters as well). I get the following error:
Traceback (most recent call last):
File "wa2latex.py", line 168, in
Also, the section markers are sometimes correct (\section{01.01.2000}), but other times, when a new line doesn't start with a date, the first word of that line is used instead (\section{word}).
I checked the file with a hex editor and found that when the new line/word is preceded by 0x200A, it gets a section marker with a word; the other new lines are preceded by 0x0D0A, and there the date is parsed correctly.
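Independently of the exact byte sequences, one way to avoid the arbitrary \section{word} markers is to treat every line that does not start with a date as a continuation of the previous message. A minimal sketch, assuming each message starts with a date like 01.01.2000 (other export formats would need a different pattern); DATE_PREFIX and group_messages are hypothetical names, not part of wa2latex:

```python
import re

# Assumed date prefix such as "01.01.2000"; adjust for other export formats.
DATE_PREFIX = re.compile(r"^\d{1,2}\.\d{1,2}\.\d{2,4}")


def group_messages(lines):
    """Merge lines that don't start with a date into the previous message."""
    messages = []
    for line in lines:
        line = line.rstrip("\r\n")
        if DATE_PREFIX.match(line) or not messages:
            messages.append(line)
        else:
            # Continuation of a multi-line message: keep the line break.
            messages[-1] += "\n" + line
    return messages


chat = u"01.01.2000, 10:00 - A: Hallo\nzweite Zeile\n02.01.2000, 11:00 - B: Grüezi\n"
print(group_messages(chat.splitlines()))
```

Because Python 3's default universal-newline mode turns both CRLF and a bare LF into \n when reading a text file, this date-based check does not depend on which line terminator the export uses.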
I haven’t been able to reproduce @habi’s issues, but I’m sure they’re valid, even more so now that you’re having issues too, @laserjay. I plan to rewrite parts of wa2latex for Python 3 as soon as possible; I’m hoping this will, if not solve your issues, at least make them easier to debug.
Hey @pbeck
Similar to @habi, I was able to make it work by
However, the issue with new lines not starting with dates, and therefore creating arbitrary section markers remained even after that.
If you're rewriting it anyway, could you also add an option to include the timestamp?
Thank you very much, I'm really looking forward to it, and let me know if I can help further, e.g. by testing it! :)
Cheers! laserjay
Hi,
I'm facing a similar issue to laserjay's: "However, the issue with new lines not starting with dates, and therefore creating arbitrary section markers remained even after that." Any leads on how to solve that?
@bakshi-varun Maybe laserjay's latest comment helps? I haven’t had time to update the script yet, and unfortunately there's no ETA for when that will happen.
Whenever I run wa2latex.py (Ubuntu 16.04, Python 2.7.12 (Anaconda custom (64-bit))), I get the error from line 169 of the script.
I suppose this could be because I'm using a German chat log with lots of umlauts, and ö corresponds to an ö... Is there any way I can make a book from my chat log (other than changing all the umlauts)?
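For reference when comparing with error messages or a hex dump: in UTF-8, each umlaut is stored as two bytes (ö is 0xC3 0xB6), and neither byte fits the ASCII codec that Python 2 falls back to by default. A small sketch (show_utf8 is a hypothetical helper) that prints the bytes and reproduces the ASCII failure:

```python
def show_utf8(ch):
    """Return the UTF-8 bytes of `ch` as space-separated hex (sketch)."""
    # bytearray yields ints on both Python 2 and 3.
    return " ".join("%02X" % b for b in bytearray(ch.encode("utf-8")))


print(show_utf8(u"ö"))  # C3 B6

try:
    # The ASCII codec only covers bytes 0x00-0x7F, so 0xC3 is rejected.
    b"\xc3\xb6".decode("ascii")
except UnicodeDecodeError as exc:
    print(exc)
```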