ownaginatious / fbchat-archive-parser

An application for parsing chat history from a Facebook data archive.
MIT License
312 stars, 38 forks

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 7 #2

Closed. tels7ar closed this issue 8 years ago

tels7ar commented 8 years ago

When I attempt to extract just one conversation from my messages archive, it dies with:

$ fbcap -t 'Elizabeth' messages.htm > us.txt
Traceback (most recent call last):
  File "/usr/local/bin/fbcap", line 9, in <module>
    load_entry_point('fbchat-archive-parser==0.4.post2', 'console_scripts', 'fbcap')()
  File "build/bdist.macosx-10.10-intel/egg/fbchat_archive_parser/main.py", line 66, in main
  File "/Library/Python/2.7/site-packages/clip.py", line 652, in run
    self.invoke(self.parse(tokens))
  File "/Library/Python/2.7/site-packages/clip.py", line 634, in invoke
    self._main.invoke(parsed)
  File "/Library/Python/2.7/site-packages/clip.py", line 519, in invoke
    self._callback(**{k: v for k, v in iteritems(parsed) if k not in self._subcommands})
  File "build/bdist.macosx-10.10-intel/egg/fbchat_archive_parser/main.py", line 31, in fbcap
  File "build/bdist.macosx-10.10-intel/egg/fbchat_archive_parser/writers/__init__.py", line 22, in write
  File "build/bdist.macosx-10.10-intel/egg/fbchat_archive_parser/writers/writer.py", line 14, in write
  File "build/bdist.macosx-10.10-intel/egg/fbchat_archive_parser/writers/text.py", line 29, in write_history
  File "build/bdist.macosx-10.10-intel/egg/fbchat_archive_parser/writers/text.py", line 42, in write_thread
  File "build/bdist.macosx-10.10-intel/egg/fbchat_archive_parser/writers/text.py", line 53, in write_message
  File "/Library/Python/2.7/site-packages/colorama/ansitowin32.py", line 36, in write
    self.__convertor.write(text)
  File "/Library/Python/2.7/site-packages/colorama/ansitowin32.py", line 137, in write
    self.write_and_convert(text)
  File "/Library/Python/2.7/site-packages/colorama/ansitowin32.py", line 165, in write_and_convert
    self.write_plain_text(text, cursor, len(text))
  File "/Library/Python/2.7/site-packages/colorama/ansitowin32.py", line 170, in write_plain_text
    self.wrapped.write(text[start:end])
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 7: ordinal not in range(128)

I'm converting the file to ASCII with uni2ascii now to see if that gets around the error.

tels7ar commented 8 years ago

Running the file through uni2ascii first (http://billposer.org/Software/uni2ascii.html) removed that Unicode character and let me extract the entire 139,000 lines of chat between me and another person. Thank you so much for writing this tool; it was exactly what I needed.
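
For anyone without uni2ascii at hand, a rough stand-in for the same workaround is to turn every non-ASCII character into an ASCII escape before the text reaches an ASCII-only stream. This is only a sketch: `to_ascii` is a hypothetical helper name, and its output format is not claimed to match what uni2ascii produces.

```python
def to_ascii(text):
    """Return `text` with every non-ASCII character replaced by an escape.

    Hypothetical helper, not part of fbchat-archive-parser or uni2ascii.
    The "backslashreplace" error handler substitutes a backslash escape
    for each character that the ASCII codec cannot encode.
    """
    return text.encode("ascii", "backslashreplace").decode("ascii")


# U+2026 (horizontal ellipsis) is the character from the traceback above.
safe = to_ascii(u"Loading\u2026")
```

The result contains only ASCII characters, so writing it to a stream that uses the 'ascii' codec no longer raises UnicodeEncodeError. The cost is that the original characters are lost from the output.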

ownaginatious commented 8 years ago

No problem :) The Unicode error you're hitting is specific to Python 2 and shouldn't occur under Python 3. Regardless, I added a fix for it, since Python 2 installations are still quite common. Thanks for pointing it out!
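
For readers hitting the same error elsewhere: under Python 2, a redirected stdout typically has no encoding set, so writing a unicode string falls back to the 'ascii' codec. One common way around it is to wrap the stream in a UTF-8 writer. The sketch below is an illustration under that assumption, not the actual change made in fbchat-archive-parser; `utf8_writer` is a hypothetical name.

```python
import codecs
import sys


def utf8_writer(stream=None):
    """Wrap a stream so that text written to it is encoded as UTF-8.

    Hypothetical helper for illustration only.
    """
    if stream is None:
        stream = sys.stdout
    # Python 3 text streams expose their underlying byte stream as `.buffer`.
    # Python 2 `file` objects have no `.buffer` (and no `.detach()`), but they
    # already accept bytes, so fall back to the stream itself.
    raw = getattr(stream, "buffer", stream)
    return codecs.getwriter("utf-8")(raw)
```

Calling `utf8_writer().write(u"Loading\u2026\n")` then emits UTF-8 bytes whether or not stdout is redirected to a file. Looking the byte stream up with `getattr` instead of calling `sys.stdout.detach()` keeps the same code working on both Python 2 and Python 3.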

tels7ar commented 8 years ago

Sorry, that doesn't work for me:

Traceback (most recent call last):
  File "/usr/local/bin/fbcap", line 9, in <module>
    load_entry_point('fbchat-archive-parser==0.4.post3', 'console_scripts', 'fbcap')()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pkg_resources.py", line 357, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pkg_resources.py", line 2394, in load_entry_point
    return ep.load()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pkg_resources.py", line 2108, in load
    entry = __import__(self.module_name, globals(),globals(), ['__name__'])
  File "build/bdist.macosx-10.10-intel/egg/fbchat_archive_parser/main.py", line 9, in <module>
NameError: name 'codecs' is not defined

ownaginatious commented 8 years ago

Sorry, forgot an import. Should be good now.

tels7ar commented 8 years ago

New error:

Traceback (most recent call last):
  File "/usr/local/bin/fbcap", line 9, in <module>
    load_entry_point('fbchat-archive-parser==0.5.post1', 'console_scripts', 'fbcap')()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pkg_resources.py", line 357, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pkg_resources.py", line 2394, in load_entry_point
    return ep.load()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pkg_resources.py", line 2108, in load
    entry = __import__(self.module_name, globals(),globals(), ['__name__'])
  File "build/bdist.macosx-10.10-intel/egg/fbchat_archive_parser/main.py", line 9, in <module>
AttributeError: 'file' object has no attribute 'detach'

On Fri, May 13, 2016, at 03:28 PM, Dillon Dixon wrote:

Sorry, forgot an import. Should be good now.

ownaginatious commented 8 years ago

Sigh, encoding issues in Python 2... Okay, I think it should be fixed for real this time, haha.

tels7ar commented 8 years ago

Confirmed fixed. Thank you.