ownaginatious / fbchat-archive-parser

An application for parsing chat history from a Facebook data archive.
MIT License
312 stars 38 forks

fbchat_archive_parser.parser.FacebookDataError #53

Closed (ckshitij closed 7 years ago)

ckshitij commented 7 years ago

I'm getting an error while parsing the messages.htm file. Related files are attached: parse_file.zip

fbcap ./messages.htm > fbMessages.txt

Traceback (most recent call last):
  File "/Users/coddict/anaconda/bin/fbcap", line 11, in <module>
    load_entry_point('fbchat-archive-parser==1.0.post1', 'console_scripts', 'fbcap')()
  File "/Users/coddict/anaconda/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/Users/coddict/anaconda/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/Users/coddict/anaconda/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/coddict/anaconda/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/Users/coddict/anaconda/lib/python3.6/site-packages/fbchat_archive_parser/main.py", line 118, in fbcap
    fbch = parser.parse()
  File "/Users/coddict/anaconda/lib/python3.6/site-packages/fbchat_archive_parser/parser.py", line 92, in parse
    self._parse_content()
  File "/Users/coddict/anaconda/lib/python3.6/site-packages/fbchat_archive_parser/parser.py", line 117, in _parse_content
    self._process_element(pos, element)
  File "/Users/coddict/anaconda/lib/python3.6/site-packages/fbchat_archive_parser/parser.py", line 250, in _process_element
    "An unrecoverable parsing error has occurred (missing timestamp data)"
fbchat_archive_parser.parser.FacebookDataError: An unrecoverable parsing error has occurred (missing timestamp data)

ownaginatious commented 7 years ago

Hi there! Unfortunately, the data you sent me in the zip file isn't very helpful; it's just a subset of the source code of this project.

As to your issue, it appears from the exception that there is at least one message in your messages.htm file that's missing timestamp data. Unfortunately, I cannot really diagnose what's wrong with it without looking in the file myself. Obviously that's private and you shouldn't post it ;)

To help diagnose the issue, you could add the following print statement after this line:

        ...
        elif tag == "p" and pos == "end":
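            # Debug aid: log each message as it is parsed, so the last line printed
            # shows how far the parser got before failing.
            print(self.current_timestamp, self.current_sender, e.text.strip() if e.text else "")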
            ...

That will at least tell you how many messages the parser gets through before crashing.

arnaudsm commented 7 years ago

Same problem here! It seems that Facebook changed the archive structure recently.
The messages.htm file is now only a few KB, and all the threads are in separate .html files in a /messages/ folder. The whole parser is broken now.
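
For anyone wanting to check which layout their own archive uses, here is a minimal sketch in Python. It is not part of fbchat-archive-parser. The function name `detect_archive_layout` and the labels it returns are made up for illustration, and it assumes the `messages/` folder sits next to `messages.htm` in the directory you pass in.

```python
import os


def detect_archive_layout(archive_dir):
    """Guess which layout a Facebook data archive uses.

    Hypothetical helper, not part of fbchat-archive-parser.
    Assumes messages.htm and the messages/ folder (if any) live
    directly inside archive_dir.

    Returns "split" when per-thread .html files exist under messages/,
    "single" when only messages.htm is present, "unknown" otherwise.
    """
    threads_dir = os.path.join(archive_dir, "messages")
    index_file = os.path.join(archive_dir, "messages.htm")

    # Newer layout described above: one .html file per thread.
    if os.path.isdir(threads_dir):
        thread_files = [
            name for name in os.listdir(threads_dir)
            if name.lower().endswith(".html")
        ]
        if thread_files:
            return "split"

    # Older layout: everything in a single messages.htm file.
    if os.path.isfile(index_file):
        return "single"

    return "unknown"


if __name__ == "__main__":
    print(detect_archive_layout("."))
```

If this reports the split layout, a parser version that only reads messages.htm will likely hit the error above.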

ownaginatious commented 7 years ago

Thanks for letting me know. I'll take a look soon.

ownaginatious commented 7 years ago

@arnaudsm @ckshitij okay, should be fixed now. Please try the latest version: 1.1

ckshitij commented 7 years ago

@ownaginatious Thank you so much, it's working now. :)