jtrueblood88 closed this issue 8 years ago
Could you post the entire stack trace?
Just realized I should. See the edit above.
Hmm, it would appear that the error is coming from the XML parsing library that's reading your messages.htm file. Are you sure the file is complete and not corrupted?
I think it's complete. I was able to use a different parser on it, but wanted to use yours as it has more options for output.
I think I fixed the issue. Let me know if the newest version works for you (version 0.5).
Hmm... I think I got the same error. And it happens even if I only ask it to parse a conversation with one person.
File "//anaconda/bin/fbcap", line 11, in <module>
Do you happen to have the line from your messages.htm file that it's crashing on? I think it may be an encoding error on Facebook's end.
I think so... But what does it mean that it's column 12969? I don't see how there can be that many columns.
Well, the document coming from Facebook is one giant unformatted line of data, so perhaps that's not helpful.
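If you want to see what the parser is choking on, here's a quick throwaway snippet (the helper name and `width` parameter are my own invention, not part of fbcap) that prints the text surrounding the line/column reported in the error:

```python
# Show the text around the line/column that ElementTree reports in a
# ParseError (both numbers are 1-based). Illustrative only.
def context_at(path, line_no, col_no, width=40):
    with open(path, encoding="utf-8", errors="replace") as f:
        for current, line in enumerate(f, start=1):
            if current == line_no:
                start = max(col_no - 1 - width, 0)
                return line[start:col_no - 1 + width]
    return None

# Example: print(context_at("messages.htm", 5863, 12969))
```

Since the whole export is one huge line, the column number is really just an offset into that line, which is why it looks so large.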
I just pushed a new version (0.5.post4) to PyPI. If the issue is being caused by your document not being interpreted as UTF-8 for some reason, then that should help.
Okay... I'll try that. Thanks so much, by the way. I'll let you know if it works.
:( nope that didn't do it. I'll keep trying though.
Hmm, I'm really not sure what the issue could be. Does the program run at all, or does it crash immediately?
It seems to run fine. When I ask it to look at a specific conversation, it skips all the other ones and focuses on that one. I tried having it export to CSV instead of stdout, but that didn't work either. I'd go in and look at the code myself, but I can't even figure out the environment that fbcap runs in... it's a bit too advanced for me.
At this point, I think it may be because the XML parser being used is strict, and an HTML document does not necessarily qualify as strictly valid XML. Unfortunately, the less strict drop-in replacement library lxml requires a lot of external dependencies that can be a pain to install. I'm going to try to implement something less efficient using BeautifulSoup as a fallback for situations like this. I'll respond to this ticket when it is ready.
Okay, the tool now falls back to BeautifulSoup if something goes wrong while parsing with the iterative streaming parser. Please let me know if it works for you now (version 0.6.post1).
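For the curious, the fallback has roughly this shape. This is a simplified sketch, not the exact code in parser.py (in particular, the real code restarts parsing from scratch rather than resuming mid-stream):

```python
import xml.etree.ElementTree as ET

# Try the fast streaming XML parser first; if the document turns out not to
# be well-formed XML, re-parse it with the more forgiving BeautifulSoup.
# Simplified sketch: duplicate handling on fallback is omitted here.
def iter_elements(stream):
    try:
        for event, element in ET.iterparse(stream, events=("start", "end")):
            yield event, element
    except ET.ParseError:
        from bs4 import BeautifulSoup  # external dependency for the fallback
        stream.seek(0)
        soup = BeautifulSoup(stream.read(), "html.parser")
        for element in soup.find_all(True):
            yield "end", element
```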
Just following up on this. The error is as follows:
The streaming parser crashed due to malformed XML. Falling back to the less strict/efficient BeautifulSoup parser. This may take a while...
Traceback (most recent call last):
  File "/Users/user/.virtualenvs/p3/lib/python3.5/site-packages/fbchat_archive_parser/parser.py", line 122, in parse_content
    parser=XMLParser(encoding=str('UTF-8'))):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/xml/etree/ElementTree.py", line 1290, in __next__
    for event in self._parser.read_events():
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/xml/etree/ElementTree.py", line 1273, in read_events
    raise event
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/xml/etree/ElementTree.py", line 1231, in feed
    self._parser.feed(data)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 3613, column 71274
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/Users/user/.virtualenvs/p3/bin/fbcap", line 11, in <module>
The relevant line, as viewed in Sublime Text:
Never mind, it works with lxml installed.
@friendswithsalad thanks. It looks like the lxml parser required by the fallback was a missing external dependency on your machine. It can be a pain to install (especially on Windows), so I've switched it to html.parser, which is built into Python. Let me know if it works for you now (0.6.post3).
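To illustrate the difference: the stdlib html.parser tolerates markup that a strict XML parser rejects, such as unclosed tags. A tiny example (my own illustration, not the tool's actual code):

```python
from html.parser import HTMLParser

# html.parser (stdlib, no lxml needed) happily consumes markup that is
# not well-formed XML, such as unclosed tags. Collects <p> text as a demo.
class ParagraphCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self.paragraphs.append(data)

collector = ParagraphCollector()
collector.feed("<p>first message<p>second message")  # unclosed tags: fine here
```

The same input fed to xml.etree.ElementTree raises a ParseError, which is exactly the class of failure this thread is about.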
Using the newest version. I'm new to Python, so I'm not sure what this means.
Thank you!
Traceback (most recent call last):
  File "//anaconda/bin/fbcap", line 11, in <module>
    sys.exit(main())
  File "//anaconda/lib/python3.5/site-packages/fbchat_archive_parser/main.py", line 66, in main
    app.run()
  File "//anaconda/lib/python3.5/site-packages/clip.py", line 652, in run
    self.invoke(self.parse(tokens))
  File "//anaconda/lib/python3.5/site-packages/clip.py", line 634, in invoke
    self._main.invoke(parsed)
  File "//anaconda/lib/python3.5/site-packages/clip.py", line 519, in invoke
    self._callback({k: v for k, v in iteritems(parsed) if k not in self._subcommands})
  File "//anaconda/lib/python3.5/site-packages/fbchat_archive_parser/main.py", line 27, in fbcap
    progress_output=sys.stdout.isatty())
  File "//anaconda/lib/python3.5/site-packages/fbchat_archive_parser/parser.py", line 98, in __init__
    self.__parse_content()
  File "//anaconda/lib/python3.5/site-packages/fbchat_archive_parser/parser.py", line 107, in __parse_content
    for pos, element in ET.iterparse(self.stream, events=("start", "end")):
  File "//anaconda/lib/python3.5/xml/etree/ElementTree.py", line 1289, in __next__
    for event in self._parser.read_events():
  File "//anaconda/lib/python3.5/xml/etree/ElementTree.py", line 1272, in read_events
    raise event
  File "//anaconda/lib/python3.5/xml/etree/ElementTree.py", line 1230, in feed
    self._parser.feed(data)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 5863, column 12969