ptwobrussell / Mining-the-Social-Web

The official online compendium for Mining the Social Web (O'Reilly, 2011)
http://bit.ly/135dHfs

JSON Problem with mailboxes__jsonify_mbox.py #50

Closed weichsl closed 11 years ago

weichsl commented 11 years ago

When executing the provided script, I encounter the following error:

Traceback (most recent call last):
  File "C:\Users\Christian\workspace\Mail Datamining\mailboxes__jsonify_mbox.py", line 89, in <module>
    json.dump(json_msgs,open(OUT_FILE, 'wb'), indent=4)
  File "C:\Python27\lib\json\__init__.py", line 181, in dump
    for chunk in iterable:
  File "C:\Python27\lib\json\encoder.py", line 436, in _iterencode
    o = _default(o)
  File "C:\Python27\lib\json\encoder.py", line 178, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: <generator object gen_json_msgs at 0x024BFB98> is not JSON serializable

I'm new to Python, but it seems that the code is trying to serialize the function itself instead of the objects it returns.
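
The diagnosis above can be reproduced in isolation. The following is a minimal sketch, not the script itself: `gen_msgs` is a hypothetical stand-in for the script's `gen_json_msgs`, and the exact wording of the error message varies between Python versions. Calling a generator function returns a generator object, and the standard `json` encoder has no rule for encoding such an object.

```python
import json


def gen_msgs():
    # Hypothetical stand-in for the script's gen_json_msgs generator function.
    for i in range(3):
        yield {"id": i}


msgs = gen_msgs()  # calling the function returns a generator object, not a list

try:
    json.dumps(msgs)
    error_message = None
except TypeError as exc:
    # The stock encoder does not know how to encode a generator object.
    error_message = str(exc)

print(error_message)
```

Passing `list(gen_msgs())` instead would serialize without error, at the cost of holding every item in memory at once.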

Thank you very much for any help!

ptwobrussell commented 11 years ago

To help you troubleshoot this, I have a few questions:

Before we get too much further, I just want to make sure that these two questions have reasonable answers. Thanks!

weichsl commented 11 years ago

Thank you very much :)

Here is what I pass as arguments on the command line: "C:\Users\Christian\Desktop\enron.mbox" "C:\Users\Christian\Desktop\enron.json"

I also verified that the enron.mbox file really is an mbox file. I downloaded it from the location you referenced:

http://zaffra.com/static/matthew/enron.mbox.gz

I usually use Eclipse for experimenting with Python, but here is what I got on the command line:

C:\Users\Christian\workspace\Mail Datamining>python mailboxes__jsonify_mbox.py "C:\Users\Christian\Desktop\enron.mbox" "C:\Users\Christian\Desktop\enron.json"
Traceback (most recent call last):
  File "mailboxes__jsonify_mbox.py", line 89, in <module>
    json.dump(json_msgs,open(OUT_FILE, 'wb'), indent=4)
  File "C:\Python27\lib\json\__init__.py", line 181, in dump
    for chunk in iterable:
  File "C:\Python27\lib\json\encoder.py", line 436, in _iterencode
    o = _default(o)
  File "C:\Python27\lib\json\encoder.py", line 178, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: <generator object gen_json_msgs at 0x0269ADA0> is not JSON serializable

ptwobrussell commented 11 years ago

Can you confirm that you have the latest version of the script from https://github.com/ptwobrussell/Mining-the-Social-Web/blob/master/python_code/mailboxes__jsonify_mbox.py ? It looks like you are using an earlier version. Out of curiosity, where did you get this version? Did you copy it out of the book line by line, or is it from a previous download of the GitHub archive that you haven't updated in a while?

weichsl commented 11 years ago

I'm using the latest version of the script. I'm in sync with your GitHub repository.

ptwobrussell commented 11 years ago

Hmm. Your stack trace doesn't match the link to the latest file I posted though. See what I mean?

weichsl commented 11 years ago

I see. I might have used another version. Now, when using the most recent script, the following error occurs:

mbox = mailbox.UnixMailbox(open(MBOX, 'rb'), email.message_from_file)  
                                                                      ^

IndentationError: unindent does not match any outer indentation level

After correcting the indentation level, I get:

Traceback (most recent call last):
  File "C:\Users\Christian\Documents\GitHub\Mining-the-Social-Web\python_code\mailboxes__jsonify_mbox.py", line 73, in <module>
    json.dump(gen_json_msgs(mbox),open(OUT_FILE, 'wb'), indent=4)
  File "C:\Python27\lib\json\__init__.py", line 181, in dump
    for chunk in iterable:
  File "C:\Python27\lib\json\encoder.py", line 436, in _iterencode
    o = _default(o)
  File "C:\Python27\lib\json\encoder.py", line 178, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: <generator object gen_json_msgs at 0x02108B98> is not JSON serializable

ptwobrussell commented 11 years ago

I'm sorry about that indentation error. I think it must have been introduced through a pull request that I accepted a while back, and I haven't run the code myself since then, so it went unnoticed.

Back to your issue: I just figured out what is going on. I was originally developing with Python 2.6 and was trying to use json2 as an import to speed up serialization into JSON, which worked fine. Then I got a pull request to use a generator, which was also a great idea, except that the default json package that comes with Python 2.7 cannot serialize a generator directly. Hence the need to patch this.
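
One way such a patch could look is sketched below. This is an assumption for illustration, not necessarily the change that was applied to the repository: `GeneratorEncoder` is a hypothetical name, and `gen_json_msgs` here is a simplified stand-in that takes a plain list instead of an mbox. The idea is to override `json.JSONEncoder.default`, which the encoder calls for any object it cannot handle natively, and expand iterables into lists there.

```python
import json


class GeneratorEncoder(json.JSONEncoder):
    """Hypothetical encoder that expands generators into lists."""

    def default(self, o):
        try:
            # Works for generators and other iterables the encoder rejects.
            return list(o)
        except TypeError:
            # Not iterable: fall back to the base class, which raises TypeError.
            return json.JSONEncoder.default(self, o)


def gen_json_msgs(subjects):
    # Simplified stand-in for the script's generator; yields one dict per message.
    for subject in subjects:
        yield {"Subject": subject}


encoded = json.dumps(gen_json_msgs(["hello", "world"]),
                     cls=GeneratorEncoder, indent=4)
print(encoded)
```

The same `cls=GeneratorEncoder` argument can be passed to `json.dump` when writing to a file. Note that `list(o)` still materializes the whole sequence in memory, so this keeps the generator-based code structure but not its memory savings during serialization.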

Thank you for this feedback. It was very helpful, and I'm glad we got this sorted out. I hope it didn't cause you too much trouble.

weichsl commented 11 years ago

Thanks for resolving this problem so quickly. I really appreciate it!