Closed tomas-aftalion closed 12 years ago
I think we sorted this out on Twitter, but I'll go ahead and add a link to the JSONified mbox data that I think most people will want: http://zaffra.com/static/matthew/enron.mbox.json.gz
Thank you very much for the support! I will give it another go over the next couple of days.
Best,
Tomas
Date: Tue, 17 Apr 2012 20:45:30 -0700
From: reply@reply.github.com
To: tomasaftalion@hotmail.com
Subject: Re: [Mining-the-Social-Web] enron json file (#17)
I think we sorted this out on Twitter, but I'll go ahead and add a link to the JSONified mbox data that I think most people will want: http://zaffra.com/static/matthew/enron.mbox.json.gz
Thanks for your support, Matthew. I am not able to load the JSON data into CouchDB. I tried with your JSONified mbox data at http://zaffra.com/static/matthew/enron.mbox.json.gz and got this error:
$ python mailboxesload_json_mbox.py enron.mbox.json
Traceback (most recent call last):
  File "mailboxesload_json_mbox.py", line 16, in <module>
Did you gunzip the file before trying to load it? Based on the error message you are reporting, it looks as though you may be trying to load the compressed file.
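One quick way to rule this out is to look at the first two bytes of the file, since gzip data always begins with the magic number 0x1f 0x8b. A minimal sketch, assuming Python 3 (the thread itself uses Python 2.7); `looks_gzipped` is a hypothetical helper name, not part of the book's code:

```python
def looks_gzipped(path):
    """Return True if the file starts with the two-byte gzip magic number."""
    with open(path, "rb") as f:
        # Every gzip stream begins with the bytes 0x1f 0x8b.
        return f.read(2) == b"\x1f\x8b"
```

If `looks_gzipped("enron.mbox.json")` returns True, the file is still compressed (or was compressed twice) and needs another pass through gunzip before it can be parsed as JSON.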
Thanks for your response. I had gunzipped it as below:
$ gunzip -dv enron.mbox.json.gz
enron.mbox.json.gz: 0.0% -- replaced with enron.mbox.json
I tried again, but still got the same error.
Has anyone managed to unzip this? I tried on Windows, but the unzipped file has non-ASCII characters.
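If the platform's unzip tool is the suspect, the archive can also be decompressed with Python's standard gzip module, which works the same way on Windows, macOS, and Linux. A minimal sketch, assuming Python 3; `gunzip_file` is a hypothetical helper name used only for illustration:

```python
import gzip
import shutil


def gunzip_file(src, dst):
    """Decompress the gzip file at src and write the raw bytes to dst."""
    # Both files are opened in binary mode so that no newline or
    # character-set translation is applied on any platform.
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
```

For example, `gunzip_file("enron.mbox.json.gz", "enron.mbox.json")` should leave a plain JSON file next to the archive. If the result still does not look like text, the download itself may be the problem.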
Not sure why you are having issues with this, but I wonder if it has something to do with differences in platform. See my output below from just a few moments ago:
$ wget http://zaffra.com/static/matthew/enron.mbox.json.gz
--09:14:21-- http://zaffra.com/static/matthew/enron.mbox.json.gz
=> `enron.mbox.json.gz'
Resolving zaffra.com... 108.59.4.162
Connecting to zaffra.com|108.59.4.162|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 41,392,331 (39M) [application/json]

100%[==============================================================================================================================>] 41,392,331 1.43M/s ETA 00:00

09:14:52 (1.31 MB/s) - `enron.mbox.json.gz' saved [41392331/41392331]
$ gunzip enron.mbox.json.gz
$ file enron.mbox.json
enron.mbox.json: ASCII English text, with very long lines
$ python
Python 2.7.3 (v2.7.3:70274d53c1dd, Apr 9 2012, 20:52:43)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import json
>>> data = open('enron.mbox.json').read()
>>> json_data = json.loads(data)
>>> json_data
[{u'X-cc': u'', u'From': u'heather.dunton@enron.com', u'Subject': u'RE: West Position', u'To': [u'k..allen@enron.com'], u'Content-Transfer-Encoding': u'7bit', u'X-bcc': u'', u'parts': [{u'content': u' \nPlease let me know if you still need Curve Shift.\n\nThanks,\nHeather\n -----Original Message-----\nFrom: \tAllen, Phillip K. \nSent:\tFriday, December 07, 2001 5:14 AM\nTo:\tDunto
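The same sanity check can be written as a small script instead of an interactive session. This is a minimal sketch, assuming Python 3 and assuming, as the output above suggests, that the file holds a single JSON array of message objects; `load_messages` and `summarize` are hypothetical names, not functions from the book's code:

```python
import json


def load_messages(path):
    """Parse the whole file as one JSON document and return the message list."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # Assumption: the top-level value is an array of message dicts.
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of messages")
    return data


def summarize(messages):
    """Return (message count, sorted field names of the first message)."""
    first = messages[0] if messages else {}
    return len(messages), sorted(first)
```

Calling `summarize(load_messages("enron.mbox.json"))` should give the number of messages and the header names of the first one. If `json.load` raises an error instead, the file on disk is probably not the decompressed JSON.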
I can't find the JSON data for chapter 3. Is it supposed to be provided, or should we create it from scratch (from the CMU CS webpage)?