ptwobrussell / Mining-the-Social-Web

The official online compendium for Mining the Social Web (O'Reilly, 2011)

enron json file #17

Closed tomas-aftalion closed 12 years ago

tomas-aftalion commented 12 years ago

I can't find the JSON data for chapter 3. Is it supposed to be provided, or should we create it from scratch (from the CMU CS webpage)?

ptwobrussell commented 12 years ago

I think we sorted this out on Twitter, but I'll go ahead and add a link to the JSONified mbox data that I think most people will want:

tomas-aftalion commented 12 years ago

Thank you very much for the support! I will be giving it another go in the next couple of days... Best, Tomas


tomas-aftalion commented 12 years ago

used wget ''

newbieDatascientist commented 12 years ago

Thanks for your support, Matthew. I am not able to load the JSON data into CouchDB. I tried with your JSONified mbox data and got this error:

    $ python enron.mbox.json
    Traceback (most recent call last):
      File "", line 16, in
        docs = json.loads(open(JSON_MBOX).read())
      File "/usr/lib/python2.7/json/", line 326, in loads
        return _default_decoder.decode(s)
      File "/usr/lib/python2.7/json/", line 366, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
      File "/usr/lib/python2.7/json/", line 384, in raw_decode
        raise ValueError("No JSON object could be decoded")
    ValueError: No JSON object could be decoded

ptwobrussell commented 12 years ago

Did you gunzip the file before trying to load it? Based on the error message you are reporting, it looks as though you may be trying to load the compressed file.
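One quick way to tell whether a file is still gzip-compressed is to look at its first two bytes, which for gzip data are always 0x1f 0x8b. The following is a minimal sketch, assuming Python 3 (the thread itself uses Python 2.7); the helper name `looks_gzipped` and the temporary sample files are hypothetical and stand in for the real dataset.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def looks_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


# Demo on throwaway files (hypothetical names), not on the real enron data.
tmp_dir = tempfile.mkdtemp()
plain_path = os.path.join(tmp_dir, "sample.json")
gz_path = plain_path + ".gz"

with open(plain_path, "wb") as f:
    f.write(b'[{"Subject": "demo"}]')
with gzip.open(gz_path, "wb") as f:
    f.write(b'[{"Subject": "demo"}]')

print(looks_gzipped(plain_path))
print(looks_gzipped(gz_path))
```

If the check returns True for the file being passed to `json.loads`, the file still needs to be decompressed first.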

newbieDatascientist commented 12 years ago

Thanks for your response. I had gunzipped it as below:

    $ gunzip -dv enron.mbox.json.gz
    enron.mbox.json.gz: 0.0% -- replaced with enron.mbox.json

I tried again, but still got the same error.
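When the error persists after decompression, it can help to see where parsing fails and what the text looks like at that point. The sketch below assumes Python 3.5 or later, where `json.JSONDecodeError` carries the failing position; Python 2.7, as used in this thread, raises a plain `ValueError` without it. The helper name `describe_json_failure` is hypothetical.

```python
import json


def describe_json_failure(text, context=20):
    """Return None if `text` parses as JSON, else a short description of the failure."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # err.pos is the character offset at which the decoder gave up.
        start = max(err.pos - context, 0)
        snippet = text[start:err.pos + context]
        return "%s at char %d, near %r" % (err.msg, err.pos, snippet)
    return None


good_result = describe_json_failure('[{"Subject": "demo"}]')
bad_result = describe_json_failure("\x1f\x8b not json")

print(good_result)
print(bad_result)
```

A failure at the very first character suggests the file content is not JSON text at all, for example because it is still compressed or was saved in a different form than expected.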

revelutions commented 11 years ago

Has anyone managed to unzip this? I tried on Windows, but the unzipped file has non-ASCII characters.
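One platform-independent option is to skip the separate unzip step and let Python's standard `gzip` module decompress the file while loading it. This is a minimal sketch, assuming Python 3, a file that contains a single JSON document, and UTF-8 text (of which ASCII is a subset); the helper name `load_json_gz` and the sample file are hypothetical.

```python
import gzip
import json
import os
import tempfile


def load_json_gz(path):
    """Decompress a .json.gz file on the fly and parse its contents as JSON."""
    # "rt" asks gzip.open for decoded text rather than raw bytes.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)


# Demo on a throwaway file (hypothetical name), not on the real enron data.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.json.gz")
with gzip.open(sample_path, "wt", encoding="utf-8") as f:
    json.dump([{"Subject": "demo"}], f)

docs = load_json_gz(sample_path)
print(len(docs))
```

Because the decompression happens inside Python, the result does not depend on which unzip tool is installed on the machine.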

ptwobrussell commented 11 years ago

Not sure why you are having issues with this, but I wonder if it has something to do with differences in platform. See my output below from just a few moments ago:

    $ wget
    --09:14:21--
               => `enron.mbox.json.gz'
    Resolving
    Connecting to||:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 41,392,331 (39M) [application/json]

    100%[==============================================================================================================================>] 41,392,331 1.43M/s ETA 00:00

    09:14:52 (1.31 MB/s) - `enron.mbox.json.gz' saved [41392331/41392331]

    $ gunzip enron.mbox.json.gz
    $ file enron.mbox.json
    enron.mbox.json: ASCII English text, with very long lines
    $ python
    Python 2.7.3 (v2.7.3:70274d53c1dd, Apr 9 2012, 20:52:43)
    [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.

    >>> import json
    >>> data = open('enron.mbox.json').read()
    >>> json_data = json.loads(data)
    >>> json_data
    [{u'X-cc': u'', u'From': u'', u'Subject': u'RE: West Position', u'To': [u''], u'Content-Transfer-Encoding': u'7bit', u'X-bcc': u'', u'parts': [{u'content': u' \nPlease let me know if you still need Curve Shift.\n\nThanks,\nHeather\n -----Original Message-----\nFrom: \tAllen, Phillip K. \nSent:\tFriday, December 07, 2001 5:14 AM\nTo:\tDunto