ptwobrussell / Mining-the-Social-Web

The official online compendium for Mining the Social Web (O'Reilly, 2011)
http://bit.ly/135dHfs

memory error on example 3-5 #12

Open tandonami opened 12 years ago

tandonami commented 12 years ago

Hi,

I would really appreciate your help on the following issues:

In example 3-6: after adding the couchpy path to the CouchDB configuration (absolute path "C:\Python27\Scripts\couchpy.exe") and restarting the service, I executed the following code:

import sys
import couchdb
from couchdb.design import ViewDefinition
try:
...     import jsonlib2 as json
... except ImportError:
...     import json
...
DB = 'enronami'
START_DATE = '1900-01-01' #YYYY-MM-DD
END_DATE = '2100-01-01' #YYYY-MM-DD
def dateTimeToDocMapper(doc):
...     from dateutil.parser import parse
...     from datetime import datetime as dt
...     if doc.get('Date'):
...         _date = list(dt.timetuple(parse(doc['Date']))[:-3])
...         yield (_date, doc)
...
view = ViewDefinition('index', 'by_date_time', dateTimeToDocMapper,
...                       language='python')
Traceback (most recent call last):
  File "", line 2, in
  File "C:\Python27\lib\site-packages\couchdb-0.8-py2.7.egg\couchdb\design.py", line 93, in __init__
    map_fun = _strip_decorators(getsource(map_fun).rstrip())
  File "C:\Python27\lib\inspect.py", line 699, in getsource
    lines, lnum = getsourcelines(object)
  File "C:\Python27\lib\inspect.py", line 688, in getsourcelines
    lines, lnum = findsource(object)
  File "C:\Python27\lib\inspect.py", line 529, in findsource
    raise IOError('source code not available')
IOError: source code not available

Also in example 3-5 I got the following error:

db.update(docs, all_or_nothing=True)
Traceback (most recent call last):
  File "", line 1, in
  File "C:\Python27\lib\site-packages\couchdb-0.8-py2.7.egg\couchdb\client.py", line 733, in update
    _, _, data = self.resource.post_json('_bulk_docs', body=content)
  File "C:\Python27\lib\site-packages\couchdb-0.8-py2.7.egg\couchdb\http.py", line 399, in post_json
    status, headers, data = self.post(*a, **k)
  File "C:\Python27\lib\site-packages\couchdb-0.8-py2.7.egg\couchdb\http.py", line 381, in post
    **params)
  File "C:\Python27\lib\site-packages\couchdb-0.8-py2.7.egg\couchdb\http.py", line 419, in _request
    credentials=self.credentials)
  File "C:\Python27\lib\site-packages\couchdb-0.8-py2.7.egg\couchdb\http.py", line 176, in request
    body = json.encode(body).encode('utf-8')
MemoryError

I was able to work around this error, at least temporarily, by trimming enron.mbox.json to 2,000 objects instead of the full 41,000 JSON objects.

With Regards, Amitabh

discopatrick commented 11 years ago

I'm having this problem too. I'm just increasing my memory on the VM and hoping this will solve it. Unfortunately I'm now up to 3 GB of RAM and it still hasn't fixed it.

Edited to add: 4 GB doesn't work either. Maybe it needs more virtual memory. I'm watching the System Monitor and it has about 50% RAM free when it crashes.

ptwobrussell commented 11 years ago

Discopatrick - Are you also using Windows in your VM? Wondering if that is the common thread.

discopatrick commented 11 years ago

Actually the VM is Ubuntu 12.04. Running on VirtualBox 4.2.6.

ptwobrussell commented 11 years ago

Sorry this is taking a while for me to help you with, but I am hoping that we can pin down the issue soon. Can you give me the other pertinent details of your situation so I can better reproduce this? Version of CouchDB and Python are two that come to mind. Version of the couchdb package is another one.

discopatrick commented 11 years ago

Thanks for your help Russell. I've skipped ahead to other parts of the book, but if you'd still like the details, here they are:

On starting python in the terminal I see:

Python 2.7.3 (default, Aug 1 2012, 05:16:07) [GCC 4.6.3] on linux2

CouchDB: Apache CouchDB 1.0.1

Looking in the Ubuntu Software Center at python-couchdb I see: python-couchdb 0.8-0ubuntu2

Any more info you need, just let me know.

Almost all of my software was installed using the Ubuntu Software Center, I think there were one or two exceptions, one of which was Redis IIRC.

Pragueham commented 11 years ago

I'm getting the same error on Mountain Lion:

python mbox-dateload.py new-enron 1900-01-01 2012-01-01
Finding docs dated from 1900-1-1 to 2012-1-1
Traceback (most recent call last):
  File "mbox-dateload.py", line 34, in
    for row in db.view('index/by_date_time', startkey=start, endkey=end):
  File "/Library/Python/2.7/site-packages/couchdb/client.py", line 984, in __iter__
    return iter(self.rows)
  File "/Library/Python/2.7/site-packages/couchdb/client.py", line 1003, in rows
    self._fetch()
  File "/Library/Python/2.7/site-packages/couchdb/client.py", line 990, in _fetch
    data = self.view._exec(self.options)
  File "/Library/Python/2.7/site-packages/couchdb/client.py", line 880, in _exec
    _, _, data = self.resource.get_json(**self._encode_options(options))
  File "/Library/Python/2.7/site-packages/couchdb/http.py", line 393, in get_json
    status, headers, data = self.get(*a, **k)
  File "/Library/Python/2.7/site-packages/couchdb/http.py", line 374, in get
    return self._request('GET', path, headers=headers, **params)
  File "/Library/Python/2.7/site-packages/couchdb/http.py", line 419, in _request
    credentials=self.credentials)
  File "/Library/Python/2.7/site-packages/couchdb/http.py", line 310, in request
    raise ServerError((status, error))
couchdb.http.ServerError: (500, (u'EXIT', u'{{badmatch,[]},\n [{couch_query_servers,new_process,3,\n [{file,"couch_query_servers.erl"},{line,472}]},\n {couch_query_servers,lang_proc,3,\n [{file,"couch_query_servers.erl"},{line,462}]},\n {couch_query_servers,handle_call,3,\n [{file,"couch_query_servers.erl"},{line,334}]},\n {gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,588}]},\n {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}'))

Localised error:

view = ViewDefinition('index', 'by_date_time', dateTimeToDocMapper, language='python')
Traceback (most recent call last):
  File "", line 1, in
  File "/Library/Python/2.7/site-packages/couchdb/design.py", line 93, in __init__
    map_fun = _strip_decorators(getsource(map_fun).rstrip())
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/inspect.py", line 699, in getsource
    lines, lnum = getsourcelines(object)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/inspect.py", line 688, in getsourcelines
    lines, lnum = findsource(object)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/inspect.py", line 529, in findsource
    raise IOError('source code not available')
IOError: source code not available

Python details: Python 2.7.2 (default, Jun 16 2012, 12:38:40) [GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin

Any ideas?

Thanks
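For what it's worth, the `IOError: source code not available` in the tracebacks above is raised by `inspect.getsource`, which `ViewDefinition` calls when it is handed a function object. In the Python 2.7 sessions shown, a function typed at the interactive prompt has no source file for `inspect` to read. The sketch below reproduces that condition without CouchDB by compiling the mapper from a string under a made-up filename (`<typed-at-prompt>` and `source_is_available` are illustrative names, not part of couchdb-python). The workaround in the trailing comments is an assumption that has not been verified against the setups in this thread: pass the mapper source to `ViewDefinition` as a string, or run the example from a .py file instead of the prompt.

```python
import inspect

# The mapper from the thread, kept as plain text instead of a live function.
MAPPER_SOURCE = '''
def dateTimeToDocMapper(doc):
    from dateutil.parser import parse
    from datetime import datetime as dt
    if doc.get('Date'):
        _date = list(dt.timetuple(parse(doc['Date']))[:-3])
        yield (_date, doc)
'''


def source_is_available(func):
    """Return True if inspect can recover the source text of func."""
    try:
        inspect.getsource(func)
        return True
    except (IOError, OSError, TypeError):
        return False


# Simulate a function typed at the prompt: it is compiled from text and
# has no file on disk behind it. The filename here is hypothetical.
namespace = {}
exec(compile(MAPPER_SOURCE, '<typed-at-prompt>', 'exec'), namespace)
typed_mapper = namespace['dateTimeToDocMapper']

print(source_is_available(typed_mapper))

# Assumed workaround (needs couchdb-python and a running server, so it is
# left commented out): hand ViewDefinition the source string directly.
# from couchdb.design import ViewDefinition
# view = ViewDefinition('index', 'by_date_time', MAPPER_SOURCE,
#                       language='python')
# view.sync(db)
```

Saving the example as a script and running it with `python script.py` should sidestep the lookup as well, since the function then has a real file behind it.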

ptwobrussell commented 11 years ago

Carving out some time this evening to try to work on this. In previous attempts I haven't been able to reproduce it. I think that may be because I wasn't using the same version of CouchDB that produced the problem, under the faulty assumption that my older version would exhibit the same issue. What version are you using? Also, what version of the couchdb package are you using? (What does couchdb.__version__ return?)

Pragueham commented 11 years ago

It's 1.21

Edit - sorry 0.8.


ptwobrussell commented 11 years ago

My attempts to reproduce this problem were not fruitful. We could try to further isolate the problem in your environment, but it might be just as easy to reach out to #couchdb on IRC or ask the mailing list for help, since this appears to be a CouchDB-specific issue, possibly involving a memory setting.

jrr46 commented 11 years ago

It's not as cool, but you can get around the memory issue by saving the documents one at a time: for doc in docs: db.save(doc)
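Building on the one-at-a-time loop above, a middle ground is to send the documents in smaller bulk requests, so that no single JSON body has to hold all 41,000 messages at once. A minimal sketch, assuming `db` is a couchdb-python `Database` and `docs` is the list from example 3-5; `chunked`, `bulk_load`, and the batch size of 1000 are illustrative choices, not something from the book:

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def bulk_load(db, docs, batch_size=1000):
    """Send docs to db.update() in batches; return how many were sent."""
    total = 0
    for batch in chunked(docs, batch_size):
        # Each call serializes only this batch to JSON, not the whole list.
        db.update(batch)
        total += len(batch)
    return total
```

With the full Enron file this would be called as `bulk_load(db, docs)`. Which batch size fits in the memory discussed in this thread has not been measured, so treat 1000 as a starting point.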

Pragueham commented 11 years ago

Oh great thanks for helping.

D


ptwobrussell commented 11 years ago

@laksdjhfads and @Pragueham - Did this workaround work OK for you? If so, would either of you like to submit a pull request so I can credit you with the fix?