Open sidra-asa opened 8 years ago
@sidra-asa
Are you still seeing errors related to the upsert?
Traceback (most recent call last):
  File "/opt/mnemosyne/env/local/lib/python2.7/site-packages/gevent/greenlet.py", line 327, in run
    result = self._run(*self.args, **self.kwargs)
  File "/opt/mnemosyne/normalizer/normalizer.py", line 125, in inserter
    self.database.insert_normalized(norm, id, identifier)
  File "/opt/mnemosyne/persistance/mnemodb.py", line 97, in insert_normalized
    upsert=True)
  File "/opt/mnemosyne/env/local/lib/python2.7/site-packages/pymongo/collection.py", line 552, in update
    _check_write_command_response(results)
  File "/opt/mnemosyne/env/local/lib/python2.7/site-packages/pymongo/helpers.py", line 205, in _check_write_command_response
    raise OperationFailure(error.get("errmsg"), error.get("code"), error)
OperationFailure: insertDocument :: caused by :: 17280 Btree::insert: key too large to index, failing mnemosyne.dork.$content_1 1127 { : "/suse/include/components/com_artlinks/support/mailling/maillist/inc/include/control/999999.9+%0BuNiOn%0BaLl+%0BsElEcT+0x393133353134353632312e39,0x393..." }
<Greenlet at 0x7f9f7d9db7d0: <bound method Normalizer.inserter of <normalizer.normalizer.Normalizer object at 0x7f9f7d9c5f90>>([([{'session': {'_id': ObjectId('57aee159e5645d38e)> failed with OperationFailure
I'm trying the hashed index as well, though without recreating the entire collection. It is still failing, though I believe that is because of the upsert in the update method.
mnemosyne/persistance/mnemodb.py
line 97~
elif collection is 'dork':
    self.db[collection].update({'content': document['content'], 'type': document['type']},
                               {'$set': {'lasttime': document['timestamp']},
                                '$inc': {'count': document['count']}},
                               upsert=True)
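For context on why the upsert is involved: when no document matches the query, MongoDB inserts a new document built from the equality fields of the query plus the update operators, so the full `content` string ends up in the new document and has to pass through every index on `content`. Below is a minimal sketch of the same update as a standalone function. The name `upsert_dork` is hypothetical, and it assumes PyMongo 3 or later, where `update_one` replaces the legacy `update` call shown in the traceback.

```python
def upsert_dork(collection, document):
    """Upsert one dork entry (hypothetical helper, not part of mnemosyne)."""
    # Equality fields in the query are copied into the inserted document
    # when nothing matches, so 'content' is written (and indexed) in full.
    query = {'content': document['content'], 'type': document['type']}
    update = {
        '$set': {'lasttime': document['timestamp']},
        '$inc': {'count': document['count']},
    }
    # Assumption: PyMongo 3+; older releases use collection.update(...).
    return collection.update_one(query, update, upsert=True)
```

If the old, non-hashed index on `content` is still present, this call would be expected to fail in the same way for long values, regardless of any hashed index added alongside it.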
@sh4t
I dropped the index on the dork content and created a hashed one. I just checked the log, and there is no error like yours. Could you give it a try and see whether the error still occurs?
If you have any suggestions, please let me know.
I found some errors in mnemosyne.err, as shown below.
It could be that the content is too long to be indexed. I am now using hashed content as the index key instead of text:
https://github.com/johnnykv/mnemosyne/blob/master/persistance/mnemodb.py#L48
Now it seems to work fine. If you have any suggestions, please let me know.
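As a rough sketch of the fix described in this thread, the snippet below checks whether a value is likely to exceed the index key limit and swaps the plain index on `content` for a hashed one. The function names are hypothetical. The index name `content_1` is taken from the error message, and the 1024-byte limit applies to MongoDB releases before 4.2. The size check only measures the UTF-8 length of the value, so it is an approximation of the real key size.

```python
MAX_INDEX_KEY_BYTES = 1024  # index key limit in MongoDB releases before 4.2


def exceeds_index_key_limit(content, limit=MAX_INDEX_KEY_BYTES):
    """Approximate check: UTF-8 size of the value, ignoring BSON overhead."""
    return len(content.encode('utf-8')) >= limit


def switch_to_hashed_index(collection, old_index_name='content_1'):
    """Drop the plain index on 'content' and create a hashed one.

    'collection' is assumed to be a pymongo Collection (e.g. db['dork']).
    drop_index raises an error if the named index does not exist.
    """
    collection.drop_index(old_index_name)
    # 'hashed' is the value of pymongo.HASHED
    return collection.create_index([('content', 'hashed')])
```

With a live database this might be called as `switch_to_hashed_index(db['dork'])`. Equality queries on `content`, such as the one used by the upsert, can still use the hashed index.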