phipse / complexlab_ai

bson.errors.InvalidBSON: objsize too large #15

Open phipse opened 11 years ago

phipse commented 11 years ago

I get this error on the very first attempt to insert an entry. If the 'entry' variable in the summarist contains only the name, everything works. Since the attributes cannot be shortened, I would propose cutting down the name. But even without a 'name', the entry is too large.

-- OUTPUT

2013-04-03 23:07:45,539 DEBUG  insert_feature() inserting {'t0': datetime.date(2004, 2, 25), 'value': -0.25, 't1': datetime.date(2004, 2, 27)}
2013-04-03 23:07:45,539 DEBUG  insert_feature() 2004-02-25
2013-04-03 23:07:45,539 DEBUG  insert_feature() -0.25
2013-04-03 23:07:45,540 DEBUG  insert_feature() 2004-02-27
2013-04-03 23:07:45,540 DEBUG  insert_feature() {'attributes': {'t0': '2004-02-25', 'value': '-0.25', 't1': '2004-02-27'}, 'name': "mask_<class absolute_monotony.AbsoluteDecreasing'>"}
Traceback (most recent call last):
  File "./main.py", line 189, in <module>
    sys.exit(__cli_main())
  File "./main.py", line 180, in __cli_main
    extract(args, task, crawler_list)
  File "./main.py", line 76, in extract
    extractor_stream(task, feature_extractor, crawler_list)
  File "./main.py", line 49, in extractor_stream
    summ.process(extractResult.itervalues().next())
  File "/home/phipse/Uni/Hauptstudium/complexlab_ai/src/summarist/__init__.py", line 32, in process
    self.insert_feature(feature)
  File "/home/phipse/Uni/Hauptstudium/complexlab_ai/src/summarist/__init__.py", line 28, in insert_feature
    self.db.features.insert(entry)
  File "/usr/lib64/python2.7/site-packages/pymongo/collection.py", line 359, in insert
    continue_on_error, self.__uuid_subtype), safe)
  File "/usr/lib64/python2.7/site-packages/pymongo/message.py", line 80, in insert
    encoded = [bson.BSON.encode(doc, check_keys, uuid_subtype) for doc in docs]
  File "/usr/lib64/python2.7/site-packages/bson/__init__.py", line 567, in encode
    return cls(_dict_to_bson(document, check_keys, uuid_subtype))
  File "/usr/lib64/python2.7/site-packages/bson/__init__.py", line 476, in _dict_to_bson
    elements.append(_element_to_bson(key, value, check_keys, uuid_subtype))
  File "/usr/lib64/python2.7/site-packages/bson/__init__.py", line 406, in _element_to_bson
    return BSONOBJ + name + _dict_to_bson(value, check_keys, uuid_subtype, False)
  File "/usr/lib64/python2.7/site-packages/bson/__init__.py", line 476, in _dict_to_bson
    elements.append(_element_to_bson(key, value, check_keys, uuid_subtype))
  File "/usr/lib64/python2.7/site-packages/bson/__init__.py", line 398, in _element_to_bson
    cstring = _make_c_string(value)
  File "/usr/lib64/python2.7/site-packages/bson/__init__.py", line 128, in _make_c_string
    string.decode("utf-8")
  File "/usr/lib64/python2.7/site-packages/bson/__init__.py", line 593, in decode
    (document, _) = _bson_to_dict(self, as_class, tz_aware, uuid_subtype)
  File "/usr/lib64/python2.7/site-packages/bson/__init__.py", line 334, in _bson_to_dict
    raise InvalidBSON("objsize too large")
bson.errors.InvalidBSON: objsize too large

phipse commented 11 years ago

The length is not the problem. The bson code can't handle datetime.date objects. Even if the entry contains just one mapping from 'attributes' to a datetime.date, it fails. I also tested other cases (a very long string, a mapping to dicts), and they all work. But as soon as a datetime.date is inserted, it fails.

2013-04-04 13:35:34,002 DEBUG insert_feature() {'attributes': '2004-02-27'}

phipse commented 11 years ago

MongoDB (PyMongo) FAQ: http://api.mongodb.org/python/1.7/faq.html#id13 Only datetime.datetime is supported, not datetime.date.
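One possible workaround, shown here only as a minimal sketch and not necessarily the fix that went into the repository: convert every datetime.date to a datetime.datetime at midnight before handing the entry to the driver. The helper name to_bson_safe is hypothetical, and choosing midnight as the time component is an assumption.

```python
import datetime


def to_bson_safe(value):
    """Return a copy of value with datetime.date replaced by datetime.datetime.

    Hypothetical helper: dates are mapped to midnight of the same day.
    """
    # datetime.datetime is a subclass of datetime.date, so check it first
    # and leave real datetimes untouched.
    if isinstance(value, datetime.datetime):
        return value
    if isinstance(value, datetime.date):
        return datetime.datetime.combine(value, datetime.time.min)
    # Walk nested containers such as the 'attributes' dict.
    if isinstance(value, dict):
        return {key: to_bson_safe(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_bson_safe(item) for item in value]
    return value
```

In insert_feature() this could be applied as entry = to_bson_safe(entry) right before self.db.features.insert(entry), assuming nothing downstream relies on the values being plain dates.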

Another problem is the type of the "value" field. It gets converted to bson.BSON, which the database does not accept either.

It is working for me now; please test it and close this issue if no errors occur.