dingding72 closed this issue 4 years ago
I think the problem you're running into is that you're passing a single RawBSONDocument to insert_many
instead of a list of documents. We give a helpful error when the documents
argument is a single dict, SON, or OrderedDict:
>>> client.test.test.insert_many({})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pymongo/collection.py", line 739, in insert_many
    raise TypeError("documents must be a non-empty list")
TypeError: documents must be a non-empty list
>>> client.test.test.insert_many(SON())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pymongo/collection.py", line 739, in insert_many
    raise TypeError("documents must be a non-empty list")
TypeError: documents must be a non-empty list
>>> client.test.test.insert_many(OrderedDict())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pymongo/collection.py", line 739, in insert_many
    raise TypeError("documents must be a non-empty list")
TypeError: documents must be a non-empty list
However, when a single RawBSONDocument is passed to insert_many,
we get this unhelpful error:
>>> client.test.test.insert_many(RawBSONDocument(bson.BSON.encode({'_id':2})))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pymongo/collection.py", line 753, in insert_many
    blk.ops = [doc for doc in gen()]
  File "pymongo/collection.py", line 744, in gen
    common.validate_is_document_type("document", document)
  File "pymongo/common.py", line 453, in validate_is_document_type
    "collections.MutableMapping" % (option,))
TypeError: document must be an instance of dict, bson.son.SON, bson.raw_bson.RawBSONDocument, or a type that inherits from collections.MutableMapping
I opened https://jira.mongodb.org/browse/PYTHON-1690 to fix the exception in this case. But I don't think there's any other bug here. insert_many
works when given a list of RawBSONDocuments:
>>> import bson
>>> from bson.raw_bson import RawBSONDocument
>>> docs = [{'_id':1}, RawBSONDocument(bson.BSON.encode({'_id':2}))]
>>> docs
[{'_id': 1}, RawBSONDocument('\x0e\x00\x00\x00\x10_id\x00\x02\x00\x00\x00\x00', codec_options=CodecOptions(document_class=<class 'bson.raw_bson.RawBSONDocument'>, tz_aware=False, uuid_representation=PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=None))]
>>> client.test.test.insert_many(docs)
<pymongo.results.InsertManyResult object at 0x106c32758>
>>> list(client.test.test.find())
[{u'_id': 1}, {u'_id': 2}]
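Until the error message is improved, a small guard on the caller's side can catch the single-document case before insert_many sees it. The following is only a sketch: as_document_list is a hypothetical helper, not part of PyMongo, and it assumes that RawBSONDocument registers as a collections Mapping the way dict, SON, and OrderedDict do (worth checking against your PyMongo version).

```python
try:
    from collections.abc import Mapping  # Python 3
except ImportError:
    from collections import Mapping  # Python 2


def as_document_list(documents):
    """Return `documents` as a non-empty list suitable for insert_many.

    Hypothetical helper, not a PyMongo API. A single mapping-like
    document is wrapped in a list; any other iterable is copied into
    a list and rejected if it turns out to be empty.
    """
    if isinstance(documents, Mapping):
        # A single document was passed where a list was expected.
        return [documents]
    docs = list(documents)
    if not docs:
        raise TypeError("documents must be a non-empty list")
    return docs
```

With this in place, the failing call would become client.test.test.insert_many(as_document_list(raw_doc)), where raw_doc is the single RawBSONDocument. Wrapping silently is a design choice; raising a TypeError for a lone mapping would be the stricter alternative.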
Hi, Shane, thank you very much! It works beautifully now! I am going to load all my data for the next few days, all in RawBSONDocuments and hopefully I can see a big performance improvement. Thanks!
Hi, Shane:
I have a way to convert the resulting MongoDB cursor back to a pandas DataFrame. The speed is OK: about 1 second for 100K rows (documents) with 20+ columns (the query over 20+ million documents took < 0.2 seconds in MongoDB), but I didn't use bsonjs's dumps. What would you suggest as the best practice or fastest approach for converting the cursor to a DataFrame?
Thanks!
To convert the cursor to a dataframe it may be faster to use BSON-NumPy (https://bson-numpy.readthedocs.io/en/latest/) to convert the cursor to a NumPy array. Then you could use the array instead of a dataframe or convert the array into a dataframe.
Please let me know how either of these options works!
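If BSON-NumPy is not an option, a plain-Python fallback is to collect the cursor into one list per column and hand that to pandas. This is only a sketch, not the BSON-NumPy approach and not benchmarked here; documents_to_columns and the field names are hypothetical, and it assumes each document behaves like a dict, as the documents from find() do in the session above.

```python
def documents_to_columns(documents, fields):
    """Collect an iterable of dict-like documents into column lists.

    Hypothetical helper. `fields` is the list of keys to keep; a
    document that lacks a key contributes None for that column.
    """
    columns = {field: [] for field in fields}
    for doc in documents:
        for field in fields:
            columns[field].append(doc.get(field))
    return columns


# Example with in-memory documents standing in for a cursor.
sample = [{'_id': 1, 'price': 2.5}, {'_id': 2}]
columns = documents_to_columns(sample, ['_id', 'price'])
```

The resulting dict of equal-length lists can be passed directly to pandas.DataFrame(columns). Listing the fields up front keeps the column order fixed and avoids a second pass over the cursor to discover keys.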
I was experimenting with bsonjs. insert_one works fine, but when I tried insert_many I got the following error:
  File "C:\ProgramData\Anaconda2\lib\site-packages\pymongo\common.py", line 453, in validate_is_document_type
    "collections.MutableMapping" % (option,))
TypeError: document must be an instance of dict, bson.son.SON, bson.raw_bson.RawBSONDocument, or a type that inherits from collections.MutableMapping
I created the document with "rawBS1 = RawBSONDocument(bson_bytes)" on the line just before, and it worked fine with insert_one.