Closed jeff1evesque closed 6 years ago
Before writing Python logic to insert documents, we tried entering commands manually against port 27019:
>>> from pymongo import MongoClient
>>> client = MongoClient('xxx-xxx-xxx-xxx:27019')
>>> db = client.test_database
>>> posts = db.posts
>>> post_id = posts.insert_one({'first': 'jeff', 'last': 'levesque'}).inserted_id
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.5/dist-packages/pymongo/collection.py", line 693, in insert_one
session=session),
File "/usr/local/lib/python3.5/dist-packages/pymongo/collection.py", line 607, in _insert
bypass_doc_val, session)
File "/usr/local/lib/python3.5/dist-packages/pymongo/collection.py", line 595, in _insert_one
acknowledged, _insert_command, session)
File "/usr/local/lib/python3.5/dist-packages/pymongo/mongo_client.py", line 1248, in _retryable_write
return self._retry_with_session(retryable, func, s, None)
File "/usr/local/lib/python3.5/dist-packages/pymongo/mongo_client.py", line 1201, in _retry_with_session
return func(session, sock_info, retryable)
File "/usr/local/lib/python3.5/dist-packages/pymongo/collection.py", line 592, in _insert_command
_check_write_command_response(result)
File "/usr/local/lib/python3.5/dist-packages/pymongo/helpers.py", line 217, in _check_write_command_response
_raise_last_write_error(write_errors)
File "/usr/local/lib/python3.5/dist-packages/pymongo/helpers.py", line 199, in _raise_last_write_error
raise WriteError(error.get("errmsg"), error.get("code"), error)
pymongo.errors.WriteError: can't create user databases on a --configsvr instance
Since this did not work (port 27019 belongs to the config server, which rejects user databases), we inspected the listening ports on the mongos instance using netstat -nltup. Port 27017 is also being used by mongo, so we attempted the insert on this port:
>>> from pymongo import MongoClient
>>> client = MongoClient('xxx-xxx-xxx-xxx:27017')
>>> db = client.test_database
>>> posts = db.posts
>>> post_id = posts.insert_one({'first': 'jeff', 'last': 'levesque'}).inserted_id
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.5/dist-packages/pymongo/collection.py", line 693, in insert_one
session=session),
File "/usr/local/lib/python3.5/dist-packages/pymongo/collection.py", line 607, in _insert
bypass_doc_val, session)
File "/usr/local/lib/python3.5/dist-packages/pymongo/collection.py", line 595, in _insert_one
acknowledged, _insert_command, session)
File "/usr/local/lib/python3.5/dist-packages/pymongo/mongo_client.py", line 1248, in _retryable_write
return self._retry_with_session(retryable, func, s, None)
File "/usr/local/lib/python3.5/dist-packages/pymongo/mongo_client.py", line 1201, in _retry_with_session
return func(session, sock_info, retryable)
File "/usr/local/lib/python3.5/dist-packages/pymongo/collection.py", line 590, in _insert_command
retryable_write=retryable_write)
File "/usr/local/lib/python3.5/dist-packages/pymongo/pool.py", line 579, in command
unacknowledged=unacknowledged)
File "/usr/local/lib/python3.5/dist-packages/pymongo/network.py", line 142, in command
unpacked_docs = reply.unpack_response(codec_options=codec_options)
File "/usr/local/lib/python3.5/dist-packages/pymongo/message.py", line 1418, in unpack_response
self.raw_response(cursor_id)
File "/usr/local/lib/python3.5/dist-packages/pymongo/message.py", line 1398, in raw_response
error_object)
pymongo.errors.OperationFailure: database error: error creating initial database config information :: caused by :: socket exception [CONNECT_ERROR] for rs1/ip-172-31-34-158.ec2.internal:27018,ip-172-31-38-98.ec2.internal:27018,ip-172-31-40-241.ec2.internal:27018
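The CONNECT_ERROR above names the rs1 members on port 27018, which suggests the mongos could not reach the shard replica set at that moment. Below is a minimal sketch, using only the Python standard library, for checking whether each member accepts TCP connections. The helper names parse_replica_set and is_reachable are hypothetical and not part of pymongo.

```python
import socket


def parse_replica_set(seed):
    """Split 'rs1/host1:27018,host2:27018' into ('rs1', [('host1', 27018), ...])."""
    name, _, hosts = seed.partition('/')
    members = []
    for entry in hosts.split(','):
        # rpartition keeps any colons in the host part and splits on the last one
        host, _, port = entry.strip().rpartition(':')
        members.append((host, int(port)))
    return name, members


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts and DNS failures
        return False
```

Calling is_reachable(host, port) for each parsed member from the mongos machine would show whether something such as a firewall rule is blocking port 27018. That is only a guess at the cause, not something the traceback confirms.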
At first glance, the insert statements seem to be working from a local instance:
root@ubuntu-xenial:/home/vagrant# python3 insert.py
post_id: 5bda655f076129444aa25e1f
root@ubuntu-xenial:/home/vagrant# cat insert.py
from pymongo import MongoClient
client = MongoClient('xxx.xxx.xxx.xxx:27017')
db = client.test_database
mycol = db.col_1
post_id = mycol.insert_one({'first': 'jeff', 'last': 'levesque'}).inserted_id
print('post_id: {}'.format(post_id))
root@ubuntu-xenial:/home/vagrant# python3 select.py
collection names: []
root@ubuntu-xenial:/home/vagrant# cat select.py
from pymongo import MongoClient
client = MongoClient('xxx.xxx.xxx.xxx:27017')
print('collection names: {}'.format(client.dh.collection_names()))
However, on the mongos machine, the logs show the balancer repeatedly acquiring and releasing a distributed lock:
2018-11-01T02:27:04.413+0000 [Balancer] distributed lock 'balancer/xxx.xxx.xxx.xxx:27017:1541038299:1804289383' acquired, ts : 5bda6478065e8d3e63923c2d
2018-11-01T02:27:04.415+0000 [Balancer] distributed lock 'balancer/xxx.xxx.xxx.xxx:27017:1541038299:1804289383' unlocked.
2018-11-01T02:27:09.842+0000 [LockPinger] cluster xxx.xxx.xxx.xxx:27019 pinged successfully at Thu Nov 1 02:27:09 2018 by distributed lock pinger 'xxx.xxx.xxx.xxx:27019/xxx.xxx.xxx.xxx:27017:1541038299:1804289383', sleeping for 30000ms
2018-11-01T02:27:10.417+0000 [Balancer] distributed lock 'balancer/xxx.xxx.xxx.xxx:27017:1541038299:1804289383' acquired, ts : 5bda647e065e8d3e63923c2e
2018-11-01T02:27:10.418+0000 [Balancer] distributed lock 'balancer/xxx.xxx.xxx.xxx:27017:1541038299:1804289383' unlocked.
2018-11-01T02:27:16.420+0000 [Balancer] distributed lock 'balancer/xxx.xxx.xxx.xxx:27017:1541038299:1804289383' acquired, ts : 5bda6484065e8d3e63923c2f
2018-11-01T02:27:16.422+0000 [Balancer] distributed lock 'balancer/xxx.xxx.xxx.xxx:27017:1541038299:1804289383' unlocked.
2018-11-01T02:27:22.424+0000 [Balancer] distributed lock 'balancer/xxx.xxx.xxx.xxx:27017:1541038299:1804289383' acquired, ts : 5bda648a065e8d3e63923c30
2018-11-01T02:27:22.425+0000 [Balancer] distributed lock 'balancer/xxx.xxx.xxx.xxx:27017:1541038299:1804289383' unlocked.
2018-11-01T02:27:28.428+0000 [Balancer] distributed lock 'balancer/xxx.xxx.xxx.xxx:27017:1541038299:1804289383' acquired, ts : 5bda6490065e8d3e63923c31
2018-11-01T02:27:28.429+0000 [Balancer] distributed lock 'balancer/xxx.xxx.xxx.xxx:27017:1541038299:1804289383' unlocked.
2018-11-01T02:27:34.431+0000 [Balancer] distributed lock 'balancer/xxx.xxx.xxx.xxx:27017:1541038299:1804289383' acquired, ts : 5bda6496065e8d3e63923c32
2018-11-01T02:27:34.433+0000 [Balancer] distributed lock 'balancer/xxx.xxx.xxx.xxx:27017:1541038299:1804289383' unlocked.
2018-11-01T02:27:39.843+0000 [LockPinger] cluster xxx.xxx.xxx.xxx:27019 pinged successfully at Thu Nov 1 02:27:39 2018 by distributed lock pinger 'xxx.xxx.xxx.xxx:27019/xxx.xxx.xxx.xxx:27017:1541038299:1804289383', sleeping for 30000ms
2018-11-01T02:27:40.435+0000 [Balancer] distributed lock 'balancer/xxx.xxx.xxx.xxx:27017:1541038299:1804289383' acquired, ts : 5bda649c065e8d3e63923c33
2018-11-01T02:27:40.436+0000 [Balancer] distributed lock 'balancer/xxx.xxx.xxx.xxx:27017:1541038299:1804289383' unlocked.
2018-11-01T02:27:46.439+0000 [Balancer] distributed lock 'balancer/xxx.xxx.xxx.xxx:27017:1541038299:1804289383' acquired, ts : 5bda64a2065e8d3e63923c34
2018-11-01T02:27:46.440+0000 [Balancer] distributed lock 'balancer/xxx.xxx.xxx.xxx:27017:1541038299:1804289383' unlocked.
2018-11-01T02:27:52.442+0000 [Balancer] distributed lock 'balancer/xxx.xxx.xxx.xxx:27017:1541038299:1804289383' acquired, ts : 5bda64a8065e8d3e63923c35
2018-11-01T02:27:52.444+0000 [Balancer] distributed lock 'balancer/xxx.xxx.xxx.xxx:27017:1541038299:1804289383' unlocked.
2018-11-01T02:27:58.446+0000 [Balancer] distributed lock 'balancer/xxx.xxx.xxx.xxx:27017:1541038299:1804289383' acquired, ts : 5bda64ae065e8d3e63923c36
2018-11-01T02:27:58.447+0000 [Balancer] distributed lock 'balancer/xxx.xxx.xxx.xxx:27017:1541038299:1804289383' unlocked.
2018-11-01T02:28:04.450+0000 [Balancer] distributed lock 'balancer/xxx.xxx.xxx.xxx:27017:1541038299:1804289383' acquired, ts : 5bda64b4065e8d3e63923c37
2018-11-01T02:28:04.451+0000 [Balancer] distributed lock 'balancer/xxx.xxx.xxx.xxx:27017:1541038299:1804289383' unlocked.
Our earlier test with the custom select.py was incorrectly defined: it listed collections on client.dh rather than on test_database. With some adjustments, we have verified that our replicated MongoDB is capable of writing, then reading back the corresponding data:
root@ubuntu-xenial:/home/vagrant# python3 select.py
collection names: {'last': 'levesque', '_id': ObjectId('5bda64220761294433e3370a'), 'first': 'jeff'}
collection names: {'last': 'levesque', '_id': ObjectId('5bda653c076129444056cf69'), 'first': 'jeff'}
collection names: {'last': 'levesque', '_id': ObjectId('5bda655f076129444aa25e1f'), 'first': 'jeff'}
root@ubuntu-xenial:/home/vagrant# cat select.py
from pymongo import MongoClient
client = MongoClient('xxx.xxx.xxx.xxx:27017')
db = client.test_database
mycol = db.col_1.find()
for col in mycol:
    print('collection names: {}'.format(col))
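The empty list from the first select.py is consistent with asking for a database name (dh) that holds no collections: MongoDB returns an empty listing for a database that does not exist rather than raising an error. Below is a minimal sketch of a guard against that kind of typo, assuming pymongo 3.6 or later, where list_database_names() and list_collection_names() are available. The function name describe_database is hypothetical.

```python
def describe_database(client, db_name):
    """Return the collection names in db_name, or None if the database is absent.

    `client` is expected to be a pymongo MongoClient (or anything with the
    same list_database_names() / item-access interface).
    """
    if db_name not in client.list_database_names():
        # a misspelled name would otherwise silently yield an empty list
        return None
    return client[db_name].list_collection_names()
```

With a real client this could be called as describe_database(MongoClient('xxx.xxx.xxx.xxx:27017'), 'test_database'). A None result would indicate the name is wrong or nothing has been written to it yet.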
I'll probably test the above scripts tomorrow to see whether the Reddit data is stored in the mongo shard.
6c422ad: after executing upload.py:
root@ubuntu-xenial:/home/vagrant/ist-664# python3 upload.py
Traceback (most recent call last):
File "upload.py", line 49, in <module>
post_id = col.insert_many(data).inserted_id
AttributeError: 'InsertManyResult' object has no attribute 'inserted_id'
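The AttributeError arises because insert_many returns an InsertManyResult, which exposes inserted_ids (plural, a list), whereas inserted_id belongs to the InsertOneResult returned by insert_one. Below is a minimal sketch of the corrected call. The wrapper name insert_batch is hypothetical, and upload.py itself is not shown in this thread, so this may differ from the actual fix.

```python
def insert_batch(col, docs):
    """Insert docs via insert_many and return the list of generated _id values.

    `col` is expected to be a pymongo Collection.
    """
    result = col.insert_many(docs)
    # InsertManyResult.inserted_ids is a list with one _id per inserted document
    return result.inserted_ids
```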
d473d98: the following indicates many documents were inserted:
root@ubuntu-xenial:/home/vagrant/ist-664# python3 check.py
document count: 147288
root@ubuntu-xenial:/home/vagrant/ist-664# cat check.py
from pymongo import MongoClient
from config import (
    mongos_endpoint,
    mongos_port,
    database,
    collection
)
client = MongoClient('{}:{}'.format(
    mongos_endpoint,
    mongos_port
))
# database + collection
db = client[database]
col = db[collection]
print('document count: {}'.format(col.count_documents({})))
Each document could be queried using findall({}), but this produces a long traceback. For this reason, the corresponding output is not shown above.
We need to add logic to allow our dataset(s) to be stored in MongoDB.
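One way such upload logic could be structured is to insert the dataset in fixed-size batches, so a large file never has to be sent as a single insert_many call. The sketch below is an assumption about the approach, not the contents of upload.py. The names chunked and upload and the default batch_size are hypothetical.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from iterable."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def upload(col, documents, batch_size=1000):
    """Insert documents into a pymongo Collection in batches; return the count inserted."""
    total = 0
    for batch in chunked(documents, batch_size):
        # ordered=False asks the server to attempt every document in the batch
        # even if one fails; failures still surface as BulkWriteError
        result = col.insert_many(batch, ordered=False)
        total += len(result.inserted_ids)
    return total
```

The returned total can then be compared against col.count_documents({}), as check.py does, to confirm that everything arrived.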