superduper-io / superduper

Superduper: Integrate AI models and machine learning workflows with your database to implement custom AI applications, without moving your data, including streaming inference, scalable model hosting, training and vector search.
https://superduper.io
Apache License 2.0

[BUG-0.2.0]: Failed to connect to the MongoDB database using the username and password. #2041

Closed: jieguangzhou closed this issue 5 months ago

jieguangzhou commented 5 months ago

System Information

Main branch

What happened?

When we connect to MongoDB with a URI that includes a database name, the connection can fail with an authentication error caused by permission issues.

If the database is started with testenv, the connection works, because testenv already sets up the permissions.

If we change the URI to '/'.join(uri.split('/')[:-1]), which strips the database name, we can connect to the service started with Docker below, but can no longer connect to the MongoDB service created by testenv.
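A likely explanation, assuming the problem is MongoDB's default authentication database: per the MongoDB connection-string rules, drivers such as pymongo authenticate against the database named in the URI path unless an authSource option says otherwise, falling back to admin when no database is given. The official mongo Docker image creates the MONGO_INITDB_ROOT_USERNAME user in the admin database, so ".../test_db" makes the driver look for "superduper" in test_db, where it does not exist. If testenv creates its user in the named database instead (an assumption; the issue only says testenv already handles the permissions), stripping the name would break that setup in the opposite way. The helper below is a hypothetical, simplified illustration of the rule, not part of superduperdb or pymongo:

```python
from urllib.parse import parse_qs, urlsplit


def effective_auth_source(uri: str) -> str:
    """Return the database a MongoDB driver would authenticate against.

    Simplified rule: an explicit authSource option wins, otherwise the
    database in the URI path, otherwise 'admin'. Mechanisms that use
    '$external' (e.g. MONGODB-X509) are deliberately ignored here.
    """
    parts = urlsplit(uri)
    # MongoDB URI option names are case-insensitive, so normalise the keys
    options = {key.lower(): values for key, values in parse_qs(parts.query).items()}
    if 'authsource' in options:
        return options['authsource'][-1]
    database = parts.path.strip('/')
    return database or 'admin'


print(effective_auth_source('mongodb://superduper:superduper@localhost:27017/test_db'))
print(effective_auth_source('mongodb://superduper:superduper@localhost:27017'))
```

The first call reports test_db and the second admin, which matches the observation that only the URI without the database name authenticates against the Docker container.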

    # Current code: the full URI, database name included, is passed to MongoClient
    if re.match(r'^mongodb://', uri) is not None:
        name = uri.split('/')[-1]
        conn: pymongo.MongoClient = pymongo.MongoClient(
            uri,
            serverSelectionTimeoutMS=5000,
        )
        return mapping['mongodb'](conn, name)

    # Modified code: the database name is stripped from the URI before connecting
    if re.match(r'^mongodb://', uri) is not None:
        name = uri.split('/')[-1]
        conn: pymongo.MongoClient = pymongo.MongoClient(
            '/'.join(uri.split('/')[:-1]),
            serverSelectionTimeoutMS=5000,
        )
        return mapping['mongodb'](conn, name)

Steps to reproduce

Start a MongoDB service:

docker run --name mongodb -d -p 27017:27017 -e MONGO_INITDB_ROOT_USERNAME=superduper -e MONGO_INITDB_ROOT_PASSWORD=superduper mongo

Connect to the database:

from superduperdb import superduper
db = superduper("mongodb://superduper:superduper@localhost:27017/test_db")

...
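One alternative that keeps the database name in the URI, not necessarily the fix adopted for this issue, is to point authentication at admin explicitly: mongodb://superduper:superduper@localhost:27017/test_db?authSource=admin. pymongo honours the authSource option, but with the connection code quoted in the issue, uri.split('/')[-1] would then return test_db?authSource=admin as the database name, so the name extraction would also have to drop the query string. The hypothetical helper below only shows how to add or replace the option with the standard library:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_auth_source(uri: str, auth_source: str = 'admin') -> str:
    """Return `uri` with its authSource option set to `auth_source`.

    Any existing authSource option (matched case-insensitively) is dropped;
    the other query options keep their original order.
    """
    parts = urlsplit(uri)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() != 'authsource'
    ]
    query.append(('authSource', auth_source))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_auth_source('mongodb://superduper:superduper@localhost:27017/test_db'))
```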

Relevant log output

In [5]: db = superduper("mongodb://superduper:superduper@localhost:27017/test_db")
2024-May-07 17:08:37.59| INFO     | zhouhaha-2.local| superduperdb.base.build:68   | Data Client is ready. MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True, serverselectiontimeoutms=5000)
2024-May-07 17:08:37.59| INFO     | zhouhaha-2.local| superduperdb.base.build:41   | Connecting to Metadata Client with engine:  MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True, serverselectiontimeoutms=5000)
2024-May-07 17:08:37.59| INFO     | zhouhaha-2.local| superduperdb.base.build:156  | Connecting to compute client: None
2024-May-07 17:08:37.59| INFO     | zhouhaha-2.local| superduperdb.base.datalayer:86   | Building Data Layer
[2024-05-07 17:08:37] pymongo.serverSelection INFO {"message": "Waiting for suitable server to become available", "selector": "Primary()", "operation": "distinct", "topologyDescription": "<TopologyDescription id: 6639ef954b5f304a940b120a, topology_type: Unknown, servers: [<ServerDescription ('localhost', 27017) server_type: Unknown, rtt: None>]>", "clientId": {"$oid": "6639ef954b5f304a940b120a"}, "remainingTimeMS": 4}
---------------------------------------------------------------------------
OperationFailure                          Traceback (most recent call last)
Cell In[5], line 1
----> 1 db = superduper("mongodb://superduper:superduper@localhost:27017/test_db")

File ~/workspace/SuperDuperDB/superduperdb/superduperdb/base/superduper.py:21, in superduper(item, **kwargs)
     18     item = CFG.data_backend
     20 if isinstance(item, str):
---> 21     return _auto_identify_connection_string(item, **kwargs)
     23 return _DuckTyper.run(item, **kwargs)

File ~/workspace/SuperDuperDB/superduperdb/superduperdb/base/superduper.py:47, in _auto_identify_connection_string(item, **kwargs)
     45         raise ValueError(f'{item} is not a valid connection string')
     46     kwargs['data_backend'] = item
---> 47 return build_datalayer(CFG, **kwargs)

File ~/workspace/SuperDuperDB/superduperdb/superduperdb/base/build.py:187, in build_datalayer(cfg, databackend, **kwargs)
    184 artifact_store = _build_artifact_store(cfg.artifact_store, databackend)
    185 compute = _build_compute(cfg.cluster.compute.uri)
--> 187 datalayer = Datalayer(
    188     databackend=databackend,
    189     metadata=metadata,
    190     artifact_store=artifact_store,
    191     compute=compute,
    192 )
    193 # Keep the real configuration in the datalayer object.
    194 datalayer.cfg = cfg

File ~/workspace/SuperDuperDB/superduperdb/superduperdb/base/datalayer.py:105, in Datalayer.__init__(self, databackend, metadata, artifact_store, compute)
    101 self.artifact_store.serializers = self.datatypes
    103 self.databackend = databackend
--> 105 self.cdc = DatabaseChangeDataCapture(self)
    107 self.compute = compute
    108 self._server_mode = False

File ~/workspace/SuperDuperDB/superduperdb/superduperdb/cdc/cdc.py:340, in DatabaseChangeDataCapture.__init__(self, db)
    335 self._running: bool = False
    336 self._cdc_existing_collections: t.MutableSequence[
    337     t.Union['TableOrCollection', 'Table']
    338 ] = []
--> 340 listeners = self.db.show('listeners')
    341 if listeners:
    342     from superduperdb.components.listener import Listener

File ~/workspace/SuperDuperDB/superduperdb/superduperdb/base/datalayer.py:242, in Datalayer.show(self, type_id, identifier, version)
    239     return [x._asdict() for x in out]
    241 if identifier is None:
--> 242     return self.metadata.show_components(type_id=type_id)
    244 if version is None:
    245     return sorted(
    246         self.metadata.show_component_versions(
    247             type_id=type_id, identifier=identifier
    248         )
    249     )

File ~/workspace/SuperDuperDB/superduperdb/superduperdb/backends/mongodb/metadata.py:130, in MongoMetaDataStore.show_components(self, type_id)
    127 def show_components(self, type_id: t.Optional[str] = None):
    128     # TODO: Should this be sorted?
    129     if type_id is not None:
--> 130         return self.component_collection.distinct(
    131             'identifier', {'type_id': type_id}
    132         )
    133     else:
    134         return list(
    135             self.component_collection.find(
    136                 {}, {'identifier': 1, '_id': 0, 'type_id': 1}
    137             )
    138         )

File ~/workspace/SuperDuperDB/superduperdb/env/lib/python3.11/site-packages/pymongo/collection.py:3027, in Collection.distinct(self, key, filter, session, comment, **kwargs)
   3011 def _cmd(
   3012     session: Optional[ClientSession],
   3013     _server: Server,
   3014     conn: Connection,
   3015     read_preference: Optional[_ServerMode],
   3016 ) -> list:
   3017     return self._command(
   3018         conn,
   3019         cmd,
   (...)
   3024         user_fields={"values": 1},
   3025     )["values"]
-> 3027 return self._retryable_non_cursor_read(_cmd, session, operation=_Op.DISTINCT)

File ~/workspace/SuperDuperDB/superduperdb/env/lib/python3.11/site-packages/pymongo/collection.py:1910, in Collection._retryable_non_cursor_read(self, func, session, operation)
   1908 client = self.__database.client
   1909 with client._tmp_session(session) as s:
-> 1910     return client._retryable_read(func, self._read_preference_for(s), s, operation)

File ~/workspace/SuperDuperDB/superduperdb/env/lib/python3.11/site-packages/pymongo/mongo_client.py:1534, in MongoClient._retryable_read(self, func, read_pref, session, operation, address, retryable, operation_id)
   1529 # Ensure that the client supports retrying on reads and there is no session in
   1530 # transaction, otherwise, we will not support retry behavior for this call.
   1531 retryable = bool(
   1532     retryable and self.options.retry_reads and not (session and session.in_transaction)
   1533 )
-> 1534 return self._retry_internal(
   1535     func,
   1536     session,
   1537     None,
   1538     operation,
   1539     is_read=True,
   1540     address=address,
   1541     read_pref=read_pref,
   1542     retryable=retryable,
   1543     operation_id=operation_id,
   1544 )

File ~/workspace/SuperDuperDB/superduperdb/env/lib/python3.11/site-packages/pymongo/_csot.py:108, in apply.<locals>.csot_wrapper(self, *args, **kwargs)
    106         with _TimeoutContext(timeout):
    107             return func(self, *args, **kwargs)
--> 108 return func(self, *args, **kwargs)

File ~/workspace/SuperDuperDB/superduperdb/env/lib/python3.11/site-packages/pymongo/mongo_client.py:1501, in MongoClient._retry_internal(self, func, session, bulk, operation, is_read, address, read_pref, retryable, operation_id)
   1464 @_csot.apply
   1465 def _retry_internal(
   1466     self,
   (...)
   1475     operation_id: Optional[int] = None,
   1476 ) -> T:
   1477     """Internal retryable helper for all client transactions.
   1478
   1479     :param func: Callback function we want to retry
   (...)
   1488     :return: Output of the calling func()
   1489     """
   1490     return _ClientConnectionRetryable(
   1491         mongo_client=self,
   1492         func=func,
   1493         bulk=bulk,
   1494         operation=operation,
   1495         is_read=is_read,
   1496         session=session,
   1497         read_pref=read_pref,
   1498         address=address,
   1499         retryable=retryable,
   1500         operation_id=operation_id,
-> 1501     ).run()

File ~/workspace/SuperDuperDB/superduperdb/env/lib/python3.11/site-packages/pymongo/mongo_client.py:2347, in _ClientConnectionRetryable.run(self)
   2345 self._check_last_error(check_csot=True)
   2346 try:
-> 2347     return self._read() if self._is_read else self._write()
   2348 except ServerSelectionTimeoutError:
   2349     # The application may think the write was never attempted
   2350     # if we raise ServerSelectionTimeoutError on the retry
   2351     # attempt. Raise the original exception instead.
   2352     self._check_last_error()

File ~/workspace/SuperDuperDB/superduperdb/env/lib/python3.11/site-packages/pymongo/mongo_client.py:2479, in _ClientConnectionRetryable._read(self)
   2477 self._server = self._get_server()
   2478 assert self._read_pref is not None, "Read Preference required on read calls"
-> 2479 with self._client._conn_from_server(self._read_pref, self._server, self._session) as (
   2480     conn,
   2481     read_pref,
   2482 ):
   2483     if self._retrying and not self._retryable:
   2484         self._check_last_error()

File /opt/homebrew/Cellar/python@3.11/3.11.2_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/contextlib.py:137, in _GeneratorContextManager.__enter__(self)
    135 del self.args, self.kwds, self.func
    136 try:
--> 137     return next(self.gen)
    138 except StopIteration:
    139     raise RuntimeError("generator didn't yield") from None

File ~/workspace/SuperDuperDB/superduperdb/env/lib/python3.11/site-packages/pymongo/mongo_client.py:1350, in MongoClient._conn_from_server(self, read_preference, server, session)
   1347 topology = self._get_topology()
   1348 single = topology.description.topology_type == TOPOLOGY_TYPE.Single
-> 1350 with self._checkout(server, session) as conn:
   1351     if single:
   1352         if conn.is_repl and not (session and session.in_transaction):
   1353             # Use primary preferred to ensure any repl set member
   1354             # can handle the request.

File /opt/homebrew/Cellar/python@3.11/3.11.2_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/contextlib.py:137, in _GeneratorContextManager.__enter__(self)
    135 del self.args, self.kwds, self.func
    136 try:
--> 137     return next(self.gen)
    138 except StopIteration:
    139     raise RuntimeError("generator didn't yield") from None

File ~/workspace/SuperDuperDB/superduperdb/env/lib/python3.11/site-packages/pymongo/mongo_client.py:1260, in MongoClient._checkout(self, server, session)
   1258     yield session._pinned_connection
   1259     return
-> 1260 with server.checkout(handler=err_handler) as conn:
   1261     # Pin this session to the selected server or connection.
   1262     if (
   1263         in_txn
   1264         and session
   (...)
   1269         )
   1270     ):
   1271         session._pin(server, conn)

File /opt/homebrew/Cellar/python@3.11/3.11.2_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/contextlib.py:137, in _GeneratorContextManager.__enter__(self)
    135 del self.args, self.kwds, self.func
    136 try:
--> 137     return next(self.gen)
    138 except StopIteration:
    139     raise RuntimeError("generator didn't yield") from None

File ~/workspace/SuperDuperDB/superduperdb/env/lib/python3.11/site-packages/pymongo/pool.py:1758, in Pool.checkout(self, handler)
   1749     if _CONNECTION_LOGGER.isEnabledFor(logging.DEBUG):
   1750         _debug_log(
   1751             _CONNECTION_LOGGER,
   1752             clientId=self._client_id,
   (...)
   1755             serverPort=self.address[1],
   1756         )
-> 1758 conn = self._get_conn(checkout_started_time, handler=handler)
   1760 if self.enabled_for_cmap:
   1761     assert listeners is not None

File ~/workspace/SuperDuperDB/superduperdb/env/lib/python3.11/site-packages/pymongo/pool.py:1916, in Pool._get_conn(self, checkout_started_time, handler)
   1914 else:  # We need to create a new connection
   1915     try:
-> 1916         conn = self.connect(handler=handler)
   1917     finally:
   1918         with self._max_connecting_cond:

File ~/workspace/SuperDuperDB/superduperdb/env/lib/python3.11/site-packages/pymongo/pool.py:1720, in Pool.connect(self, handler)
   1717     if handler:
   1718         handler.contribute_socket(conn, completed_handshake=False)
-> 1720     conn.authenticate()
   1721 except BaseException:
   1722     conn.close_conn(ConnectionClosedReason.ERROR)

File ~/workspace/SuperDuperDB/superduperdb/env/lib/python3.11/site-packages/pymongo/pool.py:1093, in Connection.authenticate(self, reauthenticate)
   1091 creds = self.opts._credentials
   1092 if creds:
-> 1093     auth.authenticate(creds, self, reauthenticate=reauthenticate)
   1094 self.ready = True
   1095 if self.enabled_for_cmap:

File ~/workspace/SuperDuperDB/superduperdb/env/lib/python3.11/site-packages/pymongo/auth.py:657, in authenticate(credentials, conn, reauthenticate)
    655     _authenticate_oidc(credentials, conn, reauthenticate)
    656 else:
--> 657     auth_func(credentials, conn)

File ~/workspace/SuperDuperDB/superduperdb/env/lib/python3.11/site-packages/pymongo/auth.py:561, in _authenticate_default(credentials, conn)
    559         return _authenticate_scram(credentials, conn, "SCRAM-SHA-256")
    560     else:
--> 561         return _authenticate_scram(credentials, conn, "SCRAM-SHA-1")
    562 else:
    563     return _authenticate_scram(credentials, conn, "SCRAM-SHA-1")

File ~/workspace/SuperDuperDB/superduperdb/env/lib/python3.11/site-packages/pymongo/auth.py:300, in _authenticate_scram(credentials, conn, mechanism)
    298 else:
    299     nonce, first_bare, cmd = _authenticate_scram_start(credentials, mechanism)
--> 300     res = conn.command(source, cmd)
    302 assert res is not None
    303 server_first = res["payload"]

File ~/workspace/SuperDuperDB/superduperdb/env/lib/python3.11/site-packages/pymongo/helpers.py:327, in _handle_reauth.<locals>.inner(*args, **kwargs)
    324 from pymongo.pool import Connection
    326 try:
--> 327     return func(*args, **kwargs)
    328 except OperationFailure as exc:
    329     if no_reauth:

File ~/workspace/SuperDuperDB/superduperdb/env/lib/python3.11/site-packages/pymongo/pool.py:985, in Connection.command(self, dbname, spec, read_preference, codec_options, check, allowable_errors, read_concern, write_concern, parse_write_concern_error, collation, session, client, retryable_write, publish_events, user_fields, exhaust_allowed)
    983     self._raise_if_not_writable(unacknowledged)
    984 try:
--> 985     return command(
    986         self,
    987         dbname,
    988         spec,
    989         self.is_mongos,
    990         read_preference,
    991         codec_options,
    992         session,
    993         client,
    994         check,
    995         allowable_errors,
    996         self.address,
    997         listeners,
    998         self.max_bson_size,
    999         read_concern,
   1000         parse_write_concern_error=parse_write_concern_error,
   1001         collation=collation,
   1002         compression_ctx=self.compression_context,
   1003         use_op_msg=self.op_msg_enabled,
   1004         unacknowledged=unacknowledged,
   1005         user_fields=user_fields,
   1006         exhaust_allowed=exhaust_allowed,
   1007         write_concern=write_concern,
   1008     )
   1009 except (OperationFailure, NotPrimaryError):
   1010     raise

File ~/workspace/SuperDuperDB/superduperdb/env/lib/python3.11/site-packages/pymongo/network.py:212, in command(conn, dbname, spec, is_mongos, read_preference, codec_options, session, client, check, allowable_errors, address, listeners, max_bson_size, read_concern, parse_write_concern_error, collation, compression_ctx, use_op_msg, unacknowledged, user_fields, exhaust_allowed, write_concern)
    210             client._process_response(response_doc, session)
    211         if check:
--> 212             helpers._check_command_response(
    213                 response_doc,
    214                 conn.max_wire_version,
    215                 allowable_errors,
    216                 parse_write_concern_error=parse_write_concern_error,
    217             )
    218 except Exception as exc:
    219     duration = datetime.datetime.now() - start

File ~/workspace/SuperDuperDB/superduperdb/env/lib/python3.11/site-packages/pymongo/helpers.py:233, in _check_command_response(response, max_wire_version, allowable_errors, parse_write_concern_error)
    230 elif code == 43:
    231     raise CursorNotFound(errmsg, code, response, max_wire_version)
--> 233 raise OperationFailure(errmsg, code, response, max_wire_version)

OperationFailure: Authentication failed., full error: {'ok': 0.0, 'errmsg': 'Authentication failed.', 'code': 18, 'codeName': 'AuthenticationFailed'}
blythed commented 5 months ago

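Suggestion: connect without the database name (test_db) in the URI, and then get the database from the client after connecting successfully.

# uri is the full connection string, e.g. mongodb://superduper:superduper@localhost:27017/test_db
Datalayer(
    MongoDatabackend(conn=getattr(MongoClient('/'.join(uri.split('/')[:-1])), uri.split('/')[-1]))
)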
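A minimal sketch of the suggested split, written so that it also copes with query options. The string-splitting one-liner does not: for mongodb://...:27017/test_db?authSource=admin it would take test_db?authSource=admin as the database name. split_db_uri is a hypothetical helper, not part of superduperdb, and it assumes a single database path segment. When options are present it keeps the "/" before "?", since pymongo rejects URIs without it.

```python
from urllib.parse import urlsplit, urlunsplit


def split_db_uri(uri: str) -> tuple[str, str]:
    """Split a MongoDB URI into (uri_without_database, database_name).

    Query options such as ?authSource=admin stay on the connection URI
    instead of being folded into the database name.
    """
    parts = urlsplit(uri)
    name = parts.path.strip('/')
    if not name:
        raise ValueError(f'no database name in {uri!r}')
    # Keep a bare '/' only when options follow, e.g. mongodb://host/?opt=1
    base = urlunsplit(parts._replace(path='/' if parts.query else ''))
    return base, name


print(split_db_uri('mongodb://superduper:superduper@localhost:27017/test_db'))
```

With pymongo, the result would then be used as client = MongoClient(base_uri) followed by client[name], the item-access form of the getattr lookup in the suggestion.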