vert-x3 / vertx-mongo-client

Mongo Client for Eclipse Vert.x
http://vertx.io
Apache License 2.0
58 stars 98 forks source link

Sharing a MongoClient for multiple databases #235

Open bfreuden opened 3 years ago

bfreuden commented 3 years ago

Describe the feature

I would like to access multiple databases from a single MongoClient instance.

As far as I can understand the source code, this is not the case: a single MongoClient instance is always tightly linked to a single database (see MongoClientImpl.java in vertx-mongo-client)

private static class MongoHolder implements Shareable {
    com.mongodb.reactivestreams.client.MongoClient mongo;
    MongoDatabase db;
[...]
mongo = MongoClients.create(parser.settings());
db = mongo.getDatabase(parser.database());

Use cases

This would reduce the thread and heap pressure for Vert.x applications having to deal with many databases.

I've actually realized that each com.mongodb.reactivestreams.client.MongoClient instance seems to come with 2 threads, and 1 buffer pool (keeping buffers in the JVM heap). So if a Vert.x application has to deal with 150 databases, the JVM eventually end up with 300 threads and hundreds of MB of buffer pools.

It is of course possible to mitigate the risk by introducing a kind of LRU pool of Vert.x MongoClient (and that's what I've done, so I'm not blocked so far) but this is unfortunate since the MongoDB MongoClient seems to be able to handle multiple databases without this extra complexity (put differently: the MongoClient Database association seems to be Vert.x-idiomatic).

I hope (but I have not written a pure MongoDB program to give it a try!) using a single MongoClient would have real benefits regarding the JVM's number of threads and heap load.

bboyz269 commented 3 years ago

This one really needs attention. vertx-mongo-client should support this since underling client does.
This limitation is introducing so much hassle when dealing with multi-tenancy.

vietj commented 3 years ago

I scoped the issue and mention that it can be contributed by the community to get contributors attention @bboyz269

bfreuden commented 3 years ago

@vietj @bboyz269 I am ready to discuss the topic with you if you are willing to.

I'm under the impression it is not so complex to implement but, because the io.vertx.ext.mongo.MongoClient is mixing client and databases methods, we probably need a 2 steps plan: a first move with non-breaking API changes, and a second move with breaking API changes.

vietj commented 3 years ago

@bfreuden can you share a proposal ?

bfreuden commented 3 years ago

Yes. Having a look at the methods of the io.vertx.ext.mongo.MongoClient interface, I'm realizing it could almost be called io.vertx.ext.mongo.MongoDatabase (even io.vertx.ext.mongo.MongoCollection). Let me show you the MongoClient.java file without its database-only methods:

cat MongoClient.java | sed 's,</i>,,g'   | sed 's,\<i>,,g' | sed 's,http://,,g' | sed 's,and/or,,g' | tr "\n" "\t" | sed 's,\/\*\*[^\/]*/,,g' | tr "\t" "\n" |  grep -v "String collection" | grep -v Fluent | sed  '/^ *$/d' | grep -v "^ *Handler<AsyncResult.*$" | grep -v "^import" | grep -v "^package" | grep -v GridF | grep -v runCommand | grep -v getCollections
a>
 */
@VertxGen
public interface MongoClient {
  String DEFAULT_POOL_NAME = "DEFAULT_POOL";
  String DEFAULT_DB_NAME = "DEFAULT_DB";
  static MongoClient create(Vertx vertx, JsonObject config) {
    return new MongoClientImpl(vertx, config, UUID.randomUUID().toString());
  }
  static MongoClient createShared(Vertx vertx, JsonObject config, String dataSourceName) {
    return new MongoClientImpl(vertx, config, dataSourceName);
  }
  static MongoClient createShared(Vertx vertx, JsonObject config) {
    return new MongoClientImpl(vertx, config, DEFAULT_POOL_NAME);
  }
  @GenIgnore
  static MongoClient createWithMongoSettings(Vertx vertx, JsonObject config, String dataSourceName, MongoClientSettings settings) {
    return new MongoClientImpl(vertx, config, dataSourceName, settings);
  }
  Future<Void> close();
  void close(Handler<AsyncResult<Void>> handler);
}

We can see that there are only close and static factory methods left.

So we could create a io.vertx.ext.mongo.MongoDatabase interface with all the other methods and add a getDatabase method to the io.vertx.ext.mongo.MongoClient interface:

public interface MongoClient {
...
  MongoDatabase getDatabase(String name);
...
}

In the first step of the refactoring, the MongoClient interface could either extend the MongoDatabase interface (developer-friendly solution) or duplicate them (for the sake of being able to mark them as deprecated).

In the first approach, we could create another MongoClientImpl constructor taking a another MongoClientImpl and a database name as a parameter:

  private boolean isDatabase = false;
  public MongoClientImpl(MongoClientImpl client, String database) {
    isDatabase = true
    // more or less a shallow copy of client fields
  }

The isDatabase boolean will let us throw an UnsupportedOperationException in close methods of this particular instance. The MongoClientImple.getDatabase() method will simply call this constructor.

In the second approach we create a MongoDatabaseImpl class and copy most of MongoClientImpl methods into it. Then we introduce a database field in MongoClientImpl:

public interface MongoClientImpl {
...
  private MongoDatabaseImpl database;
  MongoDatabase getDatabase(String name);
...
}

And we live with the burden of manually delegating MongoClientImpl method calls (they don't change so often) to the MongoDatabaseImpl instance (or maybe there's something nifty to do with dynamic proxies).

In a next comment I will discuss the problem of the factory methods. What's your first impression at this point? Am I completely out of my mind? :)

bfreuden commented 3 years ago

For the moment, factory methods require a database name in the config of the client.

As it must no longer be the case in the future, all database-related methods of the MongoClient should "throw" an IllegalStateException if the database name is not set in the config.

Maybe we could log a deprecation warning to discourage users from specifying a database name MongoClient factory methods?

Should we create factory methods in the new MongoDatabase interface and let it have a getClient method?

public interface MongoDatabase {
  ...
  static MongoDatabase create(Vertx vertx, JsonObject config) {
    return new MongoClientImpl(vertx, config, UUID.randomUUID().toString());
  }
  static MongoDatabase createShared(Vertx vertx, JsonObject config, String dataSourceName) {
    return new MongoClientImpl(vertx, config, dataSourceName);
  }
  static MongoDatabase createShared(Vertx vertx, JsonObject config) {
    return new MongoClientImpl(vertx, config, DEFAULT_POOL_NAME);
  }
  MongoClient getClient();
}

If that was the case the migration path for vanilla use cases would be from:

// config containing an "auth" database name
MongoClient client  = MongoClient.createShared(vertx, config)
Future<Long> count = client.count("users", JsonObject())
...
client.close()

to:

// same config object as above (with a database name) => deprecation warning
MongoDatabase db  = MongoDatabase.createShared(vertx, config); 
// all those calls don't change (only the name of the variable does: `client` => `db`)
Future<Long> count = db.count("users", JsonObject())
...
db.getClient().close() // close requires access to the client

or:

// config without the database name => no deprecation warning
MongoDatabase db  = MongoClient.createShared(vertx, config).getDatabase("auth"); 
Future<Long> count = db.count("users", JsonObject())
...
db.getClient().close() // close requires access to the client (it's the only method left on the client)

Users will still be able to use the original code but it will eventually throw an error (vert.x 5?) when a database name is provided in the config.

In the future (the second move I was referring to) we will remove (vert.x 5?) all database-related methods from the MongoClient interface.

bfreuden commented 3 years ago

The Vert.x MongoClient is also a bit special compared to the Java driver in the sense that it also mixes MongoDatabase and MongoCollection methods (in addition to mixing MongoClient methods).

That's why the first parameter of many methods is String collection.

I'm not sure we want to push the refactoring that far though, because the migration path would be harder:

MongoDatabase db  = MongoClient.createShared(vertx, config).getDatabase("auth"); 
// this would require to rewrite all method calls to add getCollection and remove the first parameter
Future<Long> count = db.getCollection("users").count(JsonObject())
...
db.getClient().close() 
vietj commented 3 years ago

Do you think you could provide a new implementation from scratch of this instead ?

On Thu, Jul 8, 2021 at 6:19 PM bfreuden @.***> wrote:

The Vert.x MongoClient is also a bit special compared to the Java driver in the sense that it also mixes MongoDatabase and MongoCollection methods (in addition to mixing MongoClient methods).

That's why the first parameter of many methods is String collection.

I'm not sure we want to push the refactoring that far though, because the migration path would be harder:

MongoDatabase db = MongoClient.createShared(vertx, config).getDatabase("auth"); // this would require to rewrite all method calls to add getCollection and remove the first parameterFuture count = db.getCollection("users").count(JsonObject())... db.getClient().close()

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/vert-x3/vertx-mongo-client/issues/235#issuecomment-876572475, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABXDCRGZG3PI6T3L4SZKJ3TWXFZHANCNFSM4TJZUZ2A .

bfreuden commented 3 years ago

@vietj

from sratch

do you mean I'm to create a new vertx-mongo-client2 repo in my github account, and to reuse the current Vert.x mongo client source code? If so: very probably, but I won't be able to start before before end of July.

bfreuden commented 2 years ago

@vietj sorry I've not been able to move on the subject : it's still on my mind but I'm too busy.

tsegismont commented 2 years ago

@bfreuden thanks for the feedback, moved this to 4.3.0 milestone

bfreuden commented 2 years ago

@tsegismont @vietj: since I have some spare time during summer, I tried to move on the topic.

I wrote a javadoc Doclet to analyze the source of the MongoDB reactive driver to have a better understanding of the MongoDB API before proceeding (you can open gephi files in the root of my repo with gephi).

I realized there are tons of classes (84) related to the creation of the MongoClient. When you set those apart, you cut the number of classes by more than 2 (around 70 remaining).

Based on that "study", I've been able to programmatically generate a Vert.x API based on the reactive API. For the moment I focused on reactive classes. There are 25 options and builder classes that should be translated using the Vert.x @DataObject idiom. This is not done already. And there are still plenty of incorrect things here and there (gridfsbucket).

However I would like to have your first impression: would you say I am on the right track?

https://bfreuden.github.io/vertx-mongodb-gen/io/vertx/mongo/client/package-summary.html

Eventually I don't want to map each and every MongoDB class to a Vert.x class: I think it is OK for the API to return MongoDB classes as long as they are not part of the com.mongodb.reactivestreams package. It think it is also very safe to leave methods returning MongoDB enums.

I also made sure to preserve useful information like server information and links to the MongoDB doc itself (e.g. find).

My repo: https://github.com/bfreuden/vertx-mongodb-gen

bfreuden commented 2 years ago

I've added some code to generate *Options classes:

https://bfreuden.github.io/vertx-mongodb-gen/index.html

Some classes can't be easily automatically generated, so I've created them manually (empty classes):

https://bfreuden.github.io/vertx-mongodb-gen/io/vertx/mongo/package-summary.html

Next step will be to create *Options classes from various *Publisher classes. After that the "big picture" would be almost complete (for the moment I've ignored methods taking Publisher as a parameter like upload GridFS methods).

bfreuden commented 2 years ago

Looks like I'm on the right track (see the SimpleTest.java):

  // get first item
  mongoClient
          .listDatabaseNames()
          .first()
          .map(res -> "first: " + res)
          .onFailure(Throwable::printStackTrace)
          .onSuccess(System.out::println);
  // get all items
  mongoClient
          .listDatabaseNames()
          .all()
          .map(res -> "all: " + res)
          .onFailure(Throwable::printStackTrace)
          .onSuccess(System.out::println);
  // get some items
  mongoClient
          .listDatabaseNames()
          .some(5)
          .map(res -> "some: " + res)
          .onFailure(Throwable::printStackTrace)
          .onSuccess(System.out::println);
  // get stream of items
  mongoClient
          .listDatabaseNames()
          .stream().pipeTo(new PrintlnWriteStream<>());
  // count documents of a collections
  mongoClient
          .getDatabase("mydb")
          .getCollection("mycol")
          .countDocuments()
          .map(res -> "count: " + res)
          .onFailure(Throwable::printStackTrace)
          .onSuccess(System.out::println);

https://bfreuden.github.io/vertx-mongodb-gen/index.html

But there's still a lot to be done:

But for the moment almost everything is generated by the doclet. Not sure that was the right approach but I'm eager to test it on the sources of the latest version of the mongo reactive driver to see if it allows an easier upgrade path.

bfreuden commented 2 years ago

Today I've integrated the JsonObject codec and created factory methods (createShared + config etc...) in the interface:

MongoClient mongoClient = MongoClient.create(vertx);

It allowed to check that retrieving data is ok:

// get first document of a collection
mongoClient
        .getDatabase("mydb")
        .getCollection("mycol")
        .find()
        .first()
        .map(res -> "find: " + res)
        .onFailure(Throwable::printStackTrace)
        .onSuccess(System.out::println);

I also tested transactions but I get an error because my MongoDB does support them (plus I need help with the future/promise stuff because I'm a Kotlin coroutine guy):

// test transactions
MongoCollection<JsonObject> collection = mongoClient.getDatabase("cinebio")
        .getCollection("documents2");

mongoClient.startSession()
        .map(session -> new SessionContext<>(mongoClient, session))
        .map(SessionContext::startTransaction)
        .compose(context -> context.withResult(collection.insertOne(context.session(), new JsonObject().put("foo", "bar"))))
        .compose(ResultSessionContext::commitTransaction)
        .onSuccess(res -> System.out.println("transaction committed"))
        .map(ResultSessionContext::result)
        .onSuccess(res -> System.out.println("result is:" + res))
        .onFailure(Throwable::printStackTrace);
bfreuden commented 2 years ago

I have started to port the existing MongoClientTestBase into my project: vertx-mongodb-gen/src/test/java/io/vertx/ext/mongo/MongoClientTestBase.java

@vietj @tsegismont I would be interested by your feedback. The source code is almost entirely generated.

https://bfreuden.github.io/vertx-mongodb-gen/index.html

It is definitely closer to the original MongoDB API than it was before.

For the moment I am wondering how I will deal with object ids.

bfreuden commented 2 years ago

Almost all base tests are passing now. Some don't pass with useObjectId=false. I also need to finish the GridFS implementation.

bfreuden commented 2 years ago

The GridFS implementation is done. I've started porting the GridFsTest unit test in vertx-mongodb-gen/src/test/java/io/vertx/ext/mongo/GridFsTest.java.

    String fileName = createTempFileWithContent((1024 * 3) + 70);
    String downloadFileName = createTempFile();
    GridFSBucket bucket = GridFSBucket.create(mongoDatabase, "fs");
    assertNotNull(bucket);
    bucket
      .drop()
      .compose(dropped -> bucket.uploadFile(fileName))
      .compose(res -> bucket.downloadByFilename(fileName).saveToFile(downloadFileName))
      .onSuccess(event -> {
          assertEquals(new File(fileName).length(), new File(downloadFileName).length());
          testComplete();
      })
      .onFailure(this::fail);

At this point I think that, compared to the existing implementation, this implementation lacks the support of useObjectId=false and a JSON-based configuration to create the MongoClient.

But it adds support for multiple databases and transactions.

@vietj @tsegismont Quite a lot of work so I would be very happy to have your feedback guys.

API doc is here: https://bfreuden.github.io/vertx-mongodb-gen/index.html

vietj commented 2 years ago

I will have a look

bfreuden commented 2 years ago

Thank you very much! I've added the possibility to create a MongoClient from a json configuration: This is not fully implemented but here is the idea:

Vertx vertx = Vertx.vertx();
// build a real vertx configuration object (data object)
MongoClientSettings settings = new MongoClientSettings()
        .setClusterSettings(new ClusterSettings()
                .setHosts(Collections.singletonList(new ServerAddress("localhost", 27018))));
// serialize to json for further use
JsonObject jsonSettings = settings.toJson();
System.out.println(jsonSettings);
// build a client from an real configuration object
MongoClient mongoClient = MongoClient.create(vertx, ClientConfig.fromSettings(settings));
// or build a client from a json config
MongoClient mongoClient2 = MongoClient.create(vertx, new ClientConfig(jsonSettings));

The tricky part with the MongoClientSettings settings class is it is a mix of json serializable and non serializable things like callback, listeners, etc... I've done my best to put the json-compatible things in the vertx configuration object. For other things you have:

MongoClient mongoClient = MongoClient.create(vertx, new ClientConfig(jsonSettings)
    // can't be stored in json
    .getPostInitializer().initializeConnectionPoolSettingsWith(
        (vertxContext, builder) -> builder.addConnectionPoolListener(...)
    )
);
bfreuden commented 2 years ago

I've started the implementation of useObjectId=false. It's done in a different way though: there are still object ids in the database but the client shows them as strings in the json:

    mappedMongoClient = MongoClient.create(vertx, getConfig().useObjectIds(false)); // map object ids to strings
    mappedMongoDatabase = mappedMongoClient.getDatabase(getDatabaseName());
    mongoClient = MongoClient.create(vertx, getConfig().useObjectIds(true)); // leave object ids as-is
    mongoDatabase = mongoClient.getDatabase(getDatabaseName());

    MongoCollection<JsonObject> mappedColl = mappedMongoDatabase.getCollection(collection);
    MongoCollection<JsonObject> coll = mongoDatabase.getCollection(collection);
    JsonObject doc = new JsonObject().put("name", "john");
    mappedColl.insertOne(doc, onSuccess(id -> {
      mappedColl.find(doc).first(onSuccess(resultDoc -> {
        Object _id = resultDoc.getValue("_id");
        // object ids are mapped to string
        assertTrue(_id instanceof String);
        coll.find(doc).first(onSuccess(resultDoc2 -> {
          assertNotNull(resultDoc2);
          Object _id2 = resultDoc2.getValue("_id");
          // although they are stored as ObjectIds in the DB
          assertTrue(_id2 instanceof JsonObject);
          assertTrue(_id2.toString().contains(JsonObjectCodec.OID_FIELD));
          // you can search with a string id:
          mappedColl.find(new JsonObject().put("_id", (String)_id)).first(onSuccess(resultDoc3 -> {
            assertNotNull(resultDoc3);
            Object _id3 = resultDoc3.getValue("_id");
            assertEquals(_id, _id3);
            testComplete();
          }));
        }));
      }));
    }));
bfreuden commented 2 years ago

I think the implementation of useObjectId=false is complete: I've resurrected the MongoClientTest test that is running the MongoClientTestBase test with useObjectId=false. There is a behavioural difference with the existing mongo client though.

bfreuden commented 2 years ago

I've added replaceOne methods without filter parameter to the MongoCollection in order to have something close to the save and saveWithOptions that are specific to the existing vert.x mongo client and that don't exist in the MongoDB driver.

vietj commented 2 years ago

@bfreuden can we setup a call to discuss this ?

bfreuden commented 2 years ago

@vietj certainly. I've updated my GH profile to include my email address. Feel free to send me an invite. My availabilities next week are: monday am, wednesday, thursday and friday am.