Possibility to set readPreference on a query-basis

balintant commented 6 years ago

We can already specify the default readPreference for Moloquent, which is a nice thing to keep it on preferSecondary to lower the load on the primary node of the replicaSet.

In some cases though, you want to read from the primary:

locking a ton of records and then fetching them - in order to ensure your processes never conflict
or because of some reasons, you just need real up-to-date data...

So far, I don't see any option to use preferSecondary by default, and overwrite that for specific queries (only). Please correct me if I'm wrong.

Are we willing to have an easy-to-use solution for that as part of laravel-mongodb?
For those of you who solved this issue, can you please share your experiences?

Ps.: we know that this practice is not really recommended, but so far it seems to be the best option for us.

lab08tonci commented 5 years ago

Any update on this one? @balintant, did you manage to get to a solution?

Another approach could be to set Write Concern to acknowledge when the write operation is done on the replicasets, but that could have a huge performance impact.

balintant commented 5 years ago

@lab08tonci unfortunately I left the project soon after I submitted this issue, and I didn't work with the package since that.

sanketgandhi876 commented 4 years ago

@jenssegers Is there any way we can achieve this?

jenssegers commented 4 years ago

Does Laravel offer a way to switch connections on a model dynamically? How would you see this work from a usage perspective?

sanketgandhi876 commented 4 years ago

@jenssegers It does offer sticky option. Please read this: https://laravel.com/docs/5.8/database#read-and-write-connections My interpretation is when sticky option is enabled then in the same request cycle read query is executed using the same write connection. But when it is off then laravel should use write connection for write queries and read connection for reads.

katjackson commented 1 year ago

@jenssegers I would love to be able to use the aforementioned sticky option with mongodb. Any suggestions?

alcaeus commented 1 year ago

@katjackson sticky doesn't make sense in this package considering how MongoDB handles this problem. The way of scaling through separate read/write connections might be sensible for relational databases, but this is handled transparently in the MongoDB drivers themselves. Allow me to explain:

When you connect to a deployment consisting of more than one server (e.g. a replica set or sharded cluster), the driver discovers the entire cluster and maintains connections to all of them. Then, for each operation the driver will choose a server based on the readPreference option. By default this is primary, but you can override this to another value. However, write operations (such as insertOne or running an aggregation pipeline containing a $merge or $out stage) will always be sent to a primary, regardless of the read preference given. So to achieve the same result as using read/write connections as documented above, in MongoDB you'd connect to the cluster and use a readPreference of secondaryPreferred, which will select a secondary server if available, but will fall back to a primary server if no eligible secondary is found.

Now to the point of the sticky option. This is to deal with replication lag: when using a primary/secondary setup for MySQL, the secondary will always have a certain replication lag. If you write to the primary then try to immediately read the same record from the secondary, you might get stale data (or non-existing data if you just inserted it). There are multiple ways to avoid this in MongoDB:

First of all, using a writeConcern allows you to select the write resilience. For replica set deployments, MongoDB defaults to a write concern of majority, which means that the write operation will not complete until the majority of the servers in the deployment (e.g. 2 servers in a 3-member replica set) have acknowledged the write. This already lowers the chances of reading from a stale secondary quite a bit.
Since the sticky option is only valid during the current request, chances are you already have the data you're looking for (e.g. when inserting new data). When updating data, the use of findOneAndUpdate operations or findOneAndReplace will allow you to directly fetch the data from the primary and choose whether you want the previous or new version of the document. I don't believe this is exposed directly in the library but will have to be accessed by accessing the underlying MongoDB\Collection object. This method has the added benefit of combining two operations (a write and a read) into one, greatly reducing load on the cluster.
For more complex operations, reads in MongoDB also support the readConcern option. I'll direct you to the readConcern documentation for details.

To summarise, I believe allowing users to specify readConcern, readPreference, and writeConcern per operation would be sensible, and I believe it should be a fairly easy task. This would allow you to explicitly select a primary read preference for certain reads when it's important that you get the latest data.

So, why not sticky? MongoDB offers much better tooling to direct your reads away from the primary server. Also, just because a single write happened doesn't mean that every read during the same request should only ever read from the primary. While it may be true for some requests, it certainly isn't true for all of them, and I believe choosing the read preference accordingly is a better tool.

Last but not least, allow me to reiterate that while it is possible to scale read loads across servers using replica sets is possible, that's not their main purpose. Replica sets are for operational resilience, meaning that if your primary goes down, there are other servers to take over quickly without manual intervention. Write concerns are a way of choosing what level of operational resilience you want (e.g. using a majority write concern gives you a near 100% chance that the data you wrote won't be rolled back). For smaller applications it may be feasible to scale using secondary reads, but be aware of the downfalls such as replication lag and avoid them accordingly as suggested above.

katjackson commented 1 year ago

@alcaeus thanks so much for your detailed explanation! I I agree that being able to set readConcern, readPreference, and writeConcern per operation would be the best solution, and indeed it is why this ticket was started.

I understand that reading from the secondary is not their intended purpose, but we are attempting to identify specific processes in our system where this would be safe, in order to improve performance. I though that the sticky option might be a way to offload a little bit of the load without having to assess every process individually.

Since I cannot easily specify those settings per operation, my current plan is to set up a second connection to MongoDB where readPreference is secondaryPrefered. Then I can set that "read only" connection on the model before performing any queries that we know are safe, even if they fall victim to the <1% occurrence of replication lag.

GromNaN commented 4 months ago

Is your issue solved @katjackson?

katjackson commented 4 months ago

@GromNaN sort of? but it didn't work.

the initial request of this ticket was to be able to set the readPreference on a per-operation basis. to achieve this, i set up 2 connections, one with primaryPreferred and one with secondaryPreferred. then, i added a method to a base model to switch the connection,

    public function setReadOnlyConnection()
    {
        $this->connection = 'mongodb_read_only';
        return $this;
    }

and used it for read operations that could withstand a few seconds lag.

        $client = (new Client())
            ->setReadOnlyConnection()
            ->where('key', $clientKey)
            ->project(['name' => 1])
            ->first();

But... the replication lag turned out to be much greater than expected (even though the writeConcern was always set to majority). On a day where we experienced significant performance issues on our MongoDB cluster, the secondary was over 45 min behind for most of the day. And, in the end, we decided that no queries are really safe to read from the secondary. Not at our scale anyway.

GromNaN commented 4 months ago

Thanks for the feedback. I don't see what we can do in this package. Closing then.

mongodb / laravel-mongodb

Possibility to set readPreference on a query-basis #1555