Closed balintant closed 4 months ago
Any update on this one? @balintant, did you manage to get to a solution?
Another approach could be to set Write Concern to acknowledge when the write operation is done on the replicasets, but that could have a huge performance impact.
@lab08tonci unfortunately I left the project soon after I submitted this issue, and I didn't work with the package since that.
@jenssegers Is there any way we can achieve this?
Does Laravel offer a way to switch connections on a model dynamically? How would you see this work from a usage perspective?
@jenssegers It does offer sticky
option. Please read this: https://laravel.com/docs/5.8/database#read-and-write-connections
My interpretation is when sticky option is enabled then in the same request cycle read query is executed using the same write connection. But when it is off then laravel should use write connection for write queries and read connection for reads.
@jenssegers I would love to be able to use the aforementioned sticky
option with mongodb. Any suggestions?
@katjackson sticky
doesn't make sense in this package considering how MongoDB handles this problem. The way of scaling through separate read/write connections might be sensible for relational databases, but this is handled transparently in the MongoDB drivers themselves. Allow me to explain:
When you connect to a deployment consisting of more than one server (e.g. a replica set or sharded cluster), the driver discovers the entire cluster and maintains connections to all of them. Then, for each operation the driver will choose a server based on the readPreference
option. By default this is primary
, but you can override this to another value. However, write operations (such as insertOne
or running an aggregation pipeline containing a $merge
or $out
stage) will always be sent to a primary, regardless of the read preference given. So to achieve the same result as using read/write connections as documented above, in MongoDB you'd connect to the cluster and use a readPreference of secondaryPreferred
, which will select a secondary server if available, but will fall back to a primary server if no eligible secondary is found.
Now to the point of the sticky
option. This is to deal with replication lag: when using a primary/secondary setup for MySQL, the secondary will always have a certain replication lag. If you write to the primary then try to immediately read the same record from the secondary, you might get stale data (or non-existing data if you just inserted it). There are multiple ways to avoid this in MongoDB:
writeConcern
allows you to select the write resilience. For replica set deployments, MongoDB defaults to a write concern of majority
, which means that the write operation will not complete until the majority of the servers in the deployment (e.g. 2 servers in a 3-member replica set) have acknowledged the write. This already lowers the chances of reading from a stale secondary quite a bit.sticky
option is only valid during the current request, chances are you already have the data you're looking for (e.g. when inserting new data). When updating data, the use of findOneAndUpdate
operations or findOneAndReplace
will allow you to directly fetch the data from the primary and choose whether you want the previous or new version of the document. I don't believe this is exposed directly in the library but will have to be accessed by accessing the underlying MongoDB\Collection
object. This method has the added benefit of combining two operations (a write and a read) into one, greatly reducing load on the cluster.readConcern
option. I'll direct you to the readConcern
documentation for details.To summarise, I believe allowing users to specify readConcern
, readPreference
, and writeConcern
per operation would be sensible, and I believe it should be a fairly easy task. This would allow you to explicitly select a primary read preference for certain reads when it's important that you get the latest data.
So, why not sticky
? MongoDB offers much better tooling to direct your reads away from the primary server. Also, just because a single write happened doesn't mean that every read during the same request should only ever read from the primary. While it may be true for some requests, it certainly isn't true for all of them, and I believe choosing the read preference accordingly is a better tool.
Last but not least, allow me to reiterate that while it is possible to scale read loads across servers using replica sets is possible, that's not their main purpose. Replica sets are for operational resilience, meaning that if your primary goes down, there are other servers to take over quickly without manual intervention. Write concerns are a way of choosing what level of operational resilience you want (e.g. using a majority
write concern gives you a near 100% chance that the data you wrote won't be rolled back). For smaller applications it may be feasible to scale using secondary reads, but be aware of the downfalls such as replication lag and avoid them accordingly as suggested above.
@alcaeus thanks so much for your detailed explanation! I I agree that being able to set readConcern
, readPreference
, and writeConcern
per operation would be the best solution, and indeed it is why this ticket was started.
I understand that reading from the secondary is not their intended purpose, but we are attempting to identify specific processes in our system where this would be safe, in order to improve performance. I though that the sticky
option might be a way to offload a little bit of the load without having to assess every process individually.
Since I cannot easily specify those settings per operation, my current plan is to set up a second connection to MongoDB where readPreference
is secondaryPrefered
. Then I can set that "read only" connection on the model before performing any queries that we know are safe, even if they fall victim to the <1% occurrence of replication lag.
Is your issue solved @katjackson?
@GromNaN sort of? but it didn't work.
the initial request of this ticket was to be able to set the readPreference on a per-operation basis. to achieve this, i set up 2 connections, one with primaryPreferred
and one with secondaryPreferred
. then, i added a method to a base model to switch the connection,
public function setReadOnlyConnection()
{
$this->connection = 'mongodb_read_only';
return $this;
}
and used it for read operations that could withstand a few seconds lag.
$client = (new Client())
->setReadOnlyConnection()
->where('key', $clientKey)
->project(['name' => 1])
->first();
But... the replication lag turned out to be much greater than expected (even though the writeConcern
was always set to majority
). On a day where we experienced significant performance issues on our MongoDB cluster, the secondary was over 45 min behind for most of the day. And, in the end, we decided that no queries are really safe to read from the secondary. Not at our scale anyway.
Thanks for the feedback. I don't see what we can do in this package. Closing then.
We can already specify the default
readPreference
for Moloquent, which is a nice thing to keep it onpreferSecondary
to lower the load on the primary node of the replicaSet.In some cases though, you want to read from the primary:
So far, I don't see any option to use
preferSecondary
by default, and overwrite that for specific queries (only). Please correct me if I'm wrong.laravel-mongodb
?Ps.: we know that this practice is not really recommended, but so far it seems to be the best option for us.