richardartoul / nola

MIT License

Addressing Staleness and Bias in Sorted Replication Strategy #83

Open aratz-lasa opened 1 year ago

aratz-lasa commented 1 year ago

Description

With the current implementation of the sorted replication strategy, invocations are biased towards one of the servers, which can leave the other replicas stale over time. NOLA may then be unaware of changes or deletions in the non-biased replicas, causing inconsistencies.

Alternative solutions

To ensure data consistency and address this issue, we need to consider alternative solutions. Here are three potential approaches:

  1. Implement a periodic mechanism to ping the non-biased servers: We can introduce a mechanism where the non-biased servers are periodically pinged to check their availability and update the replica status. This would help prevent staleness and ensure that NOLA is aware of any changes or deletions in those replicas.

  2. Update client caches based on last invocation time: Another approach is to modify the client caches to track the last time a server was invoked. By periodically checking the invocation timestamp, we can identify and remove stale servers from the client caches. This would force the clients to fetch the latest server information from the registry, promoting a more dynamic and up-to-date replica selection.

  3. Introduce a new replication strategy: We can explore the possibility of adding a new replication strategy that balances bias and staleness. For example, we could replicate 90% of the time to the biased server while periodically pinging the other servers to ensure their availability and freshness. This would provide a balance between bias and keeping the replicas up to date.

Considering the potential impact on performance, consistency, and overall system behavior, it is important to carefully evaluate and test these alternative solutions before implementation.

This issue aims to initiate a discussion and gather input on addressing staleness and bias in the sorted replication strategy. Any suggestions, insights, or feedback would be highly valuable in finding the most suitable solution.

Please feel free to contribute your thoughts and ideas to resolve this issue effectively.

gedw99 commented 1 year ago

Maybe use the NATS KV store as the registry. It's reactive and scales out, so you get a backplane to see all the instances / actors.

Other Wasm projects are using NATS with Wasm for this use case too:

https://github.com/bots-garden/capsule