hissssst / nebulex_adapters_ecto

Postgres Ecto Adapter for Nebulex Cache
BSD 2-Clause "Simplified" License

Usage of monotonic_time for now() #3

Closed: PragTob closed this issue 3 months ago

PragTob commented 4 months ago

Hi there,

first, thanks for creating this project!

I'm not sure if there's something I'm missing, but from what I can see, `monotonic_time` is used to calculate `now/0`, which is then used in `touched_at` and a bunch of other calculations involving the TTL etc.:

```elixir
defp now do
  :erlang.monotonic_time(:millisecond)
end
```

The problem with this is this notice in the [monotonic_time docs](https://www.erlang.org/doc/apps/erts/erlang.html#monotonic_time/0):

> Different runtime system instances will use different unspecified points in time as base for their Erlang monotonic clocks. That is, **it is pointless comparing monotonic times from different runtime system instances**. Different runtime system instances can also place this unspecified point in time different relative runtime system start.

(emphasis mine)

What this means to me is: this would be fine if the cache had at most one client and was never restarted. But two different systems, or one system after a restart, may have completely different ideas of what the time is in monotonic time, since it doesn't refer to any specific point in time but just, roughly speaking, "time passed since the system started".
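To make that concrete, here's an illustrative comparison of the two clocks (the printed values are invented, but representative):

```elixir
# Illustrative iex session (output values invented): monotonic time is an
# offset from an unspecified, per-VM point in time (often negative), while
# os_time is Unix wall-clock time, meaningful across nodes and restarts.
iex> :erlang.monotonic_time(:millisecond)
-576460748123

iex> System.os_time(:millisecond)
1719583334125
```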

There's more about monotonic time in the Erlang docs.

So, unless I'm mistaken, I'm pretty sure this is a bug, at least for multi-node systems that all want to use the same cache.

A quick fix would probably be `System.os_time(:millisecond)`, but clocks may still differ between machines, so relying more on the clock of the database/Postgres may be helpful.
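A minimal sketch of that quick fix, plus a variant that asks Postgres for the time (the `repo` argument and the raw query are illustrative assumptions, not this library's API):

```elixir
# Sketch of the suggested quick fix: Unix epoch milliseconds,
# comparable across nodes and restarts (modulo NTP corrections).
defp now do
  System.os_time(:millisecond)
end

# Variant leaning on the database clock instead; `repo` is assumed to be
# any Ecto SQL repo, purely for illustration.
defp db_now(repo) do
  %{rows: [[ms]]} =
    repo.query!("SELECT (EXTRACT(EPOCH FROM now()) * 1000)::bigint")

  ms
end
```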

hissssst commented 3 months ago

I don't agree with this. Monotonic time can change, just as any other time can change. However, when nodes are in a cluster, the runtime will try to sync the monotonic time by adjusting how it progresses.

If we use database timestamps, a restart of the database or a late NTP sync may also create hops in database time, so there's no "correct" or "non-dangerous" solution. Feel free to fork the library and use the timestamp you find more applicable for your use case.

PragTob commented 3 months ago

@hissssst the point is that monotonic time isn't guaranteed to be the same (or even similar) between nodes, and not even on the same node after a restart. Yes, "normal" time can jump via NTP sync, but those are minor corrections; here, values are compared that have no relation to each other (both across different nodes and after a restart). We're talking about wildly different values, since monotonic time has no reference to any "real" time.

The Erlang docs literally say that it's pointless to compare monotonic time between different runtime instances. It's OK to have your own opinion, but when that opinion disagrees with the Erlang docs, I think it'd be worth re-examining it.

hissssst commented 3 months ago

> The Erlang docs literally say that it's pointless to compare monotonic time between different runtime instances.

Yes, I agree that this time is not monotonic across the cluster. This library is not designed to work in a cluster; it is merely a persistence level for a single node. If you use the same table for persisting the cached data in a cluster of N nodes, you'll have a GC which runs N times more often than it is supposed to. You can overcome this by naming the cache under `{:global, _}`, but that is not what I intended to achieve with this library.
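(For illustration, this is the generic OTP shape of the `{:global, _}` trick; the module is hypothetical, not part of this library. A `:global` name is claimed cluster-wide, so at most one instance of the process, and hence of its periodic GC work, runs at a time.)

```elixir
# Hypothetical module, for illustration only: global registration makes
# this a cluster-wide singleton.
defmodule MyApp.GlobalGC do
  use GenServer

  def start_link(opts) do
    # The {:global, _} name is registered across all connected nodes.
    GenServer.start_link(__MODULE__, opts, name: {:global, __MODULE__})
  end

  @impl true
  def init(opts), do: {:ok, opts}
end
```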

So, `monotonic_time` fits the intended use case 100%.

hissssst commented 3 months ago

If you want to use this library in a cluster, I suggest removing the GC startup and setting up Oban or something similar (maybe rolling your own cluster-singleton implementation) to perform the GC on a single node with correct time comparison.
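Something along these lines (the worker, table, and column names are hypothetical; only the Oban API itself is real):

```elixir
# Hypothetical Oban worker that performs the GC on whichever single node
# Oban picks, comparing against wall-clock time. Table and column names
# are illustrative, not this adapter's actual schema.
defmodule MyApp.CacheGC do
  use Oban.Worker, queue: :maintenance

  import Ecto.Query

  @impl Oban.Worker
  def perform(%Oban.Job{}) do
    now_ms = System.os_time(:millisecond)

    # Drop rows whose TTL has elapsed relative to wall-clock time.
    from(e in "cache_entries", where: e.touched_at + e.ttl < ^now_ms)
    |> MyApp.Repo.delete_all()

    :ok
  end
end
```

Scheduled periodically via the cron plugin in the Oban config, e.g. `{Oban.Plugins.Cron, crontab: [{"*/5 * * * *", MyApp.CacheGC}]}`.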

hissssst commented 3 months ago

Oh, I think I know how to use it with a cluster. I'd suggest an inclusive multilevel cache, where the first level is a distributed cache and the last level is the Ecto adapter with a unique table per node.
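Roughly this shape (a sketch assuming Nebulex v2's multilevel adapter; the Ecto adapter's module name is assumed here):

```elixir
# Sketch of the suggested topology (adapter module names assumed):
# an inclusive multilevel cache with a distributed first level and an
# Ecto-backed last level.
defmodule MyApp.Cache do
  use Nebulex.Cache,
    otp_app: :my_app,
    adapter: Nebulex.Adapters.Multilevel

  defmodule L1 do
    # Distributed first level, shared across the cluster
    use Nebulex.Cache,
      otp_app: :my_app,
      adapter: Nebulex.Adapters.Partitioned
  end

  defmodule L2 do
    # Ecto persistence level; each node would point at its own table
    use Nebulex.Cache,
      otp_app: :my_app,
      adapter: Nebulex.Adapters.Ecto
  end
end
```

In config you'd then set `model: :inclusive` and list the levels, e.g. `levels: [{MyApp.Cache.L1, []}, {MyApp.Cache.L2, []}]` (again assuming Nebulex v2's options).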

PragTob commented 3 months ago

@hissssst it's not just usage in a cluster. When you restart a node because you deploy a new version (or blue-green deploy a new version on a new node), monotonic time will also likely be different, and hence all the values saved in the database will be wrong from the point of view of the node checking them. So values will either be kept way longer than they should be (and by that I mean days or weeks) or all be declared invalid instantly. I don't think "the cache doesn't work anymore as soon as I deploy a new version" is a valid tradeoff for a cache in persistent storage.

hissssst commented 3 months ago

> monotonic time will also likely be different

I agree

> days or weeks

I don't agree. Always less than a minute in my experience

hissssst commented 3 months ago

Feel free to override the timestamps in 1.0.1. Thanks for reporting the issue!