[REF-2503] Memory StateManager should have some sort of expiration

picklelo commented 6 months ago

The state manager is essentially a dictionary mapping between a user token (browser tab) and their state.

When using Redis as a state manager (by setting the redis_url in the rxconfig.py) there is a configurable expiration time (by default 30 mins). However when using the in memory state manager, states will accumulate until eventually there is an OOM error.

We should implement some sort of expiration or LRU caching on the memory state manager to make this usable in production.

_REF-2503

AmanSal1 commented 6 months ago

@picklelo So, I've been considering giving it a try. What I've figured out so far is that we need to modify the get and set methods, possibly adding additional methods to the StateManagerMemory class, and then write test cases for it. In my opinion, implementing LRU caching would be a more complex approach. If the app works fine, expiration-based caching would be a great approach, as we only have to maintain the time period associated with each state and accordingly expire and remove it from memory. With your guidance, I think I can accomplish this

picklelo commented 6 months ago

@AmanSal1 thanks for looking into this! Agree with you that expiration-based caching would be ideal, but I'm unsure how to implement that easily, do you have a way?

AmanSal1 commented 6 months ago

@picklelo What I could think of that we'll enhance the StateManagerMemory class by introducing a dictionary to store expiration times for each state. Upon accessing a state, we'll check its expiration time and remove it if it's expired. We'll update the set_state method to include the expiration time calculation based on a predefined expiration period. Periodically, we'll clean up expired states using a method that iterates through the expiration times dictionary and removes expired states. This approach ensures efficient memory usage by automatically removing expired states, thereby preventing memory overflow and optimizing performance. Additionally, we'll ensure thread safety and consider efficiency in implementing expiration logic to handle concurrent accesses and minimize computational overhead. Though at implementation level it would be bit difficult .

picklelo commented 6 months ago

Periodically, we'll clean up expired states using a method that iterates through the expiration times dictionary and removes expired states.

This is the part that seems a bit tricky to me, there would have to be some worker thread that keeps checking and deleting this. I wonder if for a first pass the LRU cache can be a good simple approach that achieves most of the goal, and we can expand to the expiration method in a follow up. You mentioned LRU may be more complicated - why do you think so? I feel it should be easier than this method.

AmanSal1 commented 6 months ago

@picklelo So like why I said that LRU caching is bit difficult because of the difference in eviction strategies contributes to the perception that implementing LRU caching may be more complex than expiration-based caching. In expiration-based caching, the eviction process is automatic and deterministic, driven solely by the expiration times of items. In contrast, in LRU caching, you need to actively manage the access order of items and make decisions about which items to evict based on access history. Additionally we also should consider the use cases of our app like in both the methods have their own trade offs.

picklelo commented 6 months ago

It looks like there are available packages for both ways:

LRU Dict: https://pypi.org/project/lru-dict/ Expiring Dict: https://pypi.org/project/expiringdict/

So maybe we can go with one of these, either using the library or reimplementing some of the logic within reflex. But based on this, expiring dict makes the most sense as it will match our Redis API, and we can have a parameter for the expiration time.

AmanSal1 commented 6 months ago

@picklelo Yes, I also researched this and found the LRU cache package from functools . But when you said that LRU caching could be simple, I think in some sense maybe you are correct. Because what I've gathered is that we just need a decorator over the get_state method, and everything else is handled by the LRU caching library. Maybe at the implementation level, it may differ, but yes, it could be simple. Can you tell me how we can test these StateManagerMemory instances? What is the right way to create their instance and test them?

picklelo commented 6 months ago

@AmanSal1 We have a test on the Redis expiration, we can extend something similar for the Memory one. See here

AmanSal1 commented 6 months ago

@picklelo Is this the right way to declare StatManagerMemory instance .


import pytest
from reflex.state import StateManagerMemory

@pytest.mark.asyncio
async def test_state_manager_memory_caching():
    """Test the LRU caching mechanism in StateManagerMemory."""

    # Create an instance of StateManagerMemory
    state_manager = StateManagerMemory()

    assert state_manager.get_state.cache_info().hits == 0
    assert state_manager.get_state.cache_info().misses == 0

    token = "test_token"
    for _ in range(5):
        state = await state_manager.get_state(token)

    # Ensure that all calls hit the cache after the first call
    assert state_manager.get_state.cache_info().hits == 4
    assert state_manager.get_state.cache_info().misses == 1

    # Call get_state with a different token
    for _ in range(5):
        state = await state_manager.get_state("another_token")

    # Ensure that new calls with a different token also hit the cache
    assert state_manager.get_state.cache_info().hits == 8
    assert state_manager.get_state.cache_info().misses == 2

picklelo commented 6 months ago

@AmanSal1 you need to pass in a State class to it (this is what it will return an instance of during a cache miss)

class ExampleState(rx.State):
   var1: int
   var2: int

state_manager = StateManagerMemory(state=ExampleState)

The other thing to check would be that when it receives it from a cache hit it retains the old state values, while when there's a cache miss it will get a fresh state.

AmanSal1 commented 6 months ago

@picklelo So like I wanted to know what is the Can't instantiate abstract class StateManagerMemory with abstract method get_state error because if I run pytest test_state.py I get multiple errors I am adding a screenshot of this ?

picklelo commented 6 months ago

@AmanSal1 what Python version are you on? This is without modifying it at all? The tests should pass as we run CI for all python versions.

AmanSal1 commented 6 months ago

@picklelo Python version - 3.10.0 . I think that it could be of like integration testing thing like we can't run single module for testing as it could be dependent on other modules .

AmanSal1 commented 6 months ago

@picklelo When I make a change in the test_state.py file and then revert it back and rerun pytest, the test fails. As seen in the photo above, something seems to be initiated or perhaps cached, causing the test to fail repeatedly.

ElijahAhianyo commented 6 months ago

@picklelo So like I wanted to know what is the Can't instantiate abstract class StateManagerMemory with abstract method get_state error because if I run pytest test_state.py I get multiple errors I am adding a screenshot of this ?

@AmanSal1 The tests should run as is without any modifications, The redis tests should also be skipped. Are you running the tests with redis?

AmanSal1 commented 6 months ago

@ElijahAhianyo Yes, I understand your point. I'm attempting to integrate LRU caching into the built-in state manager. However, when I write the test case in test_state.py, it fails. Even if I revert my changes, the test continues to fail. Initially, if we don't change anything, it exhibits the same behavior you mentioned

ElijahAhianyo commented 6 months ago

@ElijahAhianyo Yes, I understand your point. I'm attempting to integrate LRU caching into the built-in state manager. However, when I write the test case in test_state.py, it fails. Even if I revert my changes, the test continues to fail. Initially, if we don't change anything, it exhibits the same behavior you mentioned

I see, what's the code diff you're adding? Will be really helpful if there's a linked PR I can try to repro

AmanSal1 commented 6 months ago

@ElijahAhianyo Sure . Will be adding a PR shortly .

AmanSal1 commented 6 months ago

@ElijahAhianyo I have made a PR .

ElijahAhianyo commented 6 months ago

@ElijahAhianyo I have made a PR .

It looks like the issue youre facing is pydantic related. There's a weird behavior in v1 (not an issue in v2)where it looks like the meta data of the function decorated with lru_cache is not copied properly and works if the abstract method is also decorated. Eg:

from pydantic import BaseModel
from ABC import ABC, abstractmethod
from functools import lru_cache

class AbstractClass(BaseModel, ABC):

    @abstractmethod
    def abstract_method(self, token):
        pass

    class Config:
        """Pydantic config."""
        arbitrary_types_allowed = True

class ChildClass(AbstractClass):

    @lru_cache(maxsize=100)
    def abstract_method(self, token):
        pass

child_class = ChildClass()

This above will throw an error. But below works:

...
class AbstractClass(BaseModel, ABC):
   @lru_cache(maxsize=100)
    @abstractmethod
    def abstract_method(self, token):
        pass

    class Config:
        """Pydantic config."""
        arbitrary_types_allowed = True

class ChildClass(AbstractClass):

    @lru_cache(maxsize=100)
    def abstract_method(self, token):
        pass

child_class = ChildClass()

For the weird meta data behavior, here's a sample:

# pydantic v1
...
class AbstractClass(BaseModel, ABC):

    @lru_cache(maxsize=100)
    @abstractmethod
    def abstract_method(self, token):
        pass

    class Config:
        """Pydantic config."""
        arbitrary_types_allowed = True

class ChildClass(AbstractClass):

    @lru_cache(maxsize=100)
    def abstract_method(self, token):
        pass

child_class = ChildClass()
print(child_class.abstract_method) # prints <functools._lru_cache_wrapper object at 0x12ad52820>

In pydantic V2 however

class AbstractClass(BaseModel, ABC):

    @lru_cache(maxsize=100)
    @abstractmethod
    def abstract_method(self, token):
        pass

class ChildClass(AbstractClass):

    @lru_cache(maxsize=100)
    def abstract_method(self, token):
        pass

child_class = ChildClass()
print(child_class.abstract_method) # prints <bound method ChildClass.abstract_method of ChildClass()>

If your approach of using the lru_cache decorator works, there's an ongoing project to move to pydantic v2 and I'm wondering if need to do this after that is done. @masenf do you have any thoughts on this approach?

AmanSal1 commented 6 months ago

@ElijahAhianyo I tried your solution, and yes, the initialization error is removed, but now there is another error. Maybe I am not writing the correct test case for the LRU caching ERROR Message FAILED test_state.py::test_lru_caching_memory - TypeError: StateManagerMemory.get_state() missing 1 required positional argument: 'token'

reflex-dev / reflex

[REF-2503] Memory StateManager should have some sort of expiration #3021