Open douglas-raillard-arm opened 7 months ago
I believe there used to be a counter, used with toggle_string_cache() back when that existed. That function was later deprecated, since toggling off wouldn't actually toggle but only decrement the counter.
I agree that for context managers a counter should definitely be used here.
I'm not sure if this is in the scope of polars to fix. It seems from a couple google searches that it might be a limitation on decorating recursive functions. A couple of workarounds would be:
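@pl.StringCache()
def f(x):
    # The recursion happens in the undecorated inner function,
    # so the context manager is entered only once.
    def _f(x):
        if x == 0:
            return 'end'
        else:
            return _f(x-1)
    return _f(x)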
or
def f(x):
    if x == 0:
        return 'end'
    else:
        return f(x-1)

# Only the non-recursive entry point is decorated.
@pl.StringCache()
def f2(x):
    return f(x)
Here's an interesting SO answer that inspired the first workaround.
@stinodego should this be a polars bug or close?
@deanm0000
It seems from a couple google searches that it might be a limitation on decorating recursive functions
There is no particular problem decorating recursive functions in general. That specific implementation of the decorator has that issue. It's like saying functions in Python are expected to be buggy because you can write a buggy function. On top of that, it's not even the decorator that has an issue, it's the context manager as demonstrated in the original report:
cm = pl.StringCache()
with cm:
    with cm:
        pass
Now you could argue that re-entrance is not to be expected on a context manager, but again, a trivial fix solves both cases.
Also the doc states:
The class can also be used as a function decorator, in which case the string cache is enabled during function execution, and disabled afterwards.
This is not true currently in the general case. The issue could be "fixed" by stating it won't work on recursive functions but actually fixing the code is even easier:
class StringCache:
    def __init__(self):
        # One holder per nesting level.
        self._string_cache = []

    def __enter__(self):
        self._string_cache.append(PyStringCacheHolder())
        return self

    def __exit__(self, *args):
        # Release only the innermost holder.
        self._string_cache.pop()
That version (with a stack instead of a counter) will still behave properly if PyStringCacheHolder becomes smarter and aware of its own nesting. Right now it looks like a single Rust global variable, but if that changed, the Python code would just follow the Rust behavior naturally, whatever it does.
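To make the stack idea concrete without depending on polars internals, here is a minimal sketch. `FakeHolder` is a hypothetical stand-in for `PyStringCacheHolder` (it only counts how many holders are alive), and `StackedStringCache` is an illustrative name, not the polars class.

```python
class FakeHolder:
    """Hypothetical stand-in for PyStringCacheHolder: only counts live holders."""

    active = 0  # number of holders currently alive

    def __init__(self):
        FakeHolder.active += 1

    def close(self):
        FakeHolder.active -= 1


class StackedStringCache:
    """Context manager keeping one holder per nesting level."""

    def __init__(self):
        self._string_cache = []

    def __enter__(self):
        self._string_cache.append(FakeHolder())
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Pop and release only the innermost holder.
        self._string_cache.pop().close()


def nested_depths():
    """Enter the same instance twice and record the live-holder count at each step."""
    cm = StackedStringCache()
    depths = []
    with cm:
        depths.append(FakeHolder.active)
        with cm:
            depths.append(FakeHolder.active)
        depths.append(FakeHolder.active)
    depths.append(FakeHolder.active)
    return depths


print(nested_depths())  # [1, 2, 1, 0]
```

Re-entering the same instance no longer raises, and each level releases exactly the holder it created.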
Also, if enabling a re-entrant context manager is not desired (as it would set that API decision in stone, not that it matters much since you could always make it work regardless of the implementation), you can also just fix the decorator:
import functools

class StringCache:
    ...

    def __call__(self, f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            # A fresh context manager per call, so recursion never re-enters the same instance.
            with self.__class__():
                return f(*args, **kwargs)
        return wrapper
That decorator will work just fine on recursive functions too, since it creates a new StringCache context manager for every call, rather than creating a single one once and for all that is attached to the function itself. As with the context manager fix, it can be made to use a counter and a single top-level instance if necessary.
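A runnable sketch of that per-call decorator, applied to a recursive function like the one in the workarounds. `FakeHolder` and `PerCallStringCache` are hypothetical stand-ins so the example runs without polars. The `__enter__`/`__exit__` pair deliberately keeps the non-re-entrant "set then delete" behaviour described in the report, to show that the decorator alone is enough for the recursive case.

```python
import functools


class FakeHolder:
    """Hypothetical stand-in for PyStringCacheHolder."""


class PerCallStringCache:
    """Non-re-entrant context manager whose decorator creates a new instance per call."""

    def __enter__(self):
        self._string_cache = FakeHolder()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        del self._string_cache

    def __call__(self, f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            # New instance for every call: recursion never re-enters the same one.
            with self.__class__():
                return f(*args, **kwargs)

        return wrapper


@PerCallStringCache()
def countdown(x):
    if x == 0:
        return "end"
    return countdown(x - 1)


print(countdown(3))  # end
```

Each recursive call goes through `wrapper` again, so it gets its own context manager and its own `_string_cache` attribute.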
A simpler solution would probably be to only give the first StringCache control over the globalization. All other string cache contexts nested inside it would be no-ops.
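One possible reading of that suggestion, as a sketch only: a class-level depth counter lets the outermost context create and release the holder, while nested contexts just adjust the counter. `FakeHolder` and `OutermostStringCache` are hypothetical names, and this simple counter is not thread-safe.

```python
class FakeHolder:
    """Hypothetical stand-in for PyStringCacheHolder: only counts live holders."""

    active = 0

    def __init__(self):
        FakeHolder.active += 1

    def close(self):
        FakeHolder.active -= 1


class OutermostStringCache:
    """Only the outermost context owns a holder; nested contexts do nothing else."""

    _depth = 0      # nesting depth shared by all instances
    _holder = None  # holder owned by the outermost context

    def __enter__(self):
        cls = type(self)
        if cls._depth == 0:
            cls._holder = FakeHolder()
        cls._depth += 1
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        cls = type(self)
        cls._depth -= 1
        if cls._depth == 0:
            cls._holder.close()
            cls._holder = None


def holders_seen():
    """Record the live-holder count while nesting two separate instances."""
    seen = []
    with OutermostStringCache():
        seen.append(FakeHolder.active)
        with OutermostStringCache():
            seen.append(FakeHolder.active)
        seen.append(FakeHolder.active)
    seen.append(FakeHolder.active)
    return seen


print(holders_seen())  # [1, 1, 1, 0]
```

Because the depth lives on the class, this covers both re-entering one instance and nesting distinct instances.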
Checks
Reproducible example
Log output
Issue description
StringCache is documented as being usable as a decorator. However, it fails to handle recursive functions properly. The sources of StringCache show the issue: https://github.com/pola-rs/polars/blob/4bc67a0d0f6c9a113fd6b231d0d9638e58407156/py-polars/polars/string_cache.py#L66
Here is a simplified version:
When entering the context manager multiple times, such as in this example:
cm = pl.StringCache()
with cm:
    with cm:
        pass
The self._string_cache attribute is overwritten multiple times, but then self._string_cache is deleted by __exit__ in the inner layer and therefore raises in the outer layers. This could be fixed by either keeping a count of each time __enter__ is entered, or by using a stack (list) of caches in self._string_cache, and only popping the last level in __exit__.
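The failure mode described above can be reproduced without polars. The sketch below is an assumption-based stand-in, not the actual polars source: `FakeHolder` replaces `PyStringCacheHolder`, and `NaiveStringCache` sets a single attribute on entry and deletes it on exit, as the description says.

```python
class FakeHolder:
    """Hypothetical stand-in for PyStringCacheHolder; holds no real cache."""


class NaiveStringCache:
    """Sets one attribute on entry and deletes it on exit (not re-entrant)."""

    def __enter__(self):
        self._string_cache = FakeHolder()  # overwritten on every entry
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        del self._string_cache  # the inner exit removes the only attribute


def reenter_fails():
    """Return True if re-entering the same instance raises AttributeError."""
    cm = NaiveStringCache()
    try:
        with cm:
            with cm:
                pass
    except AttributeError:
        # The outer __exit__ tries to delete an attribute that is already gone.
        return True
    return False


print(reenter_fails())  # True
```

A decorator that reuses one instance for every call hits the same path as soon as the decorated function calls itself.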
Expected behavior
It should not raise any exception
Installed versions