umarbutler / semchunk

A fast and lightweight pure Python library for splitting text into semantically meaningful chunks.
MIT License

Memoization destroys `semchunk.chunk()`'s signature #2

Closed by umarbutler 6 months ago

umarbutler commented 6 months ago

Memoizing `semchunk.chunk()` with `@functools.cache` destroys its signature, as described in this issue. A fix is to apply the cache like this:

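```python
import functools
from typing import Callable

def chunk(text: str, chunk_size: int, token_counter: Callable[[str], int], memoize: bool = True, _recursion_depth: int = 0) -> list[str]:
    ...

# Cache `chunk`, then copy the original function's metadata onto the cache wrapper.
chunk = functools.wraps(chunk)(functools.cache(chunk))
```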

Instead of:

```python
import functools
from typing import Callable

@functools.cache
def chunk(text: str, chunk_size: int, token_counter: Callable[[str], int], memoize: bool = True, _recursion_depth: int = 0) -> list[str]:
    ...
```

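The script below is a minimal sketch for checking that the `functools.wraps(...)(functools.cache(...))` workaround keeps runtime introspection and caching intact. The body of `chunk` and the `word_counter` token counter are hypothetical stand-ins (a greedy word packer, not semchunk's algorithm). It only exercises `inspect.signature`, which follows the `__wrapped__` attribute that `functools.cache` already sets, so it does not reproduce the original problem, which may concern how editors or type checkers display the signature.

```python
import functools
import inspect
from typing import Callable


# Hypothetical stand-in for semchunk.chunk(): greedily packs whitespace-separated
# words into chunks of at most `chunk_size` tokens. Not semchunk's real algorithm.
def chunk(
    text: str,
    chunk_size: int,
    token_counter: Callable[[str], int],
    memoize: bool = True,
    _recursion_depth: int = 0,
) -> list[str]:
    """Split text into chunks of at most chunk_size tokens."""
    chunks: list[str] = []
    current: list[str] = []
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and token_counter(candidate) > chunk_size:
            chunks.append(" ".join(current))
            current = [word]
        else:
            current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks


# Record the signature of the plain, uncached function for comparison.
original_signature = inspect.signature(chunk)

# The workaround from the issue: cache the function, then copy its metadata onto the wrapper.
chunk = functools.wraps(chunk)(functools.cache(chunk))


def word_counter(text: str) -> int:
    # Hypothetical token counter: one token per whitespace-separated word.
    return len(text.split())


if __name__ == "__main__":
    print(inspect.signature(chunk) == original_signature)  # True
    print(chunk("the quick brown fox jumps", 2, word_counter))  # ['the quick', 'brown fox', 'jumps']
    print(chunk("the quick brown fox jumps", 2, word_counter))  # same result, served from the cache
    print(chunk.cache_info())  # CacheInfo(hits=1, misses=1, maxsize=None, currsize=1)
```

Because the wrapper is still the object returned by `functools.cache`, `cache_info()` and `cache_clear()` remain available. A cache hit returns the same list object as the first call, so callers should avoid mutating the returned list.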
umarbutler commented 6 months ago

This bug has been fixed in v0.2.3.