Open stas00 opened 4 days ago
Great suggestion, thanks! Added support for this in 7ba873fc87086098657e488e7365f8c14aeb4d06
You can now override the BaseFilter's
def filter_batch(self, batch: List[Document]) -> List[bool | Tuple[bool, str]]:
method and pass batch_size to the BaseFilter's __init__.
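For illustration, here is a minimal self-contained sketch of how such an override might look. The `Document` and `BaseFilter` classes below are simplified stand-ins written for this example, not the library's actual code; only the `filter_batch` signature and the `batch_size` argument come from the comment above. `fake_model_scores`, `ScoreFilter`, and the `run` loop are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple, Union

FilterResult = Union[bool, Tuple[bool, str]]


@dataclass
class Document:
    # Stand-in for the library's Document; only the text field is modeled.
    text: str


class BaseFilter:
    # Simplified stand-in, NOT the library's implementation.
    def __init__(self, batch_size: int = 1):
        self.batch_size = batch_size

    def filter(self, doc: Document) -> FilterResult:
        raise NotImplementedError

    def filter_batch(self, batch: List[Document]) -> List[FilterResult]:
        # Default behavior: fall back to the per-document filter.
        return [self.filter(doc) for doc in batch]

    def run(self, docs: Iterable[Document]) -> Iterator[Document]:
        # Group documents into chunks of batch_size and filter each chunk.
        batch: List[Document] = []
        for doc in docs:
            batch.append(doc)
            if len(batch) == self.batch_size:
                yield from self._kept(batch)
                batch = []
        if batch:  # flush the final, possibly smaller, batch
            yield from self._kept(batch)

    def _kept(self, batch: List[Document]) -> Iterator[Document]:
        for doc, result in zip(batch, self.filter_batch(batch)):
            # A result is either a bool or a (bool, reason) tuple.
            keep = result[0] if isinstance(result, tuple) else result
            if keep:
                yield doc


def fake_model_scores(texts: List[str]) -> List[float]:
    # Hypothetical placeholder for a single batched model call.
    return [min(len(t) / 20.0, 1.0) for t in texts]


class ScoreFilter(BaseFilter):
    def __init__(self, threshold: float = 0.5, batch_size: int = 4):
        super().__init__(batch_size=batch_size)
        self.threshold = threshold

    def filter_batch(self, batch: List[Document]) -> List[FilterResult]:
        # One "inference" call per batch instead of one per document.
        scores = fake_model_scores([doc.text for doc in batch])
        return [True if s >= self.threshold else (False, "low_score") for s in scores]
```

With a real model, the call to `fake_model_scores` is where the batched inference would go; the rest of the override only maps scores to keep/drop decisions.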
This is a very cool library! Kudos to the authors!
The Filter API seems to work with only a single item at a time.
Is there a way to filter in batches? Say you're using a filter that runs ML model inference. It'd be much more efficient to run inference on large batches than on 1 item at a time.
I looked through the examples and the code in case I had missed it, but I couldn't find any indication that batched input is supported.
The API I think would be similar to the HF Tokenizer where it takes batches and returns batches, so here instead of returning a bool, it'd return a list of bools. If the input is a single sample, return a single bool - if a list, return a list.
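A rough sketch of the calling convention described above, to make the idea concrete. The names `filter_samples` and `keep_text` are hypothetical and the word-count rule is just a placeholder for a real filter; nothing here is taken from the library.

```python
from typing import List, Union


def keep_text(text: str) -> bool:
    # Hypothetical per-sample rule: keep texts with at least 3 words.
    return len(text.split()) >= 3


def filter_samples(samples: Union[str, List[str]]) -> Union[bool, List[bool]]:
    # Single sample in, single bool out.
    if isinstance(samples, str):
        return keep_text(samples)
    # List in, list of bools out (same order, same length).
    return [keep_text(s) for s in samples]
```

A model-backed filter would replace the list comprehension with one batched inference call, which is where the efficiency gain would come from.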
Thanks a lot!