Closed dberenbaum closed 3 months ago
Do we need support for this and `batch_map()`?
There is no urgent need for this since we support `setup()`. Let's exclude these from the public API for now.
PS: It is supposed to work like this:
    from typing import Iterator

    # A lambda cannot be a generator, so use a regular function.
    # With batch=10, `file` is a batch of files; yield one value per file.
    def func(file) -> Iterator[int]:
        for f in file:
            yield len(f.parent + f.name)

    (DataChain.from_storage(path="gs://dvcx-datalakes/dogs-and-cats/")
        .settings(batch=10)
        .map(
            path_len=func,
        ).show())
Description
`batch` and `batch_map` do not work, and it's not clear what the syntax should be. Take this example:
It fails with the error `TypeError: DatasetQuery.add_signals() got an unexpected keyword argument 'batch'`. Do we want to pass the batch size to the non-batched `map()` method, or should we always assume the batch size is 1 here? If `batch` > 1, should the UDF expect batched inputs and outputs? Do we need support for this and `batch_map()`? It's not clear what each one should do.

Note that `batch_map()` now looks like it's just a copy of `gen()` and fails with the same error.
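
To make the "batched inputs and outputs" question concrete, here is a minimal plain-Python sketch of one possible contract: the UDF receives a list of rows and must yield exactly one output per input row. This is not DataChain code. `batched`, `map_in_batches`, and `path_len` are hypothetical names used only for illustration, and the sample paths are made up.

```python
from collections.abc import Callable, Iterable, Iterator
from itertools import islice


def batched(rows: Iterable, size: int) -> Iterator[list]:
    """Yield consecutive lists of at most `size` rows."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def map_in_batches(rows: Iterable, udf: Callable, batch_size: int = 1) -> Iterator:
    """Hypothetical helper: call `udf` once per batch and flatten the results.

    The assumed contract is one output per input row, so the overall
    result does not depend on the batch size.
    """
    for batch in batched(rows, batch_size):
        outputs = list(udf(batch))
        if len(outputs) != len(batch):
            raise ValueError(
                f"udf returned {len(outputs)} outputs for {len(batch)} inputs"
            )
        yield from outputs


def path_len(paths: list) -> Iterator[int]:
    # Generator UDF: receives a batch, yields one value per row.
    for p in paths:
        yield len(p)


paths = ["cat.1.jpg", "dog.22.jpg", "cat.333.jpg"]  # made-up sample rows
per_row = list(map_in_batches(paths, path_len, batch_size=1))
per_batch = list(map_in_batches(paths, path_len, batch_size=2))
```

Under this reading, `batch=1` is just the degenerate case of the batched call, so a separate `batch_map()` would only be needed if the UDF were allowed to return a different number of outputs than inputs (which is closer to what `gen()` does).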