pytorch / data

A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.
BSD 3-Clause "New" or "Revised" License

UserWarning: Lambda function is not supported by pickle, please use regular python function or functools.partial instead #953

Closed austinmw closed 1 year ago

austinmw commented 1 year ago

🐛 Describe the bug

When I run:

from torchdata.datapipes.iter import HttpReader

URL = "https://raw.githubusercontent.com/mhjabreel/CharCnn_Keras/master/data/ag_news_csv/train.csv"
ag_news_train = HttpReader([URL]).parse_csv().map(lambda t: (int(t[0]), " ".join(t[1:])))
agn_batches = ag_news_train.batch(2).map(lambda batch: {'labels': [sample[0] for sample in batch],\
                                      'text': [sample[1].split() for sample in batch]})

batch = next(iter(agn_batches))
assert batch['text'][0][0:8] == ['Wall', 'St.', 'Bears', 'Claw', 'Back', 'Into', 'the', 'Black']

I get the following:

UserWarning: Lambda function is not supported by pickle, please use regular python function or functools.partial instead.

Versions

Python 3.8.0
torch 2.0.0.dev20230119+cu116
torchdata 0.6.0.dev20230119

ejguan commented 1 year ago

Please avoid lambda functions in the pipeline. Lambdas can't be pickled, so a pipeline that contains them can't be used with multiprocessing.

You can replace your lambda functions with regular named functions:

def map_fn1(t):
    return (int(t[0]), " ".join(t[1:]))

def map_fn2(batch):
    ...
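
A minimal sketch of the point above, using only the standard library: it checks that a lambda fails to pickle while a named function or a functools.partial over one succeeds. The names is_picklable, parse_row, collate_batch, and join_fields are hypothetical, chosen for illustration; the function bodies mirror the two lambdas from the original report.

```python
import functools
import pickle


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# Hypothetical named equivalent of the first lambda in the report.
def parse_row(t):
    # First CSV field is the integer label; the rest are joined into one string.
    return (int(t[0]), " ".join(t[1:]))


# Hypothetical named equivalent of the second lambda in the report.
def collate_batch(batch):
    # Convert a list of (label, text) samples into a dict of lists.
    return {
        "labels": [sample[0] for sample in batch],
        "text": [sample[1].split() for sample in batch],
    }


# Hypothetical helper showing the functools.partial route from the warning:
# a partial over a module-level function can be pickled as well.
def join_fields(t, sep):
    return (int(t[0]), sep.join(t[1:]))


parse_row_with_space = functools.partial(join_fields, sep=" ")
```

With functions like these, the calls in the report would become .map(parse_row) and .batch(2).map(collate_batch), and is_picklable(lambda t: t) returns False while is_picklable(parse_row) returns True when the functions live at module level.
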

austinmw commented 1 year ago

@ejguan But I am literally copying the "sanity check" example directly from TorchData's GitHub homepage.

I guess that is not an up-to-date/recommended way to use this library?

ejguan commented 1 year ago

Fair point, we should improve the sanity check section. cc: @NivekT since you are working on the README right now, we might remove the sanity check section and ask users to refer to the examples and online docs instead.

For reference, we have a folder of examples at https://github.com/pytorch/data/tree/main/examples. Our online docs have a number of examples as well: https://pytorch.org/data/main/

austinmw commented 1 year ago

Thanks, I will refer to those examples!

NivekT commented 1 year ago

Added the fix to #954

ejguan commented 1 year ago

Closing, as the sanity check has been removed from the README.