shrinath-suresh closed this issue 2 years ago.
cc: @VitalyFedyunin, @ejguan
Can we move prepare_data (and other utilities to create DataLoader) outside of model class?
Also, I think the use of to_map_style_dataset should be discouraged or deprecated. We should probably work with DataPipes directly, to showcase how datasets built on top of DataPipes are used with DataLoader. Here is a tutorial that does training directly using DataPipes.
Just a note: we do have a native iter-to-map converter: https://github.com/pytorch/data/blob/c2223fdaab4630da9be4cfbff32441b766f475bd/torchdata/datapipes/iter/util/converter.py#L21
As for the error, it seems weird. Why would torch.save touch an object from the dataset?
It seems this happens because prepare_data is part of the model class: https://github.com/mlflow/mlflow-torchserve/blob/0e93a3e41b5a4ba1b8cf501a95f681cfb6cea24d/examples/BertNewsClassification/news_classifier.py#L151
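A minimal sketch of why moving prepare_data out of the model helps, using only the standard pickle module (torch.save relies on pickle when saving a whole model object). Pickling an object also pickles everything reachable from its attributes, so a dataset that prepare_data stores on the model is serialized along with it. make_dataset, ModelWithData, ModelOnly and prepare_data below are hypothetical stand-ins, not the code from the linked example.

```python
import pickle


def make_dataset(items):
    # Hypothetical stand-in for to_map_style_dataset: the class is defined
    # inside the function, so pickle cannot look it up by its qualified name.
    class _LocalDataset:
        def __init__(self, data):
            self._data = list(data)

        def __len__(self):
            return len(self._data)

        def __getitem__(self, idx):
            return self._data[idx]

    return _LocalDataset(items)


class ModelWithData:
    """Data preparation lives on the model, so the dataset becomes an attribute."""

    def __init__(self):
        self.weights = [0.1, 0.2]

    def prepare_data(self, items):
        self.train_dataset = make_dataset(items)


class ModelOnly:
    """The model holds only its own state."""

    def __init__(self):
        self.weights = [0.1, 0.2]


def prepare_data(items):
    # Data preparation outside the model: the dataset is returned, not stored.
    return make_dataset(items)


def can_pickle(obj):
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        return False


coupled = ModelWithData()
coupled.prepare_data(["a", "b", "c"])

decoupled = ModelOnly()
train_dataset = prepare_data(["a", "b", "c"])

print(can_pickle(coupled))    # False: the local dataset class is reached via the model
print(can_pickle(decoupled))  # True: only the weights are pickled
```

The dataset itself is still not picklable in the second case; it simply is no longer part of what gets saved with the model.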
That sounds good! I think we should probably get rid of the to_map_style_dataset function in favor of native support in torchdata :)
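As a hedged sketch of what a picklable alternative could look like (this is not the torchtext or torchdata API): a map-style wrapper defined at module level can be pickled, because pickle records the class by its module and qualified name. MapStyleWrapper is a hypothetical name; in real code it would typically subclass torch.utils.data.Dataset, which is left out here to keep the example dependency-free.

```python
import pickle


class MapStyleWrapper:
    """Hypothetical module-level replacement for a function-local map-style class.

    Because the class is defined at module scope, pickle can record it by its
    qualified name, so objects that reference it can be saved with pickle
    (and therefore with torch.save).
    """

    def __init__(self, iterable):
        # Materialize the iterable once so items can be accessed by index.
        self._data = list(iterable)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, idx):
        return self._data[idx]


# Usage: wrap an iterable of (label, text) pairs and round-trip it through pickle.
samples = iter([(1, "stocks rally"), (2, "team wins final"), (3, "new chip released")])
dataset = MapStyleWrapper(samples)
restored = pickle.loads(pickle.dumps(dataset))
print(len(restored), restored[1])  # 3 (2, 'team wins final')
```

Materializing the whole iterable in memory is the simplest choice for small text datasets; for large streams, working with DataPipes directly avoids that cost.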
Sure. I will retest after moving prepare_data out of the model class.
It would also really help if the PyTorch example could be updated, because I followed the steps from this tutorial: https://pytorch.org/tutorials/beginner/text_sentiment_ngrams_tutorial.html#split-the-dataset-and-run-the-model
@parmeet @ejguan Thanks for your input. I moved prepare_data out of the model class and the issue is resolved. Code link here.
Sounds good @shrinath-suresh, let me close this issue for now then.
🐛 Bug
Describe the bug
The error Can't pickle local object 'to_map_style_dataset.<locals>._MapStyleDataset' is raised when saving the entire model with torch.save(model, "model.pth").
To Reproduce
Steps to reproduce the behavior:
1. Install the dependencies with pip.
2. Run python news_classifier.py --max_epochs 1 --num_samples 100
The script uses the mlflow.pytorch library to save the model, and mlflow.pytorch internally saves the entire model using torch.save. The following exception is thrown: AttributeError: Can't pickle local object 'to_map_style_dataset.<locals>._MapStyleDataset'
It looks like the dataset object returned by to_map_style_dataset is not serializable.
Expected behavior
The model should be saved without any error.
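The <locals> in the error message is the key detail: pickle stores a class by its module and qualified name, and a class created inside a function cannot be looked up that way, so any object graph containing an instance of it fails to serialize. A minimal stdlib-only reproduction, where make_map_style is a hypothetical stand-in for to_map_style_dataset:

```python
import pickle


def make_map_style(items):
    # Hypothetical stand-in: the class only exists inside this function call.
    class _MapStyle:
        def __init__(self, data):
            self._data = list(data)

    return _MapStyle(items)


ds = make_map_style([1, 2, 3])
print(type(ds).__qualname__)  # make_map_style.<locals>._MapStyle

try:
    pickle.dumps(ds)
except (AttributeError, pickle.PicklingError) as exc:
    # AttributeError on most Python versions; newer versions may raise PicklingError.
    print(type(exc).__name__, exc)
```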
Environment
Sample Logs - https://gist.github.com/shrinath-suresh/00086bf503690dd0365e39846ef2dbb5