pytorch / data

A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.
BSD 3-Clause "New" or "Revised" License

[DataPipe] DataPipe Deprecation Tracker #163

Open NivekT opened 2 years ago

NivekT commented 2 years ago

We have a number of DataPipes that are being deprecated. Our general policy is to first mark a DataPipe as deprecated with a warning, then wait at least one release cycle (~3 months) before removing it. Note that some DataPipes will be removed from the PyTorch Core library but will remain in TorchData, and others are being renamed.
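A minimal sketch of the deprecate-then-remove policy in plain Python, assuming a rename like FileLoader to FileOpener. The class names OldFileLoader and NewFileOpener are hypothetical, and this is not how PyTorch or TorchData implement their own warnings: the deprecated name is kept as a thin alias that warns when constructed and otherwise behaves exactly like its replacement.

```python
import warnings
from typing import Generic, Iterable, Iterator, TypeVar

T = TypeVar("T")


class NewFileOpener(Generic[T]):
    """Hypothetical replacement DataPipe: here it simply re-yields its source."""

    def __init__(self, source: Iterable[T]) -> None:
        self.source = source

    def __iter__(self) -> Iterator[T]:
        yield from self.source


class OldFileLoader(NewFileOpener[T]):
    """Hypothetical deprecated alias, kept for at least one release cycle."""

    def __init__(self, source: Iterable[T]) -> None:
        # Warn on every construction; stacklevel=2 points at the caller's line.
        warnings.warn(
            "OldFileLoader is deprecated and will be removed in a future release; "
            "use NewFileOpener instead.",
            FutureWarning,
            stacklevel=2,
        )
        super().__init__(source)
```

FutureWarning is used here because, unlike DeprecationWarning, Python shows it to end users by default; a project may prefer a different category.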

Status Types:

DataLoader2 Tracker

| Name | Deprecation Date | Status | Earliest Removal Version |
|---|---|---|---|
| PrototypeMultiProcessingReadingService -> MultiProcessingReadingService | 0.6 | Deprecated | 0.8 |

IterDataPipe Tracker

| Name | Functional API | Module | Deprecation Date | Status | Earliest Removal Version |
|---|---|---|---|---|---|
| BucketBatcher | NA | Core | Sep 30th, 2021 | Removed (moved to TorchData) | |
| HTTPReader | NA | Core | Sep 30th, 2021 | Removed (moved to TorchData) | |
| LineReader | NA | Core | Sep 30th, 2021 | Removed (moved to TorchData) | |
| TarArchiveReader | NA | Core | Sep 30th, 2021 | Removed (moved to TorchData) | |
| ZipArchiveReader | NA | Core | Sep 30th, 2021 | Removed (moved to TorchData) | |
| FileLoader | NA | Core | Jan 5th, 2022 | Removed (use FileOpener) | 1.13 (Sept 2022) |
| FileLoader | NA | Data | Jan 5th, 2022 | Removed (use FileOpener) | |
| IoPathFileLoader | load_file_by_iopath | Data | Jan 5th, 2022 | Removed (use IoPathFileOpener) | |
| RoutedDecoder | routed_decode | Core | Jan 10th, 2022 | Deprecated | 1.13 (Sept 2022) |
| TarArchiveReader | read_from_tar | Data | Feb 22nd, 2022 | Removed (use TarArchiveLoader) | 0.5 (Sept 2022) |
| XzFileReader | read_from_xz | Data | Feb 22nd, 2022 | Removed (use XzFileLoader) | 0.5 (Sept 2022) |
| ZipArchiveReader | read_from_zip | Data | Feb 22nd, 2022 | Removed (use ZipArchiveLoader) | 0.5 (Sept 2022) |
| Filter | filter | Core | 1.12 | Removed argument (drop_empty_batches) | 2.0 (Nov 2022) |
| FSSpecFileOpener | open_files_by_fsspec | Data | 0.4 | open_file_by_fsspec removed | 0.6 (Nov 2022) |
| IoPathFileOpener | open_files_by_iopath | Data | 0.4 | open_file_by_iopath removed | 0.6 (Nov 2022) |
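When migrating, it can help to scan a codebase for the deprecated names listed in the trackers. The sketch below is a hypothetical helper, not part of TorchData: its rename table is copied only from tracker rows that name a direct replacement, and it matches whole identifiers, so IoPathFileLoader is not also reported as FileLoader.

```python
import re
from typing import Dict, List, Tuple

# Old name -> suggested replacement, from tracker rows that name one.
RENAMES: Dict[str, str] = {
    "PrototypeMultiProcessingReadingService": "MultiProcessingReadingService",
    "FileLoader": "FileOpener",
    "IoPathFileLoader": "IoPathFileOpener",
    "TarArchiveReader": "TarArchiveLoader",
    "XzFileReader": "XzFileLoader",
    "ZipArchiveReader": "ZipArchiveLoader",
    "open_file_by_fsspec": "open_files_by_fsspec",
}

# One whole-word pattern per deprecated name, compiled once.
_PATTERNS = {old: re.compile(rf"\b{re.escape(old)}\b") for old in RENAMES}


def find_deprecated(source: str) -> List[Tuple[int, str, str]]:
    """Return (line number, old name, suggested replacement) for each hit."""
    hits: List[Tuple[int, str, str]] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for old, pattern in _PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, old, RENAMES[old]))
    return hits
```

TarArchiveReader and ZipArchiveReader first moved from Core to TorchData and were then renamed there, so the suggested Loader names assume a TorchData version that ships them.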

MapDataPipe Tracker

Nothing for now

cc: @ejguan @VitalyFedyunin @NivekT

ejguan commented 2 years ago

For TarArchiveReader, should we add a deprecation warning in the main branch now that the 0.3.0 branch cut has finished?

ejguan commented 2 years ago
Another tracker for miscellaneous items:

| Name | Module | Deprecation Version | Status | Earliest Removal Version |
|---|---|---|---|---|
| torch.utils.data.graph.traverse | Core | 1.13 | Deprecating | 1.15 / 2.1 |

BlueskyFR commented 1 year ago

I see RoutedDecoder has been marked as deprecated: what is it going to be replaced by?

ejguan commented 1 year ago

@BlueskyFR IIRC, we plan to remove this DataPipe in the future. The general reason is that the same result can easily be achieved by using demux to split the pipeline by file type, decoding each resulting DataPipe accordingly, and then combining them again with mux. We'd be glad to hear about your use case.
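The demux, decode, mux approach can be modeled in plain Python to show the data flow. This is an illustrative sketch, not TorchData code: lists and a round-robin generator stand in for the demux and mux DataPipes, the (path, bytes) sample format is an assumption, and the JSON and text decoders are placeholders where a PNG decoder from an image library would go.

```python
import json
from itertools import zip_longest
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Sample = Tuple[str, bytes]  # assumed format: (file path, raw file bytes)

# Placeholder decoders, one per file type; a PNG decoder would be added here.
DECODERS: Dict[str, Callable[[bytes], Any]] = {
    "json": json.loads,
    "txt": lambda raw: raw.decode("utf-8"),
}


def extension(path: str) -> str:
    return path.rsplit(".", 1)[-1].lower()


def demux(samples: Iterable[Sample], kinds: List[str]) -> List[List[Sample]]:
    """Route each sample into one bucket per file type (stand-in for demux)."""
    buckets: List[List[Sample]] = [[] for _ in kinds]
    for path, raw in samples:
        ext = extension(path)
        if ext in kinds:  # samples of unknown types are dropped
            buckets[kinds.index(ext)].append((path, raw))
    return buckets


def mux(streams: List[List[Tuple[str, Any]]]) -> Iterator[Tuple[str, Any]]:
    """Interleave round-robin until every stream is exhausted (stand-in for mux)."""
    done = object()
    for group in zip_longest(*streams, fillvalue=done):
        for item in group:
            if item is not done:
                yield item


def decode_all(samples: Iterable[Sample]) -> List[Tuple[str, Any]]:
    kinds = list(DECODERS)
    buckets = demux(samples, kinds)
    # Decode each branch with its own function (the per-branch map step).
    decoded = [
        [(path, DECODERS[kind](raw)) for path, raw in bucket]
        for kind, bucket in zip(kinds, buckets)
    ]
    return list(mux(decoded))
```

The output order is round-robin across file types rather than the input order, and holding every sample in memory is only reasonable for small inputs; TorchData's demux takes a buffer_size instead. In a real pipeline, also check how your mux variant handles branches of different lengths, since a round-robin that stops at the shortest branch would silently drop samples.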

BlueskyFR commented 1 year ago

I don't understand: how should I proceed to decode a PNG image in the current state then?

ejguan commented 1 year ago

You can use a map function such as datapipe.map(decode_fn) to decode the PNG image.
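As a minimal sketch of a decode_fn suitable for datapipe.map, the hypothetical function below parses only the PNG signature and IHDR header with the standard library and returns the image dimensions. Real pixel decoding would call an image library instead; the point is that any function taking one sample and returning the decoded value can be passed to map.

```python
import struct
from typing import Dict

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_png_header(raw: bytes) -> Dict[str, int]:
    """Hypothetical decode_fn: read width, height, bit depth, and color type."""
    if raw[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # The first chunk must be IHDR: 4-byte length, 4-byte type, then 13 data bytes.
    length, chunk_type = struct.unpack(">I4s", raw[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("missing IHDR chunk")
    width, height, bit_depth, color_type = struct.unpack(">IIBB", raw[16:26])
    return {"width": width, "height": height, "bit_depth": bit_depth, "color_type": color_type}
```

This assumes the DataPipe yields raw bytes. Pipelines built from FileOpener yield (path, stream) pairs, so the stream would need to be read before calling a function like this.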

BlueskyFR commented 1 year ago

Okay, but why was support for decoding dropped then?

ejguan commented 1 year ago

Decoding didn't do anything more than a map function does, except that we provided a few decoding functions for convenience. To support routed_decode, we would need to add many decoding functions to cover general file decoding, which is not sustainable for us to maintain and makes routed_decode more complicated and redundant. For your use case (decoding PNG), for example, routed_decode would also pull JSON, pickle, and other decoding handlers into the same DataPipe.

Since TorchData provides a composable way to construct pipelines, users should be able to build a pipeline that handles their specific decoding mechanism.

BlueskyFR commented 1 year ago

Okay. What is the preferred mechanism for decoding images? Ideally, I think it should be done in batches when performance matters.

ejguan commented 1 year ago

It depends on whether your decode_fn supports high-performance batched decoding (e.g. via multithreading). Otherwise, I think it will perform about the same as decoding one image at a time.
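A minimal sketch of batched decoding with a thread pool, assuming a decode_fn that releases the GIL (as native image decoders often do). With a pure-Python decode_fn, the threads add overhead without any speedup, which matches the point that batching alone is not faster than decoding per image. The helpers batched and decode_in_batches are hypothetical; in a DataPipe pipeline the grouping would come from batch and the batch decoder would be applied with map.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Group items into lists of at most `size` elements."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def decode_in_batches(
    items: Iterable[T],
    decode_fn: Callable[[T], R],
    batch_size: int = 32,
    workers: int = 4,
) -> Iterator[List[R]]:
    """Decode each batch concurrently; executor.map keeps the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in batched(items, batch_size):
            yield list(pool.map(decode_fn, batch))
```

Keeping one pool alive across batches avoids paying thread start-up cost per batch; the pool shuts down when the generator is exhausted or closed.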