pytorch / text

Models, data loaders and abstractions for language processing, powered by PyTorch
https://pytorch.org/text
BSD 3-Clause "New" or "Revised" License

[RFC] Deprecate/Stop TorchText releases starting with Pytorch release 2.4 #2250

Open atalman opened 3 months ago

atalman commented 3 months ago

🚀 Deprecation of TorchText releases

As of September 2023 we have paused active development of TorchText because our focus has shifted away from building out this library offering.

We would like to do the following:

For reference, here is the PyTorch release schedule: https://github.com/pytorch/pytorch/blob/main/RELEASE.md#release-cadence

cc @seemethere @malfet @matthewdzmura @NicolasHug

agunapal commented 3 months ago

Do we recommend any alternatives? For example, TorchServe has a text_classifier handler, and tests associated with it, that use TorchText: https://github.com/pytorch/serve/blob/master/ts/torch_handler/text_classifier.py

So I'm wondering what the strategy is. Should we replace it with HuggingFace for now, with PyTorch coming up with another solution at a later date?

agunapal commented 3 months ago

Can we release TorchText in PyTorch 2.3 for all platforms? For example, aarch64; I'm not sure which other platforms were missing for PyTorch 2.2.

atalman commented 3 months ago

Yes, we will release the same set of binaries as for PyTorch 2.2: https://hud2.pytorch.org/hud/pytorch/text/release%2F0.18/1?per_page=50

NicolasHug commented 3 months ago

> Do we recommend any alternatives?

This would be case-by-case. For the TorchServe example the simple alternative is to copy/paste the one functionality that was used from torchtext into the example. It's very short and simple, so that's a viable solution.

https://github.com/pytorch/text/blob/main/torchtext/data/utils.py#L207-L228
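
The copy/paste approach might look like the following minimal sketch. It assumes the functionality in question is an n-gram iterator over a list of tokens; the local name `iter_ngrams` is hypothetical, and the behavior should be checked against the linked torchtext source before relying on it.

```python
from typing import Iterator, List


def iter_ngrams(token_list: List[str], ngrams: int) -> Iterator[str]:
    """Yield every token, then every n-gram for n = 2..ngrams.

    Hypothetical local stand-in for the helper previously imported
    from torchtext; n-grams are joined with a single space.
    """

    def _get_ngrams(n: int):
        # Zip n shifted views of the token list to form sliding windows.
        return zip(*[token_list[i:] for i in range(n)])

    # Unigrams first, in their original order.
    for token in token_list:
        yield token

    # Then bigrams, trigrams, ... up to the requested size.
    for n in range(2, ngrams + 1):
        for gram in _get_ngrams(n):
            yield " ".join(gram)


if __name__ == "__main__":
    print(list(iter_ngrams(["here", "we", "are"], 2)))
    # prints ['here', 'we', 'are', 'here we', 'we are']
```

Keeping the helper local like this removes the torchtext import from the example without changing how the tokens are fed to the vocabulary lookup.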

agunapal commented 3 months ago

> Do we recommend any alternatives?
>
> This would be case-by-case. For the TorchServe example the simple alternative is to copy/paste the one functionality that was used from torchtext into the example. It's very short and simple, so that's a viable solution.
>
> https://github.com/pytorch/text/blob/main/torchtext/data/utils.py#L207-L228

Thanks, this seems like a good idea. We also use from torchtext.data.utils import get_tokenizer. Looking at the code, it doesn't seem too complicated to copy and paste it for basic_english.
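
A local replacement for the basic_english tokenizer could be sketched as below. This is an approximation under stated assumptions: the pattern and replacement lists are modeled on torchtext's basic English normalization (lowercase, pad punctuation with spaces, split on whitespace) and should be compared with the torchtext source; the name `basic_english_tokenize` is hypothetical.

```python
import re
from typing import List

# Assumed pattern/replacement pairs, modeled on torchtext's basic_english
# normalization. Verify against the torchtext source before relying on them.
_PATTERNS = [r"'", r'"', r"\.", r"<br />", r",", r"\(", r"\)",
             r"!", r"\?", r";", r":", r"\s+"]
_REPLACEMENTS = [" '  ", "", " . ", " ", " , ", " ( ", " ) ",
                 " ! ", " ? ", " ", " ", " "]
_COMPILED = [(re.compile(p), r) for p, r in zip(_PATTERNS, _REPLACEMENTS)]


def basic_english_tokenize(line: str) -> List[str]:
    """Lowercase the input, space out punctuation, and split on whitespace."""
    line = line.lower()
    for pattern, replacement in _COMPILED:
        line = pattern.sub(replacement, line)
    return line.split()


if __name__ == "__main__":
    print(basic_english_tokenize("Hello, World!"))
    # prints ['hello', ',', 'world', '!']
```

Since this relies only on the standard library `re` module, it can sit next to the handler code with no extra dependency.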

atalman commented 3 months ago

cc @matthewdzmura @seemethere: the releng team and @malfet propose to stop releasing TorchText as of release 2.3, since we can't ensure the quality of the release.

HadiSDev commented 3 months ago

What would be the alternative if I need preprocessing for BERT / vocab / regex operations compiled with my model?

ffquintella commented 2 months ago

keras.io is a viable alternative ...

gluefox commented 2 months ago

Is there any alternative for C++-only environments that now need native tokenizers?