pytorch / text

Models, data loaders and abstractions for language processing, powered by PyTorch
https://pytorch.org/text
BSD 3-Clause "New" or "Revised" License

Compare torchnlp to torchtext #331

Open PetrochukM opened 6 years ago

PetrochukM commented 6 years ago

Hi there!

To help folks sort through the clutter of libraries, please add a comparison of PyTorch-NLP and pytorch/text.

https://github.com/PetrochukM/PyTorch-NLP

Thanks!

jekbradbury commented 6 years ago

Would you rather PyTorch-NLP replace torchtext as the semi-official NLP support library for PyTorch? Or are there things you believe you still lack relative to torchtext?

PetrochukM commented 6 years ago

I think I've covered everything; I'm only missing a couple of datasets. The codebase has a completely different design.

Happy to help however I can!

PetrochukM commented 6 years ago

@jekbradbury Checking in!

PetrochukM commented 6 years ago

Again!

zhangguanheng66 commented 5 years ago

@PetrochukM Just checking in to see which datasets you would propose.

eedeleon commented 5 years ago

Are there plans to consolidate these two libraries?

Also, are there any roadmaps available for releases/plans? I noticed torch/vision has a models/ directory and torch/text does not. After searching for a while, it seems like NLP models are scattered across many GitHub repositories with little to no support, apart from a handful of research libraries such as AllenNLP.

According to my limited understanding, torchtext and torchnlp seem to be trying to solve very similar problems.

Are there suggestions for where to go for:

  1. models (hopefully verified to reproduce the published results and some level of community/repository owner maintenance)
  2. idioms around PyTorch + NLP
  3. access to open source datasets with a clear license for companies to evaluate legal exposure
  4. contributing to the above list of useful code/data.
zhangguanheng66 commented 5 years ago

@eedeleon Yes. There will be a release (0.4.0) by the end of July. We are now planning to incorporate some common NLP models into torchtext (as torchvision does) to support the research community. We are also trying to improve the merge process, including more complete docs, examples, and unit tests. If you have any models or datasets in mind, feel free to propose them here or on the PyTorch text Slack channel.

PetrochukM commented 5 years ago

@eedeleon Originally, there was a plan to deprecate torchtext and merge torchnlp into the official library; however, the Facebook PyTorch team has had a lack of resources and expertise in NLP to follow through on this process.

Also, I did not have the time to dedicate the full-time effort to torchnlp that Facebook requested.

There was also a plan for pytext and torchnlp to collaborate, because the pytext team was finding they had to rewrite much of the code in torchnlp. That said, there was a lack of follow-up.


@zhangguanheng66 I am glad that you are anticipating a major release of torchtext by the end of July. However, torchtext's historical lack of progress gives me pause. There has also been little progress in the month since you made your comment.

Furthermore, the lack of docs, examples, and unit tests is apparent.


That said, I'll continue to support and improve PyTorch-NLP. Our engineering team uses it for our research supported by the Allen NLP team. We will continue to make the improvements that help our use case.

We would love for more people to actively contribute to PyTorch-NLP in line with their needs!

zhangguanheng66 commented 5 years ago

@PetrochukM Thanks for the comments. For the next release, I will try to add a few new supervised learning datasets and a tutorial on constructing datasets with the new pattern. We still have a lot of tech debt to fix, but we have to prioritize it.

00krishna commented 2 years ago

Say @PetrochukM, I was wondering if there were any updates on this issue? I have been looking at torchtext lately, and the API has been completely revamped: lots of major breaking changes, and even a different design concept as they try to be more consistent with PyTorch DataLoaders, etc. BUT I am finding the package really hard to use because of the lack of documentation for the new API. There are also quirks such as DataLoaders needing to be recreated after each epoch because of StopIteration errors. Are the two projects still mutually exclusive, are they replacements for each other, or some hybrid?
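The "recreate after each epoch" symptom is typical of data sources backed by a one-shot iterator. Here is a minimal plain-Python sketch, assuming the dataset behaves like a generator that is drained on the first pass. `read_lines`, `run_epochs`, and `ReIterable` are hypothetical names; no torchtext or PyTorch API is used or implied.

```python
# Plain-Python sketch of the "exhausted after one epoch" pitfall.
# Assumption: the data source is a one-shot generator; this is NOT torchtext code.

def read_lines():
    """Hypothetical stand-in for a raw text source (yields 3 examples)."""
    for text in ["a b", "c d", "e f"]:
        yield text


def run_epochs(iterable, epochs):
    """Return how many examples each epoch sees when iterating `iterable`."""
    return [sum(1 for _ in iterable) for _ in range(epochs)]


class ReIterable:
    """Hypothetical wrapper that builds a fresh generator on every pass."""

    def __init__(self, factory):
        self.factory = factory

    def __iter__(self):
        # Each epoch calls iter(), which gets a brand-new generator here.
        return iter(self.factory())


one_shot = read_lines()
print(run_epochs(one_shot, 3))                # [3, 0, 0]: drained after epoch 1
print(run_epochs(ReIterable(read_lines), 3))  # [3, 3, 3]

# A `for` loop over the drained generator just ends silently,
# but an explicit next() raises StopIteration.
try:
    next(one_shot)
except StopIteration:
    print("exhausted")
```

Recreating the dataset (and the loader wrapped around it) each epoch works for the same reason the wrapper does: each epoch starts from a fresh iterator instead of reusing a drained one.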